Oracle Enterprise Manager 12c Cloud Control – Incident Manager – Part Three

Monitoring target systems is a critical responsibility of today’s database and system administrators. DBAs and SAs are responsible for ensuring that systems are available and that their performance is within acceptable parameters, and for watching for error conditions and unusual behavior patterns.

In this series of articles, we’ve been looking into the functionality behind the new Incident Management features introduced in Oracle EM 12c Cloud Control.

In my previous articles, the focus was mostly on understanding the underlying events that serve as the building blocks of incidents, which in turn can be managed and monitored using the new Incident Management feature of Oracle Enterprise Manager 12c Cloud Control.

In this article we will take a look at the main functions of Incident Manager and gain an understanding of both rules and rule sets, which are a critical part of controlling which events are automatically turned into incidents to be tracked by the Incident Manager feature.

Incident Manager

This is the heart of the incident management functionality that Oracle introduced in EM 12c Cloud Control. From this interface a DBA can take care of all aspects of monitoring, tracking and investigating incidents, events and problems in their databases.

To access Incident Manager, from the Enterprise menu on the EM home page, select Monitoring and then Incident Manager. The top right-hand side of the page lists the incidents that have occurred. To see more detail, select an incident, and the bottom right of the page will show detailed information about it.

There are five tabs of information: General, Events, My Oracle Support Knowledge, Updates, and Related Events and Incidents. The General tab has a section for tracking the incident and, potentially, a “Guided Resolution” section where you can drill into areas such as performance findings, metric details and recent changes.

On the left-hand side is a navigator-style section that you can use to look at different views of the incidents. These include My Open Incidents (those assigned to you), Unassigned Incidents, Unacknowledged Incidents, All Open Incidents, Unassigned Problems, All Open Problems and Events without Incidents.

You also have the ability to create your own views of the incidents in the left side navigation pane. Any views you create are specific to your EM account, and are not available to other EM users.
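One way to think about these views is as named filters over the same list of incidents. The following is only a conceptual sketch in Python, not anything that EM exposes: the `Incident` fields, the `standard_views` function and the user names are all hypothetical, and it assumes that the unassigned and unacknowledged views only show open incidents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Incident:
    """Hypothetical, simplified incident record (not an EM data structure)."""
    summary: str
    owner: Optional[str] = None   # None means the incident is unassigned
    acknowledged: bool = False
    is_open: bool = True


def standard_views(current_user: str) -> Dict[str, Callable[[Incident], bool]]:
    """Map a few of the navigator view names to filter predicates."""
    return {
        "My Open Incidents": lambda i: i.is_open and i.owner == current_user,
        "Unassigned Incidents": lambda i: i.is_open and i.owner is None,
        "Unacknowledged Incidents": lambda i: i.is_open and not i.acknowledged,
        "All Open Incidents": lambda i: i.is_open,
    }


def apply_view(view: Callable[[Incident], bool],
               incidents: List[Incident]) -> List[Incident]:
    """Return only the incidents that the view's predicate accepts."""
    return [i for i in incidents if view(i)]


incidents = [
    Incident("Tablespace full", owner="dba1", acknowledged=True),
    Incident("Agent unreachable"),
    Incident("Listener down", owner="dba2", is_open=False),
]

views = standard_views("dba1")
# A custom view is one more predicate, kept with the user who defined it.
views["Open and assigned to anyone"] = lambda i: i.is_open and i.owner is not None

for name, predicate in views.items():
    print(name, [i.summary for i in apply_view(predicate, incidents)])
```

Seen this way, a custom view is simply an extra filter stored against your own EM account, which is why other users do not see it.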

You can manage an incident by choosing the All Open Incidents view, clicking the General tab and then choosing “Manage”. From the Manage dialog you can change the status, assign the incident to an EM user for follow-up, change the priority, escalate the incident and add comments that other users can see.

If you wish to work on an incident (and thereby assign yourself as its owner), click the “Acknowledge” link on the General tab, which marks the incident as acknowledged. From here you can look at the My Oracle Support Knowledge tab for additional information (and even open an SR with Oracle for further assistance), or use the Guided Resolution if you would like to investigate further without necessarily contacting Oracle Support.

You can also suppress further messages or notifications for an incident, which you may want to do if it is currently being worked on but is not fully resolved yet. On the General tab, click the “More” option and then choose “Suppress”.

Generally, once the underlying problem or cause of an incident is fixed, the incident will automatically be marked as cleared the next time it is evaluated. However, you can also manually clear outstanding events once they are resolved.

Not all events that happen in a database automatically cause an incident to be logged. We create rules so that specific events generate an incident to be managed. Building up the rules takes time, so we also have the option of manually generating an incident for tracking. The “Events without Incidents” view can be particularly useful here. It is good practice to look at this view periodically. If you see an event that is significant and you would like to track it as an “official” incident, select it, and from the General tab select “More” and then “Create Incident”. You can assign an incident number, assign the incident for tracking and change the status to “Work in Progress”; you then have the full functionality of Incident Manager to continue to monitor and track work on the incident. At a later time, you can always create a rule that would allow that event (or events) to automatically generate an incident in the future.

Rules

EM 12c Cloud Control uses a combination of rules and rule sets to govern what action or actions should be taken when an event or incident occurs on a managed target.

Rules are essentially a set of directions for EM to follow, such as sending an email or automatically generating an incident when specified events happen.

Rules can perform any of the following actions:

  • Create an incident
  • Send a notification, such as an email or a help desk ticket
  • Perform incident management actions (such as automatically escalating an incident if it’s not worked by its assigned administrator in a specified period of time)

Rules consist of two parts: the event, incident or problem that the rule applies to, and the action that should take place.

By default, rules are processed in the order in which they are created or entered into a rule set, so this must be taken into account when creating the rules and rule sets.

Let’s say we want to generate an incident based on a combination of CPU and memory metrics (perhaps one that indicates a high system load). Then, based on whether it’s at a warning or critical level, we want to send a page or an email. Lastly, if the incident is not closed within 3 days, it should be escalated to level 1. This is how the rules should be created:

Rule     Criteria                          Condition           Action

Rule 1   CPU Util(%) metric and                                Create incident
         Memory Util(%) metric events,
         warning or critical

Rule 2   Second incident of                Severity=Critical   Notify by page
         warning/critical severity         Severity=Warning    Notify by email

Rule 3   Incident open for 3 days                              Set escalation level to 1
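To make the flow through the three rules easier to follow, here is a rough Python model of the table. It is a sketch only, not EM code: the `Event` and `Incident` classes and the `rule_1`, `rule_2` and `rule_3` functions are hypothetical names, the sketch assumes that an event on either metric is enough to create the incident, and it treats “open for 3 days” as three days or more.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Event:
    metric: str      # e.g. "CPU Util(%)"
    severity: str    # "Warning" or "Critical"


@dataclass
class Incident:
    severity: str
    opened_at: datetime
    escalation_level: int = 0
    notifications: List[str] = field(default_factory=list)


def rule_1(event: Event, now: datetime) -> Optional[Incident]:
    """Create an incident for CPU or memory metric events (warning or critical)."""
    watched = {"CPU Util(%)", "Memory Util(%)"}
    if event.metric in watched and event.severity in {"Warning", "Critical"}:
        return Incident(severity=event.severity, opened_at=now)
    return None


def rule_2(incident: Incident) -> None:
    """Page for critical incidents, send an email for warning incidents."""
    if incident.severity == "Critical":
        incident.notifications.append("page")
    elif incident.severity == "Warning":
        incident.notifications.append("email")


def rule_3(incident: Incident, now: datetime) -> None:
    """Set the escalation level to 1 once the incident has been open for 3 days."""
    if now - incident.opened_at >= timedelta(days=3):
        incident.escalation_level = 1


# Rules run in order: rule 1 has to create the incident before
# rules 2 and 3 have anything to act on.
opened = datetime(2024, 1, 1, 9, 0)
incident = rule_1(Event("CPU Util(%)", "Critical"), opened)
if incident is not None:
    rule_2(incident)
    rule_3(incident, opened + timedelta(days=3))
    print(incident.notifications, incident.escalation_level)
```

The ordering matters here for the same reason it matters in EM: the notification and escalation rules only make sense once an earlier rule has turned the events into an incident.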

Rule Sets

One or more rules that apply to a target or collection of targets can be grouped into rule sets. We use rule sets to help organize rules into manageable units. For example, one rule set could be created for production systems and a second for development systems. Alternatively, there could be one rule set for database targets, another for host targets, and so on.

Enterprise rule sets can be used across the enterprise, and they can perform all supported actions. The ability to create enterprise rule sets is restricted. When a rule set performs an action, it does so with the privileges of the rule set’s creator.

Private rule sets can be created by any administrator in order for that administrator to set up notifications about any of their targets. The only action these rule sets can take is to email the rule set owner.

Rule sets include the following: a name, description, what it applies to, owner, status (enabled or not) and type (public/enterprise or private).
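The attributes above, together with the difference between enterprise and private rule sets, can be summarized in a small Python model. This is a conceptual sketch and not an EM API: the `RuleSet` class, its field names and the action names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical action names, used only for this illustration.
ALL_ACTIONS = {"create_incident", "send_notification",
               "manage_incident", "email_owner"}
PRIVATE_ACTIONS = {"email_owner"}


@dataclass
class RuleSet:
    name: str
    description: str
    applies_to: List[str]          # targets or groups of targets
    owner: str
    enabled: bool = True
    kind: str = "enterprise"       # "enterprise" or "private"
    actions: List[str] = field(default_factory=list)

    def add_action(self, action: str) -> None:
        """Accept an action only if this type of rule set may perform it."""
        allowed = ALL_ACTIONS if self.kind == "enterprise" else PRIVATE_ACTIONS
        if action not in allowed:
            raise ValueError(f"{self.kind} rule set cannot perform {action!r}")
        self.actions.append(action)


prod = RuleSet("Production rules", "Rules for production systems",
               ["prod_group"], owner="dba1")
prod.add_action("create_incident")

mine = RuleSet("My alerts", "Personal notifications",
               ["orcl_db"], owner="dba2", kind="private")
mine.add_action("email_owner")
print(prod.actions, mine.actions)
```

Asking the private rule set to perform `create_incident` would raise a `ValueError` in this sketch, mirroring the restriction that a private rule set can only email its owner.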

There are several rule sets that are created and activated automatically once EM 12c Cloud Control is installed. These built-in rule sets will:

  • Create an incident if a target goes down
  • Create an incident for an agent unreachable error
  • Create an incident for any critical metric alerts
  • Create an incident for any service level agreement alerts
  • Create an incident for compliance score violations
  • Create an incident for any high-availability events
  • Automatically clear metric alerts older than 7 days
  • Automatically clear job status change events older than 7 days
  • Automatically clear Application Dependency and Performance (ADP) alerts after 7 days

These rule sets cannot be modified or deleted; however, they can be disabled or enabled.

Next month we’ll be looking at the detailed how-to of creating rules and rule sets in Oracle Enterprise Manager 12c Cloud Control. Until then…

Karen Reliford
Karen Reliford is an IT professional who has been in the industry for over 25 years. Karen's experience ranges from programming, to database administration, to Information Systems Auditing, to consulting and now primarily to sharing her knowledge as an Oracle Certified Instructor in the Oracle University Partner Network. Karen currently works for TransAmerica Training Management, one of the foremost Oracle Authorized Education Centers (OAEC) in the Oracle University North America region. TransAmerica Training Management offers official Oracle and Peoplesoft Training in Coral Gables FL, Fayetteville AR, Albuquerque NM, Providence RI and San Juan PR. Karen has now been teaching Oracle for Oracle University for more than 15 years. Karen has attained her Certified Technical Trainer designation along with several Oracle certifications including OCP-DBA, OCP-Internet Developer, Oracle Expert - Oracle 10g RAC and Oracle Expert - Oracle Application Express (3.2). Additionally, Karen achieved her Oracle 10g Oracle Certified Master (OCM) in 2008. Karen was raised in Canada, and in November 2009 became a US Citizen. Karen resides in Columbus OH with her husband, Ron along with their 20 pets, affectionately referred to as the "Reliford Zoo".
