Monitoring target systems is a critical responsibility of today’s database and system administrators. DBAs and SAs are responsible for ensuring that systems are available and that their performance is within acceptable parameters, and for watching for error conditions and unusual behavior patterns.
Oracle has made great strides in providing tools and techniques that let administrators proactively monitor their systems for all of the above conditions, and then some.
This trend continues with the new feature in Oracle Enterprise Manager 12c Cloud Control called Incident Manager.
Oracle Enterprise Manager 12c Cloud Control – Incident Manager – Part Four
This is the final article in my series on the great new feature in Oracle Enterprise Manager 12c Cloud Control: Incident Manager. In this last article we’re going to look at how to create rules and rule sets and how they work together with notifications.
Creating Rules and Rule Sets
Rules are the starting point for controlling what Incident Manager monitors and how it responds to issues or errors that arise. To create a rule, follow these basic steps:
Navigate to Setup > Incidents > Incident Rules
From here you can edit an existing rule set or create a new one. Rules are added to rule sets. To edit an existing rule set, highlight the rule set and select Edit. To create a new one, click Create.
You’ll have to decide what type of target the rule set applies to, and optionally you can select “All,” which means it could be associated with any applicable target.
When you add a rule, the first step is to choose its type, and you have three choices: Event, Incident, or Problem. Click Continue, then follow the wizard and save the rule into the rule set.
Using a Rule to Generate an Incident
Let’s say you want to create a rule that generates an incident to be tracked in Incident Manager, based on an event such as a Transactions Per Second metric alert.
Edit or create the rule set. Once in the rule set, select the “Rules” tab and click Create.
Because the rule reacts to an event as it arrives, choose the option “Incoming Events and Updates to Events”, and then click Continue.
For the Event Type, choose Metric Alert. Then choose the specific metric you are basing the rule on. Click the +Add button to see a list of available metrics by target. On this page, you will also see options for the severity level and for corrective actions. Once you have selected the metric, click Next.
Your next step is to add the action that should be taken, and Create Incident is one of the available actions. You can assign the Incident to a user, set the priority and even log a ticket as you set the action.
Then click Next to move on to the final step in the wizard, which is to add a name and description for the rule.
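To make the wizard choices concrete, here is a minimal conceptual sketch in Python. It is not an Oracle API and does not talk to Cloud Control. The class names, field names and sample values such as DBA_ON_CALL are hypothetical, and the code only models the idea that a rule matches an event on its type, metric and severity and then produces an incident with an owner and a priority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    event_type: str   # e.g. "Metric Alert"
    metric: str       # e.g. "Transactions Per Second"
    severity: str     # e.g. "Critical" or "Warning"
    target: str       # name of the monitored target


@dataclass
class Incident:
    summary: str
    owner: str
    priority: str


@dataclass
class CreateIncidentRule:
    """Hypothetical stand-in for the choices made in the rule wizard."""
    name: str
    event_type: str
    metric: str
    severities: tuple
    assign_to: str
    priority: str

    def apply(self, event: Event) -> Optional[Incident]:
        # The rule fires only when the event matches every criterion.
        if event.event_type != self.event_type:
            return None
        if event.metric != self.metric:
            return None
        if event.severity not in self.severities:
            return None
        summary = f"{event.metric} {event.severity} on {event.target}"
        return Incident(summary=summary, owner=self.assign_to, priority=self.priority)


# Illustrative values only; the user name and priority are assumptions.
rule = CreateIncidentRule(
    name="TPS alert to incident",
    event_type="Metric Alert",
    metric="Transactions Per Second",
    severities=("Critical",),
    assign_to="DBA_ON_CALL",
    priority="High",
)
```

Calling rule.apply() with a critical Transactions Per Second event returns an Incident assigned to the chosen user, while an event for any other metric returns None, which mirrors how a rule ignores events it was not written for.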
Using Rules to Escalate an Incident
Once an incident is generated, we may want another rule in place to escalate it if it is not addressed quickly. For example, we may expect an incident to be at least acknowledged within 24 hours, particularly if its severity is fatal. If this does not happen, the incident is escalated to the IT Manager. If it stays open for 48 hours in total, it is further escalated to the IT Director, and at 72 hours in total it goes to the VP. We’ll use escalation levels to control this: Escalation Level 1 for the IT Manager, Escalation Level 2 for the IT Director, and Escalation Level 3 for the VP.
In this case, when you click Create on the Rules tab (after editing an existing rule set or creating a new one as appropriate), you need to make a different initial selection because the rule is based on an incident. Rather than “Incoming Events…”, be sure to choose “Newly Created Incidents or Updates to Incidents”.
In the next wizard step you can either select specific incidents, or select all incidents where the severity = fatal if you want more of a blanket rule.
We want the action to happen based on a specific condition. In the Conditions for Actions region select “Execute the actions on the conditions specified” then select “How long the incident is open and in a particular state”.
- Set the time to 24 hours
- Set the Status to be equal to Open
- In the Basic Notification Region, select the administrator to be notified and the method such as email.
- Under Update Incident, choose the option to Escalate To Level 1
To create the next escalation, add another rule similar to the one above, with the following options:
- Set the time to 48 hours
- Set the Status to be equal to Open
- Set the Escalation Level equal to Level 1
- Add the administrator email information for the notification
- Choose the option to Escalate to Level 2
The final escalation rule would use these options:
- Set the time to 72 hours
- Set the Status to be equal to Open
- Set the Escalation Level to 2
- Add the appropriate administrator account to be notified
- Escalate to Level 3
By the way, more than one account can be listed as the recipient, which means the rules could be set up to copy the DBA (or any other administrator who might need or want to see the notification) on any incidents that do get escalated.
Once all of the rules are created, review the rule set to ensure that they have all been added in the correct order.
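The three escalation rules can be summarized in a short Python sketch. It is only a model of the logic described above, not anything Cloud Control runs. The names EscalationRule and next_escalation are hypothetical, and treating "not yet escalated" as level 0 is an assumption of the sketch. The hour thresholds, the Open status and the recipients come from the steps above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationRule:
    hours_open: int      # minimum total time the incident has been open
    required_level: int  # escalation level the incident must currently have
    escalate_to: int     # level that is set when the rule fires
    notify: str          # who receives the notification


# Level 0 stands for "not yet escalated" (an assumption of this sketch).
RULES = [
    EscalationRule(hours_open=24, required_level=0, escalate_to=1, notify="IT Manager"),
    EscalationRule(hours_open=48, required_level=1, escalate_to=2, notify="IT Director"),
    EscalationRule(hours_open=72, required_level=2, escalate_to=3, notify="VP"),
]


def next_escalation(status: str, hours_open: float, level: int) -> Optional[EscalationRule]:
    """Return the rule that fires for this incident, or None if none applies."""
    if status != "Open":
        return None
    for rule in RULES:
        if level == rule.required_level and hours_open >= rule.hours_open:
            return rule
    return None
```

Because each rule also checks the current escalation level, an incident moves up one level at a time: an incident that is already at Level 1 is not touched again until it has been open for 48 hours.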
Rule sets and rules can be created in the same manner to work on problems, track job status changes, and send other notifications, in addition to working with events and incidents, which makes them very robust. For example, a DBA can create a private rule set that sends them a notification when a scheduled backup job gets a status of failed (or completed, if desired) by creating an Event rule on the event “Job Status Change”.
Advanced Options
Many advanced options are also available for rules and rule sets, including:
- Separate Notifications Based on Severity States: for example, send a page to a DBA for a critical metric alert and an e-mail for a warning level alert on the same metric. These could go to the same DBA or to different DBAs.
- Rule Sets To Generate Tickets: rules can be configured to automatically send a ticket to a help desk tool such as Remedy (naturally, the integration would have to be fully configured in Cloud Control first).
- Rules to Notify Different DBAs Based On Event Type: in an organization that has both application DBAs and database DBAs, rules can be defined so that events tied to general system settings go to the database DBA, while application-specific events are directed to the appropriate application DBA.
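The first and third options amount to a routing decision: severity picks the notification method, and the kind of event picks the recipient. A rough Python sketch of that idea follows. The recipient names, the "system" and "application" scope labels and the e-mail default for unlisted severities are all assumptions made for illustration, and none of this is Cloud Control syntax.

```python
from typing import NamedTuple


class Notification(NamedTuple):
    recipient: str
    method: str


# Severity decides how the DBA is contacted (page versus e-mail).
METHOD_BY_SEVERITY = {"Critical": "page", "Warning": "email"}

# The kind of event decides which DBA is contacted (hypothetical names).
RECIPIENT_BY_SCOPE = {"system": "database_dba", "application": "application_dba"}


def route(severity: str, scope: str) -> Notification:
    """Pick who is notified and how for a single event."""
    # Falling back to e-mail for other severities is an assumption.
    method = METHOD_BY_SEVERITY.get(severity, "email")
    recipient = RECIPIENT_BY_SCOPE[scope]
    return Notification(recipient, method)
```

In Cloud Control itself each of these combinations would be a separate rule, or a separate action within a rule; the lookup tables here just make the mapping easy to see at a glance.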
For more details on the advanced options, see the very detailed documentation guide that Oracle provides.
In conclusion
The new Incident Manager feature of Oracle Enterprise Manager 12c Cloud Control is a feature-rich tool that will enable DBAs to track, manage, research and escalate (as necessary) virtually any sort of hiccup that their database(s) and related targets might throw at them.
Definitely a feature worth knowing.