As mentioned in my last article, Oracle has made great strides in giving administrators the tools and techniques to monitor their systems proactively. One of the new features introduced to help with monitoring (and in some cases to streamline it) is Incident Manager.
The key pieces that Incident Manager is based on are events, incidents, problems, incident rules and incident rule sets.
In this article, we’re going to begin by taking a closer look at Events. An event is something that happens to a target being monitored by 12c Cloud Control. For example, an event could be an alert issued because a threshold was crossed in one of the metrics configured on a database.
Every event has a set of attributes associated with it, and these include the event type, the severity, an internal name used to help identify the event, the item that the event occurred on, a message indicating what the event was all about, a timestamp and a category.
The categories of events are:
- Availability
- Business
- Capacity
- Configuration
- Diagnostics
- Error
- Fault
- Jobs
- Load
- Performance
- Security
The severity levels of events are:
- Fatal
- Critical
- Warning
- Advisory
- Informational
- Clear
Any event raised at the Fatal or Critical severity level will automatically cause an incident to be generated.
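To make the attribute list and the severity rule concrete, here is a minimal Python sketch. The class and field names are illustrative assumptions, not Cloud Control's internal data model; only the attributes, the severity levels, and the Fatal/Critical rule come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Severity(Enum):
    # Severity levels as listed in this article
    FATAL = "Fatal"
    CRITICAL = "Critical"
    WARNING = "Warning"
    ADVISORY = "Advisory"
    INFORMATIONAL = "Informational"
    CLEAR = "Clear"


@dataclass
class Event:
    # Hypothetical field names mirroring the attributes described above
    event_type: str
    severity: Severity
    internal_name: str
    target: str
    message: str
    timestamp: datetime
    category: str


def creates_incident(event: Event) -> bool:
    """Return True when the event's severity is Fatal or Critical."""
    return event.severity in (Severity.FATAL, Severity.CRITICAL)


# Made-up example values, for illustration only
example = Event(
    event_type="Metric Alert",
    severity=Severity.CRITICAL,
    internal_name="tablespace_pct_used",
    target="orcl",
    message="Tablespace USERS is 97% full",
    timestamp=datetime(2013, 1, 15, 9, 30),
    category="Capacity",
)
print(creates_incident(example))
```

A Warning or Advisory event run through the same check would not qualify, which is the distinction the rule above draws.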
The event types include:
- Target Availability
- Metric Alert
- Metric Evaluation Error (when a metric cannot be measured)
- Job Status Change
- Compliance Standard Rule Violation (a feature that replaces the policy violations of previous versions of Enterprise Manager)
- Compliance Standard Score Violation
- High Availability
- Service Level Agreement Alert
- User Reported (we can generate our own events)
- Application Dependency and Performance
- Application Performance Management KPI Alert
- JVM Diagnostics Threshold Violation
Another great feature of Incident Manager is the ability to prioritize events. The priority is based first on the target’s Lifecycle Status and then on the Event Type. This prioritization is used only when EM is under a very heavy workload; in that situation, EM handles (or raises) events and incidents in order of Lifecycle Status and Event Type. Under normal workloads, events and incidents are managed as they arise.
There are five possible values for the Lifecycle Status of a target. These are (from highest to lowest priority):
- Mission Critical
- Production
- Stage
- Test
- Development
To set the Lifecycle Status, navigate to the target, open the Target Setup menu, and then select Properties (or, for the command line gurus of the world, use the EM CLI set_target_property_value command).
For prioritization purposes, the Event Types are grouped into the following three categories:
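The EM CLI route can be sketched with a small Python helper that assembles the command line. The set_target_property_value verb and its -property_records argument (records of the form target_name:target_type:property_name:property_value) are part of EM CLI; the property name "LifeCycle Status" and the internal value spellings below are my assumptions, so check them against your release before relying on them. The target name and type are made up.

```python
import shlex

# Assumed internal spellings for the five lifecycle values (verify for your release)
LIFECYCLE_VALUES = {
    "Mission Critical": "MissionCritical",
    "Production": "Production",
    "Stage": "Staging",
    "Test": "Test",
    "Development": "Development",
}


def lifecycle_command(target_name, target_type, status,
                      property_name="LifeCycle Status"):
    """Build the argument list for emcli set_target_property_value."""
    value = LIFECYCLE_VALUES[status]  # KeyError for an unknown status
    record = f"{target_name}:{target_type}:{property_name}:{value}"
    return ["emcli", "set_target_property_value",
            f"-property_records={record}"]


# Hypothetical target "orcl" of type oracle_database
cmd = lifecycle_command("orcl", "oracle_database", "Production")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Building the command as a list keeps the space in the property name from being split by the shell if the list is later handed to subprocess.run.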
- Availability Events
- Non-Informational Events
- Informational Events
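The two-level ordering described above can be illustrated with a sort key in Python. This is only a sketch of the idea, not EM's actual scheduling logic; it assumes the three event groups are listed from highest to lowest priority, and the backlog entries are invented.

```python
# Lifecycle Status values, highest priority first (as listed in this article)
LIFECYCLE_PRIORITY = ["Mission Critical", "Production", "Stage", "Test", "Development"]

# Event type groups; the high-to-low order here is an assumption
EVENT_GROUP_PRIORITY = ["Availability", "Non-Informational", "Informational"]


def priority_key(item):
    """Sort by Lifecycle Status first, then by event type group."""
    lifecycle, group = item[0], item[1]
    return (LIFECYCLE_PRIORITY.index(lifecycle),
            EVENT_GROUP_PRIORITY.index(group))


# Invented backlog of (lifecycle status, event group, description)
backlog = [
    ("Test", "Availability", "listener down"),
    ("Production", "Informational", "job succeeded"),
    ("Mission Critical", "Non-Informational", "CPU above threshold"),
    ("Production", "Availability", "database down"),
]

ordered = sorted(backlog, key=priority_key)
for entry in ordered:
    print(entry)
```

Note that the Mission Critical entry sorts ahead of the Production availability event: Lifecycle Status is compared first, and the event group only breaks ties within the same status.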
Ultimately, we leverage the events that are generated so that they get reported as incidents to the Cloud Control Incident Manager. We determine which events (or combinations of events) need to be reported. In this way, we can make sure that we are using 12c Cloud Control to track and monitor what is most important to us, and that we escalate only what makes sense. The ability to send notifications directly from events (as we have in the past) is still part of Cloud Control; however, the intention is to reduce the “clutter” and generate our notifications at the higher incident level.
For the most part, we do not define events per se; events are pre-defined. The exception would be if we wanted to create a User Reported event. This can be done using EM CLI.
emcli publish_event -target_name="name" -target_type="internal name" -message="your message" -severity="level" -name="event name" [-key="component name"] [-context="nameA=valueA;nameB=valueB;..."] [-separator=context="alternate pair separator"] [-subseparator=context="alternate name-value separator"]
The message cannot exceed 4000 characters. The name cannot exceed 128 characters and should indicate the nature of the event. The severity must be one of the following values:
- CLEAR
- MINOR_WARNING
- WARNING
- CRITICAL
- FATAL
The key attribute is optional, and it would be used to name a specific item within the target. For example, to include the name of a tablespace along with the database name, you would set the target name to the database name, and then use a key attribute to display the name of the tablespace in that database. The key cannot be longer than 256 characters.
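As a sketch of how the limits above might be checked before calling EM CLI, the following Python helper validates its inputs and builds the publish_event argument list. The helper name and the example target, event name, and message are hypothetical; the option names, severity values, and length limits are the ones described above.

```python
VALID_SEVERITIES = {"CLEAR", "MINOR_WARNING", "WARNING", "CRITICAL", "FATAL"}


def publish_event_args(target_name, target_type, message, severity, name, key=None):
    """Validate the inputs and return the emcli publish_event argument list."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    if len(message) > 4000:
        raise ValueError("message cannot exceed 4000 characters")
    if len(name) > 128:
        raise ValueError("name cannot exceed 128 characters")
    if key is not None and len(key) > 256:
        raise ValueError("key cannot exceed 256 characters")

    args = [
        "emcli", "publish_event",
        f"-target_name={target_name}",
        f"-target_type={target_type}",
        f"-message={message}",
        f"-severity={severity}",
        f"-name={name}",
    ]
    if key is not None:
        # Optional: names a specific item within the target, e.g. a tablespace
        args.append(f"-key={key}")
    return args


# Hypothetical database target with the tablespace name as the key
args = publish_event_args(
    target_name="orcl",
    target_type="oracle_database",
    message="Tablespace USERS is nearly full",
    severity="WARNING",
    name="Tablespace space check",
    key="USERS",
)
print(args)
```

The resulting list could be passed to subprocess.run(args, check=True) on a host where EM CLI is installed and logged in; that step is left out here so the sketch runs anywhere.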
While it is recommended that events are managed at an incident level rather than on an individual event level, the fact does remain that there are literally hundreds of events that can be generated. Trying to get them all mapped to incidents is an activity that will be ongoing – and there is a definite chance that there will be events that will happen that are not yet mapped out to incidents for reporting.
Through Incident Manager, 12c Cloud Control gives us the ability to view events that have occurred that are not yet configured to any particular Incident for reporting and management.
To see these events, first go to the Enterprise menu from the EM home page, then select Monitoring and Incident Manager. On the left-hand side (in the Views region) there will be a link labeled “Events without incidents”.
Any events that have happened that are not tied to incidents will be displayed. Simply click on the event to view the details for the event.
After reviewing the event details, an event can quickly be added to an existing Incident (choose More, then Add Event to Incident) or used to create a new Incident (choose More, then Create Incident).
Once we have an understanding of the events that are used as the basis for defining incidents (which is ultimately how things are reported to Oracle 12c Cloud Control), the next step is to look at how to define and work with Incidents. And that is the topic for next month…