We live in real time and often must deal with performance issues in real time too. For many, Oracle Enterprise Manger (OEM) is often a nice place to begin.
As a DBA, you typically want to use Oracle’s automatic diagnostic feature of Automatic Database Diagnostic (ADDM) to zero in on performance problems that occur. Unfortunately, we live in real time and often must deal with performance issues in real time as well, saving future idle time to deal with those performance problems that have reared their ugly head during diagnosis but are not necessarily occurring right now. For the here-and-now, real-time performance problems, we need a tool that will allow us to monitor database performance in real time and show us where we need to focus our attention.
For many, Oracle Enterprise Manger (OEM) is often a nice place to begin. The performance page presented in OEM gives three distinct sections that one can look at and get an overall performance picture of the database in real time and then can subsequently drill down into other pages to get at root causes of performance problems. Then, when needed to further the investigation, DBAs can manually run ADDM to analyze immediately instead of waiting for the next scheduled Automatic Workload Repository (AWR) snapshot to run.
One of the niceties of the performance page is being able to set a refresh rate, refresh on demand, and view the data behind the graphs. This screen shows at a glance CPU utilization at the top under the “Host: Runnable Processes” graph, a breakdown of the “Average Active Session” types, and another graph showing the instance throughput rate, selectable by per second or per transaction, for transactions and physical read and redo activity.
This article will, as prescribed by the Oracle Database 2 Day + Performance Tuning Guide, take a look at how we can use OEM as a jumping off point to drill down and find root causes for performance problems in real-time, specifically looking at monitoring user activity, monitoring instance activity, and monitoring host activity. Each of these together brings a nice and total view of those things that can cause database performance issues.
Monitoring User Activity: Monitoring user activity, from the performance page, is nothing more than taking a look at the active session over a period of time and drilling down for those sessions that are of interest, either those running on the CPU or waiting on an event.
The choice clearly is determined by actual activity on your system. If everything was running on the CPU then you’d want to investigate if this was healthy. If everything was waiting for a resource you’d want to investigate what they were waiting on and why, drilling down from the chart to identify the reason for the performance issue. The idea here, from OEM’s performance page and the user activity graph, is to locate the sudden increases, or constant high values, for activity in the graph.
Areas of interest often are those wait classes that have the largest block of color showing. You can then just hover over those sections, click, and drill down into the wait class. After clicking on a wait class another screen will be shown that shows the “working” page for that particular wait class, giving a great deal of information broken down for that particular wait class. For instance, if you were to click on the CPU Used, then the active working page would be for active sessions, showing detail that includes top SQL and Session information. From here, additional drill down is often available such as, in this case, SQL Tuning Advisor for SQL, or additional detail options that allow you to view active session information by top SQL, sessions, services, modules, actions, clients, PL/SQL, files, or objects. The key here is to continually search and drill down for those active sessions that are consuming too many resources, waiting for resources, or just have the most activity. And the ultimate goal is to reduce the time and/or activity of these sessions. Please see the many graphs within the Oracle Database 2 Day Performance Tuning Guide, and be sure to try OEM as well.
Monitoring instance activity: Monitor instance activity through the instance chart/graph to monitor throughput, I/O, parallel execution, and services. Throughput of a database is nothing more than the amount of work that the database has performed over a specified unit of time, where logons, transactions, reads, and redo activity are great indicators. For throughput, it is advantageous to compare the peaks, valleys, and constants over a time period to the amount of contention (waiting) your database is doing. For instance, if there is a high level of throughput, with or without contention, then you are probably doing ok. Just so long as everything else is already tuned. But for situations where there is a low level of throughput and high levels of contention (waiting), this would indicate a clear need for tuning the database. Monitoring I/O within the instance activity is a great indicator of how well the database is getting data to satisfy requests that will support the desired throughput. If you see excessive waits for I/O or high latencies, then you know that the database could be doing more work (throughput) if the data could get there quicker. There is nothing worse than wanting to get more work out of your database but having slow or contentious disks that just can’t do any more work.
Monitoring Host Activity: Monitoring host activity is important because it contains information about the system that is hosting the database. If there isn’t enough memory, disk or CPU, then this is where it will show up. Monitoring host activity will tell you if there are sufficient levels of disk, memory and CPU to satisfy the required resource needs for the workload of the database. To see this detail, just click on load link from the Database home page under Host CPU. From here a performance subpage is shown that provides detail of the CPU, Memory, and Disk I/O utilization over time, giving clear indicators as to when high levels of activity/host resource usage have occurred. It is very important that you understand if the levels experienced are normal or your system may require additional resources. Validating baselines when there is limited or no activity, when there is average activity, and when there is very high activity can prove beneficial and help you determine when performance issues may occur.
Oracle’s database performance page within Oracle Enterprise Manager has proven itself very useful in a first line of attack when needing to quickly gain insight into real-time performance problems. Sure it takes a little time getting used to, but the click-through abilities into distinct areas of system and database resource consumption make it a very viable tool. The major concept here is to just use the tool and explore the many drill-downs BEFORE you actually need to use it. Getting yourself accustomed to its use is critical to determining and finding real-time performance issues.