Constantly watching CPU utilization is critical to a database administrator’s understanding of their database system. The sar utility, while seemingly small and insignificant can and should be used to help initially determine if any high-level CPU resource usage is out of balance, potentially signaling a database performance tuning problem.
Being an effective database administrator means that we must at times be able to use system level utilities to find things like CPU, memory, and I/O bottlenecks/issues to help point us in the proper direction.
So often, I see database administrators diving straight into the database to determine performance problems, forgetting about the operating system and the wealth of information it contains. Being an effective DBA means that we must at times be able to use system level utilities to find things like CPU, memory, and I/O bottlenecks/issues to help point us in the proper direction. These three high-level resources are the main limiting factors to database performance that cannot be altered; we have only so many CPU cycles, so much memory, and a specific disk configurations. Without properly understanding how they are currently behaving, before looking inside the database, will almost always gets us in trouble. This article looks at how we as DBAs might monitor CPU utilization from the operating system and when it might signal issues we need to look into.
The sar utility is part of the sysstat package and will collect, store in logs, and report on a wide variety of system activities. Of all the system level commands that you can use, the sar utility has multiple switches/options that allow the DBA to quickly and easily monitor those resources (CPU, context switches, interrupts, paging, memory usage, buffer usage, network usage) that can give us quick insight into the health of our database systems. What makes sar very nice to use is that sar constantly collects and logs system activity. To see this, and get some rudimentary, but powerful, commands out of the way, the following examples help show how to call sar for days and time intervals; remembering that sar maintains daily logs for each day in a file such as /var/log/sa/sa<dd>, where <dd> is the day of the month:
Use sar to look at today’s collected activity:
[oracle@wagner ~]$ sar
Use sar to look at today’s collected activity; reporting every 10 seconds for 15 times
[oracle@wagner ~]$ sar 10 15
Use sar to look at the daily activity on the 13th:
[oracle@wagner ~]$ sar -f /var/log/sa/sa13
Use sar to look at today’s collected activity between 6:00am and 7:00am
[oracle@wagner ~]$ sar -s 06:00:00 -e 07:00:00
Use sar to look at the daily activity on the 13th and between 6:00am and 7:00am
[oracle@wagner ~]$ sar -s 06:00:00 -e 07:00:00 -f /var/log/sa/sa13
For the above examples, the following options were used:
-f : is the filename to be used in the report
-s : is the starting period of the report
-e : is the ending period of the report
Now that we have some of the preliminaries out of the way, finding and looking at report data with sar to help us tune a database really is all about looking at that system level resource and then diving into the specific database engine, where possible, to see what might be consuming that resource. Granted, there may be ill-use of the resource that doesn’t necessarily show up at this high level but what looking at the resource from the system level does for us is give us a good picture as to whether the resource is limiting our database from performing optimally.
Of the most basic options to the sar command, the –u option lets us report on CPU utilization. Combining the above examples to look at the CPU utilization over a given time period is critical for zeroing in on when a system, at least from a CPU perspective, is having issues, might be overloaded, or, under better circumstances, will allow you to schedule additional workload through the system at low periods. A sample output, on a VERY idle system, might look like the following:
[oracle@wagner ~]$ sar -u -s 15:00:00 -e 16:00:00 -f /var/log/sa/sa13 Linux 2.6.18-194.3.1.el5 (wagner) 07/13/2010 03:00:01 PM CPU %user %nice %system %iowait %steal %idle 03:10:01 PM all 1.04 0.00 3.97 0.05 0.00 94.94 03:20:01 PM all 0.77 0.00 2.95 0.00 0.00 96.27 03:30:01 PM all 0.72 0.11 3.29 0.01 0.00 95.88 03:40:01 PM all 0.73 0.00 2.91 0.01 0.00 96.36 03:50:01 PM all 0.71 0.00 2.89 0.01 0.00 96.39 Average: all 0.79 0.02 3.20 0.02 0.00 95.97
Obviously, this system is very idle from a CPU perspective during this time period.
Often times we are asked how the system is currently performing and issuing the sar command to take a look at the CPU could easily produce the following output; showing a clear indication that our CPU is overtaxed.
[oracle@wagner ~]$ sar -u 5 30 Linux 2.6.18-194.3.1.el5 (wagner) 07/13/2010 08:31:07 PM CPU %user %nice %system %iowait %steal %idle 08:31:12 PM all 0.40 0.00 1.40 0.00 0.00 98.20 08:31:17 PM all 10.58 0.00 31.54 1.80 0.00 56.09 08:31:22 PM all 11.98 0.00 88.02 0.00 0.00 0.00 08:31:27 PM all 13.80 0.00 86.20 0.00 0.00 0.00 08:31:32 PM all 12.35 0.00 87.65 0.00 0.00 0.00 08:31:37 PM all 11.38 0.00 88.62 0.00 0.00 0.00
From this point, it is a good idea to ask ourselves what is obviously happening on the system that would cause the system to go from 98.20 %idle to 0.00 %idle. From the above, it should be noted that we are also not waiting for any I/O as the %iowait is basically 0.00. Actually, the issue here is either the CPU running in system mode or the CPU running in user mode. The key here is to look at the processes that are consuming CPU resources and determine if there is anything we can do about it. Issuing a ps command such as the following, or using the top command, can quickly get us to the processes on the system that are consuming CPU resources.
[root@wagner ~]# ps -e -o pcpu,pid,user,args | sort -r | head %CPU PID USER COMMAND 8.1 6235 oracle sqlplus as sysdba 1.9 2862 root /usr/bin/Xorg :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7 0.8 3325 oracle ora_vktm_vm11r2 0.3 6260 root top 0.3 3226 root gnome-terminal 0.1 6236 oracle oraclevm11r2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) 0.1 3345 oracle ora_ckpt_vm11r2 0.1 3341 oracle ora_dbw0_vm11r2 0.1 3337 oracle ora_dia0_vm11r2
Clearly, the issue here is that someone has used SQLPlus to log into the database and issue some commands that are consuming a majority of the CPU resources. From here, we leave the operating system, log into the database (Oracle), and begin to investigate the appropriate objects to extract what this particular user is doing. Here is a quick SQL that can get you to that process, user, and SQL:
SQL> select sesion.process, sesion.username, sesion.osuser, sqlarea.sql_text from v$sqlarea sqlarea, v$session sesion where sesion.sql_hash_value = sqlarea.hash_value and sesion.sql_address = sqlarea.address and sesion.process = 6235; PROCESS USERNAME OSUSER SQL_TEXT -------- -------- -------- -------------------------------------------------- 6235 SYS oracle select a.*, b.* from dba_tables a, dba_tab_columns b
Obviously, this DBA doesn’t know what he/she is doing as they are producing a Cartesian product and really consuming CPU resources to get the result set. Maybe it’s time to talk to them and see exactly what they are trying to accomplish and then kill the process, rewrite the SQL, and hopefully go on to the next bottleneck. Not shown in the query, but very valuable, in the V$SQLAREA view there are two columns, CPU_TIME and ELAPSED_TIME that provide the microseconds of CPU time and elapsed time used by the SQL respectively. Together, with the assumed problem noticed by using sar should help you pinpoint CPU resource intensive processes. I say assumed here because we don’t want to spend our time messing with too many SQL statements that spike the CPU occasionally when they may only consume 5-10 microseconds of the CPU total. Now if these SQL statements are executed continually throughout the day then that would be another story and research is warranted.
Other things to think about / look at when using the sar command would be, as is the case with my system, is a low %idle (CPU totally being consumed) would/might indicate an under-powered CPU, not just the poorly written SQL statement. Always look at the %wio (percent of waiting on I/O) as this would/could indicate a SQL statement(s) doing too may I/Os and an opportunity to tune the SQL, add/remove indexes, or modify applications.
Because CPU utilization is one of the key indicators on the system that signals good or bad performance, constantly watching it for peaks and valleys is critical to a DBA’s understanding of their database system. The sar utility, while seemingly small and insignificant can and should be used to help initially determine if any high-level CPU resource usage is out of balance, potentially pointing to database processing that requires tuning.