involved in application performance tuning, the database administrator usually
concentrates on reducing elapsed times or I/O waits. Too often, the CPU is
forgotten, perhaps because of recent advances in CPU speeds, parallelism, and
disregarding CPU costs can lead to resource contention, wasted CPU cycles, and
more frequent hardware upgrades. The DBA should consider CPU usage during
application tuning opportunities, and attempt to balance resource usages (CPU,
memory, I/O, and so forth) across the system to minimize resource constraints.
Good or bad tuning?
Situation #1: After
analysis of accounting records, a DBA determines that the RunStats utility is
responsible for almost 2% of the CPU used in one LPAR. When 90% of these jobs
are removed, weekly CPU usage is significantly reduced.
Situation #2: The DBA
reviews statistics for a long-running application. It contains an SQL statement
that joins two partitioned tables; however, the access paths do not take
advantage of parallelism. By implementing parallelism, the DBA cuts total
execution time of the job from two hours to two minutes.
Situation #3: A popular online
transaction typically takes several minutes to execute. After some analysis,
the DBA creates three new indexes on tables that are joined in the application.
Response time for the transaction now averages less than a second.
What do these situations have
in common? All of them may be examples of DB2 CPU tuning failures! While on the
surface they may appear to be success stories, the deeper reality was this: in
each case, the DBA made a tuning decision based on a local symptom or problem
without considering the CPU implications.
How tuning decisions affect CPU usage
In the first scenario, the
DBA eliminated many RunStats jobs. For high-volume critical applications,
up-to-date statistics may be essential in order for the DB2 optimizer to choose
efficient access paths for SQL statements. This is especially important if
distributed applications are using dynamic SQL.
A better choice would have
been to collect access path information and data volatility statistics to
determine which stats are essential. Typically, these are index clustering and
key distribution statistics. So, rather than removing RunStats jobs, the
correct procedure would probably involve re-coding parameters to focus on
gathering critical statistics.
In the second scenario,
the DBA greatly reduced one application’s elapsed time; however, during job
execution it is possible that overall system CPU utilization will be much
higher. This happens because the CPU usage that was once spread over two hours
is now compressed into a much shorter interval.
This has the effect of
"spiking" CPU usage for that period. This application now runs much
faster, but at the expense of every other application that requires CPU.
In the third scenario, the
DBA added indexes to tables without considering the overall impact on the
system. Other applications that insert or delete rows now must insert or delete
index entries. Daily table loads now run longer as well due to the need for
sorting entries for the new indexes. All this adds up to increased CPU usage.
Let’s take a look at a
tuning opportunity where CPU usage was the primary consideration.
Case Study: CPU starvation
This was a non-data
sharing environment in which the total machine CPU time hovered near 100% busy
during the production day. The DBAs analyzed the DB2PM Accounting Detail
reports to determine whether excessive CPU could be attributed to DB2, to
certain applications, to access paths for certain objects, or to something
For application CPU
resource consumption, the DBAs developed a "CPU time per SQL statement"
measurement. They used the In-DB2 TCB time from the Accounting Detail report as
their CPU measure, and created a weighted average measure of the number of SQL
statements executed by the application.
Their exact formula is not
important here. (For example, they researched the average CPU usage required
for various DML statements based on the DB2 V8 Performance Topics Redbook, and
then modified this to suit their applications.) What is important is that they
created a generic measurement for application CPU usage that they could then
use to find "CPU hog" programs.
For each application, they
divided the In-DB2 TCB time by the total number of SQL statements executed.
This resulted in numbers in the range of 0.1 to 1.0 milliseconds (msec) per
statement. Those applications scoring higher than 1.0 msec were considered to
be high CPU consumers.
The next step was to
determine what other resources were available. In this case, real memory on the
box where DB2 was running was not 100% utilized. In consultation with the
systems programming staff, the DBAs were able to get an additional gigabyte of
real memory dedicated to DB2. This was implemented in the form of several
dataspaces, and several of the most heavily-used tables were allocated to
virtual pools in dataspaces.
The result: an overall
decrease in CPU utilization, due to fewer required physical I/Os to access the
critical tables. The tradeoff: additional memory usage.
Next, the DBA team
analyzed the accounting detail report looking for DB2 subsystem issues. They
noticed that the In-DB2 Not Accounted For Time was relatively high (about five
percent of In-DB2 Elapsed Time). This indicated that the DB2 address space was
waiting for CPU dispatching. This happens most often when the CPU is extremely
busy, as DB2 must wait for a CPU to do some synchronous tasks such as log
writes. If DB2 must wait for a CPU, then the application waits for DB2.
In this case, reduction of
overall CPU time was called for, as this problem tended to occur during times
of high CPU busy. The DBA team reviewed their overall resource profile and
determined that they had some unused DASD resources; in addition, they could
afford to sacrifice throughput for some low-priority applications. They then
implemented the following changes:
Many small and medium-sized tables (under 100,000 rows) that were
defined with data compression (COMPRESS YES) were re-defined to remove the
compression option. They also concurrently increased the sizes of some of the
virtual pools. This increased DASD usage somewhat, and also caused a slight
increase in CPU due to the additional I/Os required by SQL statements that
accessed large numbers of rows. However, there was a net decrease in CPU time
by forgoing compression and decompression of rows for those tables.
Many distributed applications (i.e., those not originating in
z/OS) were invoking DB2 stored procedures. In many cases, these stored
procedures were being used to execute a single SQL statement that returned one
row. The overhead for invoking a stored procedure is several thousand machine
instructions; so, for some highly-used stored procedures the DBAs had
developers use dynamic SQL instead. In addition, they enlarged the dynamic
statement caches. This caused a slight increase in CPU time as dynamic SQL
needed to be bound (unless it was in the cache); however, there was a net
reduction in overall CPU usage as hundreds of thousands of stored procedure
invocations were removed.
The frequency of DB2 system checkpoints was changed from every 5
minutes to every 12 minutes. This reduced CPU usage somewhat, as DB2 did fewer
checkpoints. However, this had the effect of increasing recovery time in the
event of a system outage.
Reducing an application’s
elapsed time or simply adding indexes should not be a substitute for good
planning or standard performance tuning methods. You should use a strategy
where you understand system-wide resource usage and develop application tuning
methods that take into account the system as a whole.