More on Using Tools for Tracing | Database Journal

More on Using Tools for Tracing

Written By
Steve Callan
Steve Callan
Jan 9, 2008
7 minute read

As mentioned in a
previous article
, there is more than one approach or tool to use in a
“tuning opportunity.” Ideally, the opportunity occurs before code has gone into
production, but as we’re all aware of, a lot of time spent on tuning code takes
places in production. Context is always important to keep in mind. What may
look good may in fact be not so good, and vice versa.

Tracing and TKPROF

Shown below is the TKPROF’d
output from three SQL statements. What can you deduce about the statements,
that is, were these DML (insert, update, delete) or select(s)? In the formatted
output from the trace file used to create this report, you’d obviously have the
statements right there, so no guess work is involved at all. However, that’s
not the point of this question. The point is to separate what you see (or read)
versus what you understand. The data set used throughout is based on the SH
demo schema in 10gR2. The code used to generate the output will be shown later.

Some questions about the
output:

  • What is the general nature/type
    of each statement?
  • What parameter is used to enable
    timing information, what is it set to, and when set, which column does it
    populate?
call    count     cpu   elapsed       disk     query   current      rows
——- —–  —— ——— ———- ——— ———  ——–
Parse        1    0.06     0.06          0         0         0         0
Execute      1    0.00     0.00          0         0         0         0
Fetch       92    7.67    15.92      15645     24160        11    918843
——- —–  —— ——— ———- ——— ———  ——–
total       94    7.73    15.99      15645     24160        11    918843
call     count    cpu   elapsed       disk     query   current      rows
——- —–  —— ——— ———- ——— ———  ——–
Parse        1    0.00     0.00          0         0         0         0
Execute      1    0.00     0.00          0         0         0         0
Fetch     9189    3.78    14.06      15214     12741        11    918843
——- —–  —— ——— ———- ——— ———  ——–
total     9191    3.78    14.06      15214     12741        11    918843
call     count    cpu   elapsed       disk     query   current      rows
——- —–  —— ——— ———- ——— ———  ——–
Parse        1    0.00     0.00          0         0         0         0
Execute     92   48.89    88.66       7619   1799287   3607477    918843
Fetch        0    0.00     0.00          0         0         0         0
——- —–  —— ——— ———- ——— ———  ——–
total       93   48.89    88.66       7619   1799287   3607477    918843

Table 1 – TKPROF formatted trace file output

The top two rows both
represent select statements. Each statement was parsed once (the shared_pool
was flushed ahead of time) and executed once. The number of rows fetched is
just over 918,000, and is the same for both. In this case, the number of rows
returned should be the same since the same statement was used in both examples.

We get cpu time for free,
and elapsed time is present (has values) when TIMED_STATISTICS is set to true. Given
that the two statements were the same, why would there be such a difference in
the fetch count?

Discounting the difference
between the elapsed times (they’re fairly close anyway, and the reason for the
difference is also forthcoming), and using the same select statement, we know
something was different in terms of how the select was performed. The
table below may be enough of a clue as to the reason why.

Rows/group size

Results

918843/10000

91 groups of 10000 with
remainder 8843

918843/100

9188 groups of 100 with
remainder 43

Table 2 – Rows versus fetch counts

In the top row, it would
take 92 fetches to account for 918843 rows. In the bottom row, it is 9188 plus
one more, or 9189. The group sizes are arrays, so what was done in the first
operation caused/enabled Oracle to fetch the rows 10000 at a time. Likewise,
something caused Oracle to fetch only 100 at a time for the second version.
Time to see what the code looks like.

The select statement (within
the procedures shown) below simply gets min, max, sum and count based on the
product ID in the SALES and PRODUCTS tables. I added several extra columns to
the SALES table, and these are used for the update. If you want to replicate
this on your database (assuming you have the demo
schemas installed
), modify the SALES table as so:

alter table sales add (
days_sold number,
number_sold number,
first_sold_date date,
last_sold_date date,
last_updated date);

When finished, drop the
columns and the schema is back to where it was (no need to reset
the data
).

create or replace procedure update_sales_fact
is
   type t_rowid is table of rowid;
   type t_nos   is table of sales.quantity_sold%type;
   type t_qs    is table of sales.quantity_sold%type;
   type t_fsd   is table of sales.time_id%type;
   type t_lsd   is table of sales.time_id%type;
   v_rowid       t_rowid;
   v_nos         t_nos;
   v_qs          t_qs;
   v_fsd         t_fsd;
   v_lsd         t_lsd;
   v_date        date := sysdate;
   cursor c is
   select s.rowid
        , sum(s.quantity_sold) number_sold
        , count(distinct s.time_id) days_sold
        , min(s.time_id) first_sold_date
        , max(s.time_id) last_sold_date
     from sales s
        , products p
    where s.time_id > ’31-DEC-97and s.prod_id = p.prod_id
    group by s.rowid;
begin
   open c;
   loop
      fetch c
         bulk collect into v_rowid, v_nos,
                           v_qs, v_fsd, v_lsd
         limit 10000;
      begin
         forall i in v_rowid.first..v_rowid.last
            update sales
               set days_sold       = v_qs(i)
                 , number_sold     = v_nos(i)
                 , first_sold_date = v_fsd(i)
                 , last_sold_date  = v_lsd(i)
                 , last_updated    = v_date
            where rowid = v_rowid(i);
      end;
      exit when c%notfound;
   end loop;
close c;
end;
/

The second use of the same select
statement:

create or replace procedure update_sales_fact2
is
   v_date date := sysdate;
   cursor c is
   select s.rowid upd_rowid
        , sum(s.quantity_sold) number_sold
        , count(distinct s.time_id) days_sold
        , min(s.time_id) first_sold_date
        , max(s.time_id) last_sold_date
     from sales s
        , products p
    where s.time_id > ’31-DEC-97and s.prod_id = p.prod_id
    group by s.rowid;
begin
  for r in c loop
    update sales
    set days_sold     = r.days_sold
    , number_sold     = r.number_sold
    , first_sold_date = r.first_sold_date
    , last_sold_date  = r.last_sold_date
    , last_updated    = v_date
    where rowid = r.upd_rowid;
  end loop;
end;
/

The bulk collect with a
limit of 10,000 is what accounted for the 92 fetches. The basic cursor for loop
in the second code listing resulted in an array size of 100. But doesn’t the
cursor for loop process just one record at a time? Yes and no. Yes, in that
what is coded looks that way, but no in that with PL/SQL in Oracle10g, we get a
“free” bulk collect using a limit of 100.

The last row in Table 1 is
from an update statement, and is the update step using the collection defined
in update_sales_fact. Note that in the update, there are no fetches. We’re not
fetching rows at this point, so the fetch count should be zero. Let’s compare
the difference between the updates based on the collection and the cursor for
loop.

Using the collection (repeated from Table 1)


call     count       cpu    elapsed       disk      query    current        rows
——- ——  ——– ———- ———- ———- ———-  ———-
Parse        1      0.00       0.00          0          0          0           0
Execute     92     48.67      77.19       6419    1798733    3604784      918843
Fetch        0      0.00       0.00          0          0          0           0
——- ——  ——– ———- ———- ———- ———-  ———-
total       93     48.67      77.19       6419    1798733    3604784      918843

Using the cursor for loop


call     count       cpu    elapsed       disk      query    current        rows
——- ——  ——– ———- ———- ———- ———-  ———-
Parse        1      0.00       0.00          0          0          0           0
Execute 918843     64.57      91.42       4776    1798879    3607290      918843
Fetch        0      0.00       0.00          0          0          0           0
——- ——  ——– ———- ———- ———- ———-  ———-
total   918844     64.57      91.42       4776    1798879    3607290      918843

With the collection, the
update statement was executed 92 times as opposed to one time for each and
every record used in the cursor (918,843 times).The elapsed time difference is
also significantly larger now. In other words, we’re seeing the cost of doing
business using less efficient code.

Let’s suppose you get the
dreaded call from a user (“My query is taking longer now, what’s wrong with the
database?”). Since you happened to have base lined this particular query, you
pull up the tracing and explain plan information for it (the query is not shown,
as it is the same one we’ve been using). We note that Oracle uses an index scan
on the PRODUCTS_PK index (based on the PRODUCTS table). The SALES table is
partitioned, and based on what the query is looking for, we’re going to hit
most if not all partitions.

call     count       cpu    elapsed       disk      query    current        rows
——- ——  ——– ———- ———- ———- ———-  ———-
Parse        1      0.06       0.06          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       92      8.42      13.53      15806      12741         11      918843
——- ——  ——– ———- ———- ———- ———-  ———-
total       94      8.48      13.60      15806      12741         11      918843
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 69  (SH)   (recursive depth: 1)
Rows     Execution Plan
——-  —————————————————
      0  SELECT STATEMENT   MODE: ALL_ROWS
      0   SORT (GROUP BY)
      0    HASH JOIN
      0     INDEX   MODE: ANALYZED (FULL SCAN) OF ‘PRODUCTS_PK’
             (INDEX (UNIQUE))
      0     PARTITION RANGE (ITERATOR) PARTITION:KEY STOP=28
      0      TABLE ACCESS   MODE: ANALYZED (FULL) OF ‘SALES’ (TABLE)
                 PARTITION:KEY STOP=28

This execution plan is what
we expect to see in the user’s session or by having the query explained or
traced right now. In the beginning of your investigation, what degree of detail
do you want to dive into? In other words, is using AUTOTRACE good enough or do
you want to trace the user’s session as he is running the query (or procedure
in this case)?

Using tracing, the formatted
output is:

call     count       cpu    elapsed       disk      query    current        rows
——- ——  ——– ———- ———- ———- ———-  ———-
Parse        1      0.06       0.06          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       92      7.67      15.92      15645      24160         11      918843
——- ——  ——– ———- ———- ———- ———-  ———-
total       94      7.73      15.99      15645      24160         11      918843
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 69  (SH)   (recursive depth: 1)
Rows     Execution Plan
——-  —————————————————
      0  SELECT STATEMENT   MODE: ALL_ROWS
      0   SORT (GROUP BY)
      0    HASH JOIN
      0     TABLE ACCESS   MODE: ANALYZED (FULL) OF ‘PRODUCTS’ (TABLE)
      0     PARTITION RANGE (ITERATOR) PARTITION:KEY STOP=28
      0      TABLE ACCESS   MODE: ANALYZED (FULL) OF ‘SALES’ (TABLE)
                 PARTITION:KEY STOP=28

By quick inspection, we can
see that the explain plan is different. For some reason now, we have a full
table scan of the PRODUCTS table. Since the index usage, or lack thereof, seems
to be the culprit, what is the status of the index? Quite a few methods can be
used to determine the status of the index; I’ll use Toad for this example.

Bring up the SH schema in
your sample database, and view indexes in the schema browser.

For whatever reason, the
index has been marked unusable. Rebuild it and the query’s explain plan will go
back to the baseline. While we’re at it, what else took place when the
procedure was executing? Let’s look at the wait events in the update phase. Any
number of homegrown freely available on the Internet scripts can be used to
view wait events. A representative list of wait events can also be observed in
Toad.

Tracing also captures this
information:

Rows     Execution Plan
——-  —————————————————
      0  UPDATE STATEMENT   MODE: ALL_ROWS
      0   UPDATE OF ‘SALES’
      0    TABLE ACCESS   MODE: ANALYZED (BY USER ROWID) OF ‘SALES’
               (TABLE) PARTITION:ROW LOCATION
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  —————————————-   Waited  ———-  ————
  db file sequential read                      7621        1.15         21.60
  rdbms ipc reply                               155        0.00          0.01
  log file switch completion                     19        0.99          5.77
  log buffer space                                9        0.08          0.70
  log file switch (checkpoint incomplete)        16        0.99         10.80
  control file sequential read                   19        0.01          0.05
  Data file init write                           16        0.03          0.19
  db file single write                            1        0.00          0.00
  control file parallel write                     3        0.00          0.00

We lost some time waiting on
log file operations. This can be confirmed by viewing the alert log:

Alert log extract showing the wait on log file
switch


Thread 1 advanced to log sequence 119
  Current log# 1 seq# 119 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO01.LOG
Thread 1 advanced to log sequence 120
  Current log# 3 seq# 120 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO03.LOG
Mon Dec 31 12:29:34 2007
Thread 1 cannot allocate new log, sequence 121
Checkpoint not complete
  Current log# 3 seq# 120 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO03.LOG
Thread 1 advanced to log sequence 121

Further, Enterprise Manager
(Database Control) has also highlighted this problem in a couple of places:

What was the log file switch
frequency like when this procedure was being run? Again, Toad provides a quick
means of visually inspecting the situation:

After drilling down into the
“Performance Finding Details” page, we see the recommendation to increase the
size of the log files. The recommendation about incremental shipping does not
apply in this case.

Okay, 633MB it is for the
redo logs. The seed database used 50MB sized log files. The update generates
10-11 log switches, so that’s 500 to 550MB of disk space. How often you run
this procedure will influence Oracle’s recommendation. The main point is this:
based on what takes place with the current code, the high switch rate can be
reduced by increasing the size of the redo logs. Less switches implies fewer
waits, and more than likely (the important part), less time spent waiting.

Gathering user information

Getting the user’s SID and
serial# was pulled from Toad, and tracing was done in a SQL*Plus session using
the DBMS_MONITOR built-in.

Starting and stopping tracing in another session


  SQL> EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(
session_id => 159, serial_num => 2376, waits => TRUE, binds => FALSE);
PL/SQL procedure successfully completed.
SQL> EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE
(session_id => 159, serial_num => 2376);
PL/SQL procedure successfully completed.

At the OS level, identify
the trace file and run TKPROF against it. This is easy to perform and is
covered extensively elsewhere.

Running TKPROF on the trace file


  D:\oracle\product\10.2.0\admin\db10\udump>dir
 Volume in drive D has no label.
 Volume Serial Number is 046C-688A
 Directory of D:\oracle\product\10.2.0\admin\db10\udump
12/31/2007  11:34 AM    <DIR>          .
12/31/2007  11:34 AM    <DIR>          ..
12/31/2007  11:34 AM         1,865,565 db10_ora_1076.trc
               1 File(s)      1,865,565 bytes
               2 Dir(s)  48,742,436,864 bytes free
D:\oracle\product\10.2.0\admin\db10\udump>tkprof db10_ora_1076.trc my_trace.txt sys=no
TKPROF: Release 10.2.0.3.0 – Production on Mon Dec 31 11:38:28 2007
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
D:\oracle\product\10.2.0\admin\db10\udump>
Advertisement

In Closing

This was an obviously easy
tuning problem, but what it has in common with more complex problems is the
same set of tools and analysis behind it. We navigated through SQL*Plus, Toad,
Enterprise Manager, TKPROF, and a report out of Advisor Central. What were the
indicators of things gone awry or of what could be better?

The time spent on sequential
reads versus scattered reads – the times by themselves have to be taken into
context with what you expect to be taking place. If you’re expecting an index
to be used, but the scattered reads are high, or conversely, the sequential
reads seem to be low, this is one of the key wait events to consider. Were
direct path read and write waits expected (if you run these procedures, these
two events will appear in the select phase)? Again, context matters, because
partitions were involved.

Out of all the wait events
shown, the log file related ones were not expected. After increasing the log
file member sizes, the log file switch went away.

Fixing the redo log file size


  alter database
add logfile group 4
(‘D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO04.LOG’) size 633M;
alter database
add logfile group 5
(‘D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO05.LOG’) size 633M;
alter database
add logfile group 6
(‘D:\ORACLE\PRODUCT\10.2.0\ORADATA\DB10\REDO06.LOG’) size 633M;
–switch out the remaining current group, drop it when no
–longer needed for crash recovery
alter database drop logfile group 1;
alter database drop logfile group 2;
alter database drop logfile group 3;

Wait events in the update statement after
increasing the redo log size


  Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  —————————————-   Waited  ———-  ————
  db file sequential read                     11917        0.16         33.64
  log file switch completion                      3        0.27          0.33

So, had a user come to you
complaining about a procedure taking longer than normal, not only could you
have fixed the root problem (fix the index), but you also could have fixed
another bottleneck (log files being too small). Assuming you made the changes
to the redo log member size, were you justified in doing so? Again, context
matters. When was the procedure being run? Switching every 20 minutes or so
also depends on when during the day, that is, during normal business hours or
someone running this late at night when few other users are logged on.

Before you can sort out
which events are truly problematic, you need to be able to investigate them
with one or more tools. Being able to seamlessly move around and use various
tools should be second nature. You want to focus your efforts on identifying
problems, not on identifying why a tool isn’t working for you.

»


See All Articles by Columnist
Steve Callan

Steve Callan

Steve is an Oracle DBA (OCP 8i and 9i)/developer working in Denver. His Oracle experience also includes Forms and Reports, Oracle9iAS and Oracle9iDS.

Database Journal Logo

DatabaseJournal.com publishes relevant, up-to-date and pragmatic articles on the use of database hardware and management tools and serves as a forum for professional knowledge about proprietary, open source and cloud-based databases--foundational technology for all IT systems. We publish insightful articles about new products, best practices and trends; readers help each other out on various database questions and problems. Database management systems (DBMS) and database security processes are also key areas of focus at DatabaseJournal.com.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.