One of the basic facts that virtually all DBAs know is that truncating a table to remove data
is faster than deleting the same data. But beyond that truism, there are subtle
differences that, when taken into account, can help improve the overall
performance of a job or process. Truncation and deletion each have their
advantages and disadvantages, so it is useful to know when one is better
than the other. Like many other functions and features in Oracle, what you
choose to use depends upon what you want to accomplish. As a general
principle, making things go faster (but not at the expense of other sessions)
is always a good thing.
Truncate or Delete a Stage Table?
Stage or staging tables (as
discussed in a prior
article) are collectors or intake mechanisms for getting data into a database.
The lifecycle of a staging table falls into two categories: fixed and dynamic.
By fixed, I mean that the table name stays the same and a data load process
always uses the same table. From one week to the next (or day to day, for that
matter), the contents of files are uploaded into the fixed stage table, the table
is cleaned out, the next set of files gets loaded, and so on. With
dynamically named stage tables, the purpose of the table stays the same, but
the name changes. This suggests partitioning and, potentially, pruning. Your
application may reference three weeks of staged data, or a longer window, so
more than likely the name of the partitioned staging area is related to a date
and the insert statement includes the partition key. Because the partition key
limits where data can be inserted, cleaning out (that is, deleting or
truncating) this type of stage table is more of an ad hoc operation.
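To make the dynamic scheme concrete, here is a minimal sketch of a date-partitioned stage table; the table name, column names, and boundary dates are illustrative only, not taken from any particular application.

create table stage_weblog (
  load_date  date not null,
  line_data  varchar2(4000)
)
partition by range (load_date) (
  partition p_wk01 values less than (to_date('2008-01-08','YYYY-MM-DD')),
  partition p_wk02 values less than (to_date('2008-01-15','YYYY-MM-DD')),
  partition p_wk03 values less than (to_date('2008-01-22','YYYY-MM-DD'))
);

-- Cleaning out one week's worth of staged data becomes a partition-level step:
alter table stage_weblog truncate partition p_wk01;

With this layout, the ad hoc cleanup touches only the partition whose window has expired, leaving the rest of the staged data in place.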
So, let's focus on the
fixed table. We know we can always reload a table if something goes awry
during the load. Whether the source is Web logs that have been parsed
using Perl or converted into pipe-delimited flat files loaded via SQL*Loader
(or referenced as an external table), cleaning up after an error and starting
over is relatively simple. Hopefully, that is the exception and not the rule
for you. Of interest here is the rule, or norm: the stage table needs to be
cleaned out perhaps thousands of times per day. Is there anything we can do to
reduce the time spent cleaning out the table?
As an experiment, let's
compare a delete, a truncate, and a truncate with reuse of storage. The my_objects
table is a copy of all_objects, and we'll go through ten iterations of
populating and clearing the table.
set serveroutput on
declare
  v_trunc varchar2(40) := 'truncate table my_objects';
  v_reuse varchar2(40) := v_trunc || ' reuse storage';
  v_start number := 0;
begin
  -- delete
  v_start := dbms_utility.get_time;
  for i in 1..10 loop
    delete from my_objects;
    insert into my_objects
    select * from all_objects;
  end loop;
  dbms_output.put_line
    ( 'Straight delete is '||
      round((dbms_utility.get_time-v_start)/100, 2)||
      ' seconds...' );
  -- truncate only
  v_start := dbms_utility.get_time;
  for i in 1..10 loop
    execute immediate v_trunc;
    insert into my_objects
    select * from all_objects;
  end loop;
  dbms_output.put_line
    ( 'Straight trunc is '||
      round((dbms_utility.get_time-v_start)/100, 2)||
      ' seconds...' );
  -- truncate with reuse storage
  v_start := dbms_utility.get_time;
  for i in 1..10 loop
    execute immediate v_reuse;
    insert into my_objects
    select * from all_objects;
  end loop;
  dbms_output.put_line
    ( 'Reuse storage is '||
      round((dbms_utility.get_time-v_start)/100, 2)||
      ' seconds...' );
end;
/
Straight delete is 63.15 seconds...
Straight trunc is 32.86 seconds...
Reuse storage is 30.76 seconds...
The exact times aren't
important, but the differences are. As stated, it is well known that truncate
is much faster than delete, but how much faster? That depends on whether
or not you reuse storage, or, put another way, on how you manage
objects. These examples do not include indexes either, so the time Oracle
spends managing index space is something else to consider.
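As a rough sketch of why indexes matter here (the index name is made up for illustration), compare what each statement has to maintain once an index exists:

create index my_objects_idx on my_objects (object_name);

-- delete must remove each row from the index as well, generating
-- undo and redo for both the table and the index
delete from my_objects;

-- truncate resets the index segment along with the table in one operation
truncate table my_objects;

Re-running the timing loops with the index in place should widen the gap, with the delete pass growing the most.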
Managing Schema Objects
Chapter 13 of the
Administrator's Guide (10gR2) is titled "Managing
Schema Objects." Most performance issues aren't related to whether you manage
objects, but to how well you manage them. A delete generates redo and undo, and
may involve indexes and triggers. In other words, as the documentation says, there's
a lot going on. With truncate, DROP STORAGE is the default for extent
allocation/reclamation. Generating no undo, among other skipped steps, is what makes truncate fast. However,
given that we're going to be deallocating space thousands of times over, what
is the cost of doing that? From a performance perspective, the cost is measured
in time.
Assuming the two seconds or
so saved in the example above scale, we're now talking about shaving off 400 seconds
across 2,000 operations. In another experiment, using 100,000 rows in the SH
schema's SALES table and running through 100 iterations, the time difference
was more than 30 seconds, or 600 seconds if scaled to 2,000 iterations. Now we're
up to a savings of ten minutes. When truncating a table with the default of
DROP STORAGE, or building up a table while allocating extents, turn on tracing
beforehand and then examine the TKPROF'd trace file. Oracle is churning through
lots of data dictionary-related statements, not only to get rid of the data, but
to do the internal housecleaning required to mark the space as being eligible
for use (or as having been used) elsewhere.
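One way to set this up, as a sketch (the tracefile identifier is an arbitrary label I chose, and the trace file name will vary by instance and process ID):

alter session set tracefile_identifier = 'trunc_test';
alter session set sql_trace = true;

-- run the truncate/insert loop here

alter session set sql_trace = false;

-- then, from the OS prompt, format the trace file, keeping the
-- recursive data dictionary SQL visible:
-- tkprof <instance>_ora_<pid>_trunc_test.trc trunc_test.txt sys=yes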
For an even more visual
impact, select the extents and blocks from user_segments while the loops are
running. As space is allocated and released, you'll see the number of
extents increase and decrease. While the reuse storage loop is running, the
number of extents remains constant. Or, if using Toad, refresh the stats
display in the schema browser (selecting whichever user is doing this, and the
target table name).
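From a second session, a query along these lines (assuming the my_objects table from the earlier example) shows the movement:

select segment_name, extents, blocks
  from user_segments
 where segment_name = 'MY_OBJECTS';

Run it repeatedly during the delete and plain truncate loops, then again during the reuse storage loop, and compare the extent counts.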
Another Option
You can also drop and
recreate the table. It should be intuitively obvious that this approach doesn't
begin to compare with either storage choice when using the truncate command.
It also raises object validity issues. If the table is not present
(because it was dropped or not yet created), and your stage table
populate/clean-out steps are in code, then your package/procedure/function will
be created with a compilation error. Create the table later and Oracle will
compile the code at run time, but why get into something this messy in the first
place? From an engineering standpoint, let's suppose you're an application
provider. Do you want to have to document "ignore the following error while
installing product X," or would it be better to have a clean installation?
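A quick sketch of the validity problem, using the my_objects table from earlier and a made-up procedure name:

create or replace procedure clean_stage is
begin
  delete from my_objects;
end;
/

drop table my_objects;

-- the dependent code is now invalid until the table reappears:
select object_name, status
  from user_objects
 where object_name = 'CLEAN_STAGE';
-- STATUS shows INVALID

Oracle will attempt to recompile the procedure automatically the next time it is called, but only after the table has been recreated.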
Insert /*+ APPEND */
What if the load into the stage table uses
the APPEND hint? First of all, do you understand what that hint does for you?
No need to guess, as the documentation
clearly states what the hint does:
The APPEND hint instructs the optimizer to use direct-path INSERT if your
database is running in serial mode. Your database is in serial mode if you are
not using Enterprise Edition. Conventional INSERT is the default
in serial mode, and direct-path INSERT is the default in parallel mode.
In direct-path INSERT, data is appended to the end of the
table, rather than using existing space currently allocated to the table. As a
result, direct-path INSERT
can be considerably faster than conventional INSERT.
Actually, there is some room
to guess. If you are using Enterprise Edition, then what mode is being used? If
not using Enterprise Edition implies serial mode, then Enterprise Edition
implies what? The answer is parallel, but non-parallel operations can still be
specified.
What happens to the amount
of storage if you're using the APPEND hint in the insert statement for a fixed
stage table? Let's do some inserts using APPEND and REUSE STORAGE. The starting
point, with an empty table truncated with DROP STORAGE, is a single 64K extent. After
ten inserts with REUSE STORAGE, the table has 10 extents (same code as
before). What happens to the space if the APPEND hint is used? At the end of
the inserts, the table still has 10 extents. What if APPEND were used with
DELETE (and operating in parallel)?
SQL> set serveroutput on
SQL> declare
2 v_trunc varchar2(40) := 'truncate table my_objects';
3 v_reuse varchar2(40) := v_trunc || ' reuse storage';
4 v_start number := 0;
5 v_end number := 0;
6 begin
7
8 --delete/append
9 v_start := dbms_utility.get_time;
10 for i in 1..10 loop
11 delete from my_objects;
12 insert /*+ append */
13 into my_objects
14 select * from all_objects;
15 end loop;
16 dbms_output.put_line
17 ( 'Delete/append is '|| round((dbms_utility.get_time-v_start)/100, 2)||
18 ' seconds...' );
19 commit;
20 end;
21 /
declare
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
ORA-06512: at line 11
In the delete/append
scenario, we need a commit to bypass this self-inflicted error. Running again
with a COMMIT after the INSERT statement drove the time up to just over 109
seconds, along with the added bonus of having 21 extents allocated. Virtually
all of the first 20 sit there holding empty blocks that still have to be scanned,
so other operations such as COUNT(*) are now working harder and longer.
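For reference, the working version of the loop simply commits after each direct-path insert; this is the same code as above with one added line:

begin
  for i in 1..10 loop
    delete from my_objects;
    insert /*+ append */
    into my_objects
    select * from all_objects;
    commit;  -- required before the table can be read or modified again
  end loop;
end;
/

The commit clears the ORA-12838 condition, but it does nothing about the empty blocks piling up below the high-water mark.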
SQL> select count(*) from my_objects;
COUNT(*)
----------
5969
Elapsed: 00:00:00.04
SQL> truncate table my_objects;
Table truncated.
Elapsed: 00:00:07.48
SQL> select count(*) from my_objects;
COUNT(*)
----------
0
Elapsed: 00:00:00.01
SQL> insert into my_objects select * from all_objects;
5969 rows created.
Elapsed: 00:00:04.18
SQL> select count(*) from my_objects;
COUNT(*)
----------
5969
Elapsed: 00:00:00.01
In Closing
Anyone can manage objects,
but not everyone can manage them well. Demonstrations like the ones shown here
help drive home the point of the need to understand not only what your
applications do, but also how Oracle interacts with them. Just because
something runs in the middle of the night and no users really care (at that
time) if a job takes three hours or three minutes is not justification to
support or enable less than optimal (or just plain good) practices.
Designing and coding well in
the beginning has tangible benefits down the road. As an application grows
in scope and complexity, the time that little inefficiencies here and
there eat up can become significant. The stories about being able to
tune code such that what once took hours now takes mere minutes to complete are
true. It's nice to see the benefits of having improved a slow-running process,
but it's better still to never have been in that situation in the first place.
See All Articles by Columnist Steve Callan