One of the basic facts or concepts that virtually all DBAs know is that truncating a table to remove data is faster than deleting the same data. But beyond that truism, there are subtle differences that when taken into account, can help improve the overall performance of a job or process. Truncation and deletion each have their advantages and disadvantages, so it can be useful to know when each is better than the other is. Like many other functions and features in Oracle, what you can choose to use depends upon what you want to accomplish. As a general principle, making things go faster (but not at the expense of other sessions) is always a good thing.
Truncate or Delete a Stage Table?
Stage or staging tables (as discussed in a prior article) are collectors or intake mechanisms to get data into a database. The lifecycle of a staging table can fall into two categories: fixed and dynamic. By fixed, I mean that the table name stays the same and a data load process always uses the same table. From one week to the next (or day to day, for that matter), contents of files are uploaded into the fixed stage table, the table is cleaned out, and the next set of files gets loaded, and so on. With dynamically named stage tables, the purpose of the table stays the same, but the name changes. This suggests partitioning and potentially, pruning. Your application may reference three weeks of staged data, or something longer, so more than likely, the name of the partitioned staging area is related to a date and the insert statement includes the partition key. Because the partition key limits when data can be inserted here, cleaning out, that is, deleting or truncating this type of stage table, is more of an ad hoc operation.
So, let’s focus on the static table. We know we can always reload a table if something goes awry during the load. Whether the source is from Web logs that have been parsed using Perl or converted into pipe delimited flat files loaded via SQL*Loader (or referenced as an external table), cleaning up after an error and starting over is relatively simple. Hopefully, that is the exception and not the rule for you. Of interest here, is the rule or norm. The stage table needs to be cleaned out perhaps thousands of times per day. Is there anything we can do to reduce the time spent cleaning out the table?
As an experiment, let’s compare a delete, truncate, and truncate with reuse of storage. The my_objects table is a copy of all_objects, and we’ll go through ten iterations of populating and clearing the table.
set serveroutput on declare v_trunc varchar2(40) := 'truncate table my_objects'; v_reuse varchar2(40) := v_trunc || ' reuse storage'; v_start number := 0; v_end number := 0; begin --delete v_start := dbms_utility.get_time; for i in 1..10 loop delete from my_objects; insert into my_objects select * from all_objects; end loop; dbms_output.put_line ( 'Straight delete is '|| round
((dbms_utility.get_time-v_start)/100, 2)|| ' seconds...' ); --truncate only v_start := dbms_utility.get_time; for i in 1..10 loop execute immediate v_trunc; insert into my_objects select * from all_objects; end loop; dbms_output.put_line ( 'Straight trunc is '|| round
((dbms_utility.get_time-v_start)/100, 2)|| ' seconds...' ); --starting reuse storage v_start := dbms_utility.get_time; for i in 1..10 loop execute immediate v_reuse; insert into my_objects select * from all_objects; end loop; dbms_output.put_line ( 'Reuse storage is '|| round
((dbms_utility.get_time-v_start)/100, 2)|| ' seconds...' ); end; / Straight delete is 63.15 seconds... Straight trunc is 32.86 seconds... Reuse storage is 30.76 seconds...
The exact times aren’t important, but the differences are. As stated, it is well known that truncate is much faster than delete, but how much faster is it? That depends on whether or not you reuse storage, or put another way, it depends on how you manage objects. These examples do not include indexes either, so the time Oracle spends managing space here is something else to consider.
Managing Schema Objects
Chapter 13 of the Administrator’s Guide (10gR2) is titled “Managing Schema Objects.” Most performance issues aren’t related to managing objects, but how well you manage them. A delete causes redo and undo, and may involve indexes and triggers. In other words, as the documentation says, there’s a lot going on. With truncate, drop storage is the default for extent allocation/reclamation. No undo, among other “no” steps makes this fast. However, given that we’re going to be un-allocating space thousands of times over, what is the cost of doing that? From a performance perspective, the cost is measured in time.
Assuming the two seconds or so in the example above scale, we’re now talking about shaving off 400 seconds across 2000 operations. In another experiment using 100,000 rows in the SH schema’s SALES table, and running through 100 iterations, the time difference was more than 30 seconds, or 600 seconds if scaled to 2000 iterations. Now we’re up to a savings of ten minutes. When truncating a table with the default of drop storage, or building up a table while allocating extents, turn on tracing beforehand and then examine the TKPROF’d trace file. Oracle is churning through lots of data dictionary related statements to not only get rid of the data, but to do the internal housecleaning required to mark the space as being eligible for use (or having been used) elsewhere.
For an even more visual impact, select the extents and blocks from user_segments while the loops are running. As space is being allocated and released, you’ll see the number of extents increase and decrease. When the reuse storage loop is in focus, the number of extents will remain constant. Or, if using Toad, refresh the stats display in the schema browser, (selecting whatever user is doing this and the target table name).
You can also drop and recreate the table. It would be intuitively obvious that this approach wouldn’t begin to compare with either storage choice when using the truncate command. This approach also raises object validity issues. If the table is not present (because it was dropped or not yet created), and your stage table populate/clean out steps are in code, then your package/procedure/function will be created with a compilation error. Create the table later and Oracle will compile the code at runtime, but why get into something this messy in the first place? From an engineering standpoint, let’s suppose you’re an application provider. Do you want to have to document “ignore the following error while installing product X” or would it be better to have a clean installation?
Insert /*+ APPEND */
What if the stage table uses the APPEND hint? First of all, do you understand what that hint does for you? No need to guess as the documentation clearly states what the hint does.
The APPEND hint instructs the optimizer to use direct-path INSERT if your database is running in serial mode. Your database is in serial mode if you are not using Enterprise Edition. Conventional INSERT is the default in serial mode, and direct-path INSERT is the default in parallel mode.
In direct-path INSERT, data is appended to the end of the table, rather than using existing space currently allocated to the table. As a result, direct-path INSERT can be considerably faster than conventional INSERT.
Actually, there is some room to guess. If you are using Enterprise Edition, then what mode is being used? If not Enterprise Edition implies serial mode, then Enterprise Edition implies what? The answer is parallel, but non-parallel operations can still be specified.
What happens to the amount of storage if you’re using the APPEND hint in the insert statement for a static stage table? Let’s do some inserts using APPEND and REUSE STORAGE. The starting point with an empty truncated drop storage table is a single 64K extent. After ten inserts with reuse storage, the table has 10 extents (same code from before). What happens to the space if the APPEND hint is used? At the end of the inserts, the table still has 10 extents. What if APPEND were used with DELETE (and operating in parallel)?
SQL> set serveroutput on SQL> declare 2 v_trunc varchar2(40) := 'truncate table my_objects'; 3 v_reuse varchar2(40) := v_trunc || ' reuse storage'; 4 v_start number := 0; 5 v_end number := 0; 6 begin 7 8 --delete/append 9 v_start := dbms_utility.get_time; 10 for i in 1..10 loop 11 delete from my_objects; 12 insert /*+ append */ 13 into my_objects 14 select * from all_objects; 15 end loop; 16 dbms_output.put_line 17 ( 'Delete/append is '|| round((dbms_utility.get_time-v_start)/100, 2)|| 18 ' seconds...' ); 19 commit; 20 end; 21 / declare * ERROR at line 1: ORA-12838: cannot read/modify an object after modifying it in parallel ORA-06512: at line 11
In the delete/append scenario, we need a commit to bypass this self-inflicted error. Running again with a COMMIT after the INSERT statement drove the time up to just over 109 seconds along with the added bonus of having 21 extents allocated, with virtually the first 20 sitting there with empty – but have to be scanned – blocks, so other operations such as COUNT(*) are now working harder/longer.
SQL> select count(*) from my_objects; COUNT(*) ---------- 5969 Elapsed: 00:00:00.04 SQL> truncate table my_objects; Table truncated. Elapsed: 00:00:07.48 SQL> select count(*) from my_objects; COUNT(*) ---------- 0 Elapsed: 00:00:00.01 SQL> insert into my_objects select * from all_objects; 5969 rows created. Elapsed: 00:00:04.18 SQL> select count(*) from my_objects; COUNT(*) ---------- 5969 Elapsed: 00:00:00.01
Anyone can manage objects, but not everyone can manage them well. Demonstrations like the ones shown here help drive home the point of the need to understand not only what your applications do, but also how Oracle interacts with them. Just because something runs in the middle of the night and no users really care (at that time) if a job takes three hours or three minutes is not justification to support or enable less than optimal (or just plain good) practices.
Designing and coding well in the beginning has tangible benefits down the road. As an application continues to grow in scope and complexity, the time that little inefficiencies here and there start to eat up can become significant. The stories about being able to tune code such that what once took hours now takes mere minutes to complete are true. It’s nice to see the benefits of having improved a slow running process, but it’s better still to not have been in that situation in the first place.