Archiving older data is a complex task; local, national and sometimes international regulations dictate when, how and for how long the archived data must remain available. Add to that the seemingly insurmountable task of storing all of this data electronically and what appears, from those outside the IT arena, to be a simple act can end up as anything but simple. Within the context of an Oracle database there are methods of archiving data, some simple, some a bit more complex but still within the realm of possibility. Let’s look at those options and what they can, and cannot, offer.
The first option that comes to mind (mine, anyway) involves partitioning, an Enterprise Edition option (which should not be a surprise since companies who generate reams of data to archive usually install this edition). Archiving in this scenario is fairly easy: convert the relevant partitions to stand-alone tables in their own tablespace, separate from the ‘live’ production data. If this data is now on its own storage it can even be moved to another database server to facilitate access and not impact daily production. Let’s look at the steps involved with this option. First let’s create a partitioned table:
CREATE TABLE archive_test
(
dusty DATE,
vol VARCHAR2(60),
info NUMBER
)
PARTITION BY RANGE ( dusty )
(
PARTITION really_old
VALUES LESS THAN ( TO_DATE('01-apr-1999','dd-mon-yyyy'))
TABLESPACE older_than_dirt,
PARTITION quite_old
VALUES LESS THAN ( TO_DATE('01-jul-2004','dd-mon-yyyy'))
TABLESPACE old_as_dirt,
PARTITION sorta_new
VALUES LESS THAN ( TO_DATE('01-oct-2009','dd-mon-yyyy'))
TABLESPACE newer,
PARTITION really_new
VALUES LESS THAN ( TO_DATE('01-jan-2012','dd-mon-yyyy'))
TABLESPACE newest
);
--
-- Create local prefixed index
--
CREATE INDEX i_archives_l ON archive_test ( dusty,vol )
LOCAL (
PARTITION i_otd_one TABLESPACE i_otd_one,
PARTITION i_oad_two TABLESPACE i_oad_two,
PARTITION i_nwr_three TABLESPACE i_nwr_three,
PARTITION i_nwst_four TABLESPACE i_nwst_four
);
The last partition of our table is set to accept all data through 01/01/2012 so archiving data simply involves converting the desired partition to a stand-alone table, preferably stored on a different diskgroup or array than the current production data. For the sake of illustration let’s put the destination tablespace, ARCHIVED_TS, in a separate ASM diskgroup (doing this allows for the movement of the diskgroup to another physical server for use by a separate Oracle instance). To archive the partition REALLY_OLD to a stand-alone table named REALLY_OLD_TBL:
--
-- Create empty table matching partition definition
--
create table really_old_tbl
(
dusty date,
vol varchar2(60),
info number
) tablespace archived_ts; -- Tablespace created in separate ASM diskgroup or on separate storage
--
-- Check row count in desired partition
--
select count(*)
from archive_test partition(really_old);
--
-- Move partition data to stand-alone table
--
alter table archive_test
exchange partition really_old with table really_old_tbl with validation;
--
-- Verify all rows written to destination table
--
select count(*)
from really_old_tbl;
--
-- Drop now-empty partition presuming row counts match
--
alter table archive_test drop partition really_old;
The data is now archived to a separate table and will no longer be available in ARCHIVE_TEST; this, however, makes partition QUITE_OLD the first partition resulting in any DUSTY value less than the upper partition limit being stored there, including values that should have been in REALLY_OLD. This may not be an issue as values that old may no longer be generated but it is an aspect to consider when archiving older data from a partitioned table.
A second method is available for those not using partitioning, which involves creating an archive table from the source table by selecting the desired data (this will also work for partitioned tables and may be the option of choice if a single archive table is desired as the above illustrated method creates a new table for each partition to be archived):
--
-- Create table and copy data
--
create table really_old_tbl
tablespace archived_ts
as select *
from archive_test
where dusty <= [some date here];
--
-- Verify all data copied successfully
--
select *
from archive_test
where (dusty,vol,info) not in (select * from really_old_tbl)
and dusty <= [some date here];
--
-- Delete from source table
--
delete
from archive_test
where dusty <= [some date here];
commit;
The data is now archived to a separate table. Changing the create table statement to an insert statement can allow for ‘newer’ archived data to be stored in the same archive table; again a similar condition exists as any data within the archived range can still be inserted into the source table as no date limits may exist to restrict inserts. A trigger can be used to restrict such inserts as shown below:
SQL> create or replace trigger ins_chk_trg
2 before insert on archive_test
3 for each row
4 declare
5 mindt date;
6 begin
7 select max(dusty) into mindt from really_old_tbl;
8 if :new.dusty <= mindt then
9 raise_application_error(-20987,'Data ('||:new.dusty||') outside of acceptable date range', true);
10 end if;
11 end;
12 /
Trigger created.
SQL>
SQL> insert into archive_test
2 values (to_date('&mindt', 'RRRR-MM-DD HH24:MI:SS') - 1, 'Test value x', -1,to_date('&mindt', 'RRRR-MM-DD HH24:MI:SS') );
old 2: values (to_date('&mindt', 'RRRR-MM-DD HH24:MI:SS') - 1, 'Test value x', -1,to_date('&mindt', 'RRRR-MM-DD HH24:MI:SS') )
new 2: values (to_date('1999-03-08 14:17:40', 'RRRR-MM-DD HH24:MI:SS') - 1, 'Test value x', -1,to_date('1999-03-08 14:17:40', 'RRRR-MM-DD HH24:MI:SS') )
insert into archive_test
*
ERROR at line 1:
ORA-20987: Data (07-MAR-99) outside of acceptable date range
ORA-06512: at "BLORPO.INS_CHK_TRG", line 6
ORA-04088: error during execution of trigger 'BLORPO.INS_CHK_TRG'
SQL>
Such a trigger can be used on partitioned and non-partitioned tables to police the inserts and reject those bearing dates present in the archive table. As the archive table data increases (due to subsequent inserts) the trigger will recognize the new maximum and use it to reject inserts.
Lest we forget the external utilities both exp/imp and expdp/impdp can be used to archive data; the QUERY option to both exp and expdp allows extraction of specific data from a given table so that only the oldest data will be exported. Oracle recommends using a parameter file when using the QUERY option to avoid operating system specific escape characters. Additionally, expdp allows for one query per table and multiple table:query pairs when specified with the schema.table:query format. A sample parameter file is shown below:
TABLES=employees, departments
QUERY=employees:'"WHERE department_id > 10 AND salary > 10000"'
QUERY=departments:'"WHERE department_id > 10"'
NOLOGFILE=y
DIRECTORY=dpump_dir1
DUMPFILE=exp1.dmp
This creates tables with the source names and a limited subset of the source data, which can be imported into a different schema or different database. The imported tables can be renamed with the usual command (in releases 10g and later):
rename employees to employees_arch;
rename departments to departments_arch;
If you’re using a release older than 10g the process is a bit more time consuming, involving creating a new table with the desired name from the imported table then copying any index/constraint definitions to the new table, finally dropping the imported table once you’re certain the ‘renamed’ table has all necessary indexes and constraints in place.
Archiving older data is not a terribly difficult task (at least in an Oracle database) but it does take planning and attention to detail to ensure all of the desired data is properly archived and available for the end users. Maintaining the archived table (or tables) also takes planning as applications may need to be written to directly access the archive and, in the case of multiple archive tables, be ‘smart’ enough to be able to access the newer additions as they arrive. Remember, too, that the specifications for the archiving revolve around local, state, federal (in the U.S.) and possibly international regulations and the archiving scheme must be flexible enough to provide the required ‘window’ of access. It’s also true that archived data may outlast the regulations that established it (unless legal issues preclude maintaining the archive beyond the prescribed date range); in such cases a sound storage strategy is a must and it’s not unusual for archived data to go from Tier II (slower, cheaper disk) storage to Tier I (tape) as long as the data is still accessible as access speed is not a requirement for archived data.