Oracle 10g DataPump, Part 2: Implementation

Synopsis. Oracle 10g’s new DataPump is designed as
the eventual replacement for the Oracle Import and Export utilities. This
article – the second and final in this series – focuses on using the DataPump’s
features in real-world scenarios, including creation of custom DataPump jobs
with DBMS_DATAPUMP.

The previous article in this series identified some of the deficiencies of the
original Export and Import utilities, and then provided a broad overview of
Oracle 10g’s new DataPump utility. This article will concentrate on how to
effectively utilize the DataPump for common import and export tasks, including
how to create custom DataPump jobs that can be scheduled via the Oracle 10g
Scheduler. I will start with some real-world scenarios that show how Oracle 10g’s
DataPump Export utility provides plenty of flexibility in handling typical
export requirements.

DataPump Export Scenarios

Creating a Nightly Disaster Recovery Export Scheme.
As part of my personal list of Oracle disaster recovery "best practices,"
I have made a habit of building a nightly disaster recovery export scheme
for a new Oracle database shortly after the database’s creation. In addition,
whenever I create a new schema in an existing database, I make sure to add
that schema to the existing nightly export scheme. At a minimum, I ensure that
the contents of all tables are exported regularly.

Setting up a nightly export scheme via the DataPump Export
utility is relatively simple. Listing 2.1
shows the DataPump parameter file used to generate the export files. Since my
target database for this example is relatively small, I am using the FULL
parameter to generate an export that contains the entire contents of the
database including all metadata, all database objects, and of course the data
itself. I am also using the STATUS
parameter to have DataPump report the status of the ongoing operation at
regular intervals – in this case, every 30 seconds. Log 2.1 shows the resulting log file
generated during this DataPump Export operation.
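Although Listing 2.1 itself is not reproduced here, a nightly full-export parameter file along these lines would fit the description above. This is only a sketch: the directory object, file names, and values are assumptions for illustration, not the article’s actual listing.

```text
# Hypothetical nightly disaster recovery export parameter file.
# EXPORT_DIR is an assumed, pre-created Oracle DIRECTORY object.
DIRECTORY=EXPORT_DIR
DUMPFILE=nightly_dr.dmp
LOGFILE=nightly_dr.log
# Export the entire database: metadata, database objects, and data.
FULL=Y
# Report the status of the running job every 30 seconds.
STATUS=30
```

Such a file would typically be invoked through the expdp client, for example `expdp system PARFILE=nightly_dr.par`, which in turn makes it easy to wrap in a script the Oracle 10g Scheduler can run each night.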

Estimating an Export’s Size. Export operations like
the nightly disaster recovery export scheme I have just described can obviously
use quite a bit of storage, especially for a large database. Thankfully, DataPump
offers improved mechanisms for estimating an export’s size. Setting the ESTIMATE_ONLY
parameter to Y instructs DataPump to calculate the space an export would consume
without actually performing it. The companion ESTIMATE parameter accepts a value of
either BLOCKS (the default), which bases the estimate on the number of database
blocks occupied by the objects being exported, or STATISTICS, which bases the
estimate on stored database statistics (provided, of course, that statistics have
been gathered for the objects that are being exported).

Listing 2.2 shows
the DataPump parameter file that generates the probable sizes of the export
files for the nightly disaster recovery export. Note that I have set ESTIMATE
to a value of STATISTICS along with ESTIMATE_ONLY=Y,
and that the DUMPFILE
directive has been removed (in fact, including it during an
estimate-only operation will cause an error). See Log
2.2
for the resulting log file generated during this DataPump Export
operation.
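For an estimate-only run, a parameter file in the following spirit would match the description above. Again, this is a sketch with assumed names, not the actual Listing 2.2.

```text
# Hypothetical estimate-only parameter file for the nightly export.
DIRECTORY=EXPORT_DIR
LOGFILE=estimate_dr.log
FULL=Y
# Calculate space requirements only; no data is actually exported.
ESTIMATE_ONLY=Y
# Base the estimate on gathered statistics rather than block counts.
ESTIMATE=STATISTICS
# Note: no DUMPFILE directive -- including one here would raise an error.
```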

Multiple Server Processes and Dump Files During Exports.
Another request I receive regularly involves producing a subset of a
production database to refresh the same tables in a development database. For
example, recently one of my developers asked me to export the data from just
one schema from our production database so she could import that data into
another database to use it for application regression testing. In addition, the
developer wanted to save the data on alternate media – in this case, standard
650MB CD-ROMs – for archival purposes.

The new DataPump Export makes quick work of this type of
request. To demonstrate, I will configure the FILESIZE directive to limit the file size to only
650 MB, and I will provide for multiple export dump files to be written
by specifying the %U variable in the DUMPFILE parameter. When %U appears in the
file name, DataPump automatically substitutes a two-digit number, incremented
from 01 through 99, each time a new file is created.

To speed up this operation, I will specify the use of multiple
server processes by setting the PARALLEL parameter to 2, enabling two DataPump
worker processes to share the load. Oracle recommends providing at least as
many dump files as the degree of parallelism specified.

Listing 2.3 shows
the DataPump parameter file that generates these export files, and Log 2.3 contains the resulting log file
generated during this operation.
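A parameter file along these lines would implement the multi-file, parallel export just described. The schema and file names here are assumptions for illustration, not those in the actual Listing 2.3.

```text
# Hypothetical parallel export limited to CD-ROM-sized dump files.
DIRECTORY=EXPORT_DIR
# %U expands to a two-digit number (01-99) as each new file is created.
DUMPFILE=regtest_%U.dmp
LOGFILE=regtest.log
# Export only the one requested schema.
SCHEMAS=HR
# Cap each dump file at 650 MB so it fits on a standard CD-ROM.
FILESIZE=650M
# Two worker processes share the load; at least two dump files should
# therefore be available, which %U provides automatically.
PARALLEL=2
```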

Exporting Selected Tables. As part of our regular
development cycle, I need to refresh data held in several tables for a database
used by our HR department for training Payroll application users. This database
contains a mere fraction of the total transactional data in our production
databases – just enough to provide examples that match our current training
class documents – but any outstanding payroll transactions need to be applied
to this "baseline" information at the start of each training class.

To make quick work of this task, I will use DataPump Export
to dump out just the data from the required tables by specifying them as values
for the TABLES
parameter. Listing 2.4 shows the
corresponding DataPump parameter file to perform this operation, and Log 2.4 displays the resulting log
file generated.
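The parameter file for this kind of table-level, data-only export might look roughly like the sketch below. The schema and table names are invented placeholders, not those in the actual Listing 2.4.

```text
# Hypothetical export of just the data from selected tables.
DIRECTORY=EXPORT_DIR
DUMPFILE=payroll_refresh.dmp
LOGFILE=payroll_refresh.log
# Only the listed tables are exported.
TABLES=HR.PAYROLL_CHECKS,HR.PAYROLL_TRANSACTIONS
# Skip metadata; dump table rows only.
CONTENT=DATA_ONLY
```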

Exporting Different Versions of a Database Object.
Occasionally a developer needs to see how data in a table was stored before an
operation took place, or which version of a particular database object was in
force at a particular date and time. DataPump Export can export an object as of
a particular system change number (SCN) specified for the FLASHBACK_SCN
parameter, or as of a particular time specified for the FLASHBACK_TIME
parameter. (Note that these two options are mutually exclusive.)

Listing 2.5
shows an example of a DataPump Export parameter file that would export all
objects in the HR schema while reflecting the state of those objects just
before midnight on April 10, 2005.
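A point-in-time export like the one Listing 2.5 describes could be sketched as follows; the file names are assumptions, and the timestamp simply matches the date given above.

```text
# Hypothetical export of the HR schema as of a prior point in time.
DIRECTORY=EXPORT_DIR
DUMPFILE=hr_asof.dmp
LOGFILE=hr_asof.log
SCHEMAS=HR
# Export object states as of just before midnight on April 10, 2005.
FLASHBACK_TIME="TO_TIMESTAMP('2005-04-10 23:59:00','YYYY-MM-DD HH24:MI:SS')"
# Alternatively, FLASHBACK_SCN could specify an exact system change number;
# the two parameters are mutually exclusive.
```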

Jim Czuprynski
Jim Czuprynski has accumulated over 30 years of experience during his information technology career. He has filled diverse roles at several Fortune 1000 companies in those three decades - mainframe programmer, applications developer, business analyst, and project manager - before becoming an Oracle database administrator in 2001. He currently holds OCP certification for Oracle 9i, 10g and 11g. Jim teaches the core Oracle University database administration courses on behalf of Oracle and its Education Partners throughout the United States and Canada, instructing several hundred Oracle DBAs since 2005. He was selected as Oracle Education Partner Instructor of the Year in 2009. Jim resides in Bartlett, Illinois, USA with his wife Ruth, whose career as a project manager and software quality assurance manager for a multinational insurance company makes for interesting marital discussions. He enjoys cross-country skiing, biking, bird watching, and writing about his life experiences in the field of information technology.