Synopsis. Oracle 10g’s new DataPump utility is
designed as the eventual replacement for the original Oracle Import and Export
utilities. This article – the first in a series – provides an overview of the DataPump’s
suite of tools for extracting, transforming, and loading data within an Oracle
database.
Oracle’s Import and Export utilities offer a great way to
create, retain, and reload a point-in-time snapshot of a database’s structure
and contents at the table, schema, tablespace, or database level. Database
exports also have a place as a potential third line of defense in an in-depth
disaster recovery plan (please view my article on Becoming
A Master of Disaster for a more detailed discussion of this concept).
I have also used these utilities on a regular basis to
migrate data from our production environment to multiple development
environments, but as reliable as these utilities are, they do have some
drawbacks that needed to be addressed:
Operations are difficult to restart. If an export
session died because it ran out of disk space, or if the session was terminated
deliberately or accidentally, the only option is to restart the entire export
from the beginning, or to create a new export control file at the point of
failure and initiate the export again.
Execution is client-side and single-threaded. The
Import and Export utilities use a client process to execute their
respective operations. This means that the most powerful source of information
– the database server itself – is unavailable to help control the Import or
Export operation. Also, there is no way to utilize parallel processing on a
database server while exporting data, so each Export operation is essentially
one single, serial, potentially long-running task.
Dump files can grow enormously. The Export utility
creates one or more export dump files during each session; during a complete
dump of a database, these files can obviously grow quite large. Since it is
difficult to estimate the approximate expected size of the completed export
files, the operation may fail due to disk space constraints. Also, to import
just one table or database object from a multi-schema export file, the Import
utility has to read all dump files to locate the object before it can be
restored – obviously not an optimal situation!
Tuning mechanisms are limited. Methods to tune a poorly
executing Export or Import operation are relatively limited and somewhat
arcane. Experimenting with different settings for the BUFFER parameter — which
controls the maximum number of rows fetched into the Export utility’s buffer at
one time – is one of the few recommended options for improving performance.
Progress monitoring is difficult. While a DBA can
certainly monitor the command session that is running the operation, or review
the spooled output from an operation, there is no simple way to tell exactly
how efficiently an Export or Import operation is performing, and there is no
simple way to tell exactly what percentage of a large operation is completed.
Database object filtering features are limited. While
it is certainly possible to export and import selected individual tables from a
database or schema, filtering features are relatively limited. For example, it
is impossible to export all tables with common names across schemas
within the same export session. It is also impossible to select specific
database object types other than tables (like packages, stored procedures,
or triggers) for inclusion in an export dump file.
Enter The DataPump: Features Overview
Fortunately, Oracle 10g has addressed many of these issues
with the DataPump, a new set of utilities that significantly expands the
capabilities of the original Export and Import utilities.
Server-side parallel execution. DataPump Import and
Export operations are scheduled and processed within the database server
itself as a DataPump job – a much more effective method than
client-side execution. In addition, if the database server’s OS and hardware
can adequately support it, DataPump export operations can take advantage of parallel
processes to write out multiple data streams simultaneously.
Improved control and restartability of operations.
Since the database server controls the DataPump, a DBA can start an Export or
Import operation and then detach from it, allowing the operation to run to
completion without further intervention. The DBA can also reconnect to the
operation from another command-line session, interrogate its current status and
choose to pause the operation if necessary. And if it appears that the
operation is dangerously close to failing – for example, because of insufficient
disk space for an export file – the DBA can even add more disk files “on the
fly” without canceling the existing operation. This provides a level of
control at least an order of magnitude more powerful than the current
Export/Import utilities.
Simplified monitoring of operational status. DataPump
provides a new STATUS
parameter that shows the status of an ongoing Export or Import operation. When STATUS is specified at operation startup, Oracle
will report the operation’s status to the terminal session of the command-line
interface at the specified interval. The DBA can also reattach to a detached
DataPump job, issue a STATUS
request, determine how far the operation has progressed, and then detach from
the job again.
Automated performance tuning. Unlike the traditional
Export and Import utilities, DataPump implements self-tuning mechanisms
to insure that tables are exported and imported in the most efficient manner.
Gone are the BUFFER, COMMIT, COMPRESS,
CONSISTENT, DIRECT, and RECORDLENGTH
parameters.
Improved database object filtering. DataPump
implements filtering of the database objects to capture during an Export
operation or include as part of an Import operation with a new set of mutually-exclusive
parameters, INCLUDE and EXCLUDE.
These parameters allow the DBA to specify almost any database object for
inclusion or exclusion based on its object type and object name.
This dramatically improves flexibility when exporting and importing; for
example, it is now possible to gather objects with similar names across
an entire database.
Export dump file control enhancements. In prior releases,
the DBA could specify more than one dump file as a destination for an Export
session; however, once the session commenced, no additional files could be
specified. DataPump Export improves upon this by permitting parameterized
file names similar to the naming conventions for redo logs and RMAN backup
files.
DataPump Concepts
The DataPump is the next logical progression in a series of
improvements that have appeared during the last few Oracle releases. The two
most dramatic upgrades involve the ability to read data extremely
quickly via direct path unloading, and the capability to write
data directly to external tables.
Direct Path Unloads. Prior releases of Oracle had
introduced the concept of the direct path load, which permits the
server to bypass SQL parsing of INSERT statements so that data can be inserted
directly and with blinding speed into database tables, providing the tables met
certain conditions. Building on this concept,
Oracle 10g adds the capability to perform direct path unloads of
database tables within the same limits that apply for direct path loads.
Writing To External Tables. Oracle 9i Release 2
introduced the ability to read data from files stored outside the
database as external tables using existing Oracle SQL*Loader access
methods. Oracle 10g expands upon this feature by including a new access
method, ORACLE_DATAPUMP,
that allows the server to write directly to an external file.
DataPump Export uses these two enhancements to unload data
extremely quickly. Even better, DataPump needs virtually no assistance in
deciding which method is fastest for exporting data, and is relatively
self-tuning. For example, the concept of direct vs. conventional exports is now
meaningless, and several of the tuning parameters (e.g. BUFFER, RECORDLENGTH)
have been deprecated.
The DataPump engine will choose to write to external tables
over direct path unload whenever a table meets any of the following criteria:
- The table is part of a cluster.
- There is an active trigger on the table.
- Fine-grained access control is enabled for the table for INSERTs.
-
One of the table’s columns has a referential integrity
constraint. -
The table has an LOB column, and that LOB column has a domain
index. -
The table contains a BFILE column or VARRAY column, and
those column(s) have an embedded, opaque TYPE.