DataPump Components
DataPump consists of three components that work in concert
to manage and perform operations:
Command-Line Interface. Like the original Export (EXP.EXE) and
Import (IMP.EXE)
command-line interfaces, the DataPump provides two command-line interfaces,
EXPDP.EXE and
IMPDP.EXE,
for controlling DataPump Export and Import operations, respectively.
DataPump also expands the command-line interface's
capabilities by providing Interactive-Command Mode. This mode allows the
DBA to start a DataPump operation and then disconnect from it by simply issuing
a CTL+C keystroke. Later, the DBA simply opens another DataPump session while
specifying the operation's job name and then issues the ATTACH directive to
reconnect to the session. The DBA can then issue a series of commands, from
taking a pulse of the operation via the STATUS directive, to adding more export
dump files, or even terminating a long-running operation if desired.
DBMS_METADATA. Introduced in Oracle 9i, this PL/SQL
supplied package provides methods to extract and format metadata -
literally, "information about information" - from an Oracle database. At its
core, DBMS_METADATA
stores metadata in XML format for ultimate flexibility in its presentation of
that metadata.
DataPump Export uses DBMS_METADATA to gather all metadata about the
database objects being processed then stores that metadata in the Export
operation's master table. DataPump Import uses the corresponding master table
to extract information from the export dump file(s), and then uses DBMS_METADATA
to create the DDL statements it needs to create new database objects as part of
the Import operation.
DBMS_DATAPUMP. At the heart of the DataPump is the
new DBMS_DATAPUMP
PL/SQL supplied package. This package contains methods for exporting data from
and importing data into any Oracle database. EXPDP and IMPDP are actually
making calls to this new package to perform their respective operations.
What is especially powerful about DBMS_DATAPUMP is that a DBA can also utilize it
directly, either in anonymous PL/SQL blocks or within a stored procedure,
stored function, or package, to create custom DataPump Export and Import jobs. I
will demonstrate how to accomplish this in the final article in this series
when we look at some of the other advanced features of the DataPump.
DataPump Jobs, Master Processes, and the Master Table
DataPump also implements major (and in my opinion, elegant!)
improvements to the execution and management of Export and Import operations.
Each DataPump operation uses a master process to manage and control the
export or import session. Each master process is identified by a unique job
name, and DataPump automatically assigns one by default if one is not
specified at execution time.
The master process controls the DataPump operation by
invoking at least one worker process that actually performs the Export
or Import. (This is extremely similar to the way Recovery Manager (RMAN)
manages backup, restoration and recovery processing.) If the DataPump operation
has requested additional parallel processes for multiple-threaded processing,
more than one worker process will be created and managed.
The master process also creates a master table as a
user table within the user account in which the DataPump job is invoked. This
master table is always named the same as the DataPump job, and it is used slightly
differently depending on the type of DataPump operation, but it is always
created within the same user account that is invoking the operation.
During an Export operation, the master table is
created when the job commences, contains all target objects for the Export, and
is retained until the job finishes successfully, at which point it is dropped.
However, if the job is paused or fails due to error, the Export process uses
the master table to determine what objects still need to be exported and how
best to resume the job. Once the job completes successfully, the master table's
contents are written to the Export file itself for use by an Import process.
On the other hand, an Import operation reads the
master table from the Export dump file and then creates the master table under
the appropriate user account. During an Import, the master process uses the
master table to decide which database objects need to be created and in what
order before any data is actually imported.
A Simple DataPump Export Scenario
To demonstrate an example of how easy it is to use DataPump
for exporting data, I will create a dump file containing all database objects
stored in the Human Resources (HR) schema for use in a later DataPump Import
operation. (The next article will greatly expand upon the capabilities of
DataPump Export, I assure you!)
Listing
1.1 shows how to create a DIRECTORY database object for storage of all
DataPump dump files. DataPump also uses the DIRECTORY object as a default
location for log files and parameter files.
Listing
1.2 provides a set of queries to display listings of what object types
can be processed with the DataPump, and whether a filter can be applied against
the object type at the database, schema, or table level.
Listing
1.3 shows the DataPump Export command issued, its corresponding
parameter file, and the log file that documents the results of the DataPump
Export operation.
A Simple DataPump Import Scenario
Using DataPump to import data is equally simple. To
illustrate, I have constructed the following scenario:
-
The DataPump Export dump file created in the previous scenario
will be used to import data into a new schema, HR_OLTP, that will eventually become the target of a
new online transaction processing application under development.
-
The new schema needs all of the tables from the existing HR
schema and those tables' related objects, i.e. referential constraints,
indexes, sequences, and triggers.
-
No other database objects should be loaded to this new schema
from the existing HR schema.
See Listing
1.4 for the SQL to create the new schema, the DataPump Import command
issued, its corresponding parameter file, and the log file that documents the
results of the DataPump Import operation.
Monitoring DataPump Operations
Oracle 10g provides two new views, DBA_DATAPUMP_JOBS and DBA_DATAPUMP_SESSIONS that allow the DBA to monitor the
progress of all DataPump operations. Listing 1.5
shows two queries that return valuable information about ongoing DataPump jobs
and sessions.
Conclusion
Oracle 10g's new DataPump offers significant improvements
over the original command-line-based Export and Import utilities, including
multi-threaded processing, improved object filtering capabilities, and restartability.
The next article in this series will dive deeply into utilizing the DataPump
effectively to accomplish some typical (and not so typical!) "real-world"
export and import tasks.
References and Additional Reading
While there is no substitute for direct experience, reading
the manual is not a bad idea, either. I have drawn upon the following Oracle
10g documentation for the deeper technical details of this article:
B10750-01 Oracle Database
New Features Guide
B10825-01 Oracle Database
Utilities
»
See All Articles by Columnist Jim Czuprynski