Oracle 10g: Exploring Data Pump
Oracle 10g offers several new features, one of which is Data Pump technology for fast data movement
between databases. Most Oracle shops still use their traditional export and
import utility scripts rather this new technology. Data Pump technology is
entirely different from the export/import utility, although they have a similar
look and feel. Data Pump runs inside the database as a job, which means jobs
are somewhat independent of the process that started the import or export. Another
advantage is that other DBAs can login to the database and check the status of
the job. The advantages of Data Pump, along with Oracle's plan to deprecate the traditional
import/export utilities down the road, make Data Pump a worthwhile topic for discussion.
Oracle claims Data Pump
offers a transfer of data and metadata at twice the speed of export and twenty
to thirty times the speed of the import utility that DBAs have been using for
years. Data Pump manages this speed with multiple parallel streams of data to
achieve maximum throughput. Please note that Data Pump does not work with
utilities older than the 10g release 1 utility.
Data Pump consists of two
components: the Data Pump export utility called expdp, to Export the objects
from a database, and the Data Pump Import utility called impdp, to load the
objects into database. Just like traditional export and import utilities, the DBA
can control these jobs with several parameters.
$expdp username/password (other parameters here)
$impdp username/password (other parameters here)
We can get a quick summary
of all parameters and commands by simply issuing
Similar to the export and
import utility, Data Pump export and import utilities are extremely useful for migrating
especially large databases from an operating system and importing them into a
database running on a different platform and operating system in a short amount
The Oracle supplied package,
DBMS_DATAPUMP, can be used to implement the API, through which you can access the
Data Pump export and import utilities programmatically. In other words, we can
create a much powerful, custom Data Pump utility using Data Pump technology, if
you have hundreds of databases to manage.
One of the interesting points
is how Data Pump initiates the export session. In the traditional export
utility, the user process writes the exported data to the disk that is
requested from the server process, as a part of regular session. The Data Pump
expdp - user process launches a server-side process or job that writes data to
disks on the server node, and this process runs independently of the session
established by expdp client. However, similar to the traditional export
utility, Data Pump writes the data into dump files in an Oracle proprietary
format that only the Data Pump import utility can understand.
New Features of Data Dump that improve the performance of Data
Below are some of the
features that differentiate the traditional export and import utility from Data
Pump. These features not only enhance the speed of the data transfer but also
are handy for the DBA to asses how the job would run before actually running
Init.ora parameters that affect the performance of Data Pump:
Oracle recommends the
following settings to improve performance.
Additionally, the number
of processes and sessions allowed to the database must be set to high, to allow
for maximum parallelism.
How Data Pump accesses loading and unloading of Data:
Oracle has provided direct
path to unload or export operations since Oracle 7.3. This method has been very
useful for DBAs that want a quick export of the database and this process has
been further enhanced in the Data Pump technology. Oracle uses the direct path
method for loading (impdp) and unloading (expdp) when the structure of the
tables allows it. If the table is part of a cluster, or it has a global
index on a partitioned table, then Data Pump accesses the data in a
different method called External Table. Both the
direct path load and external table method support the same external data
representation, so we can load the data that was unloaded with External Table
method and vice versa.
As stated earlier,
Data Pump is a server-based utility, rather than client-based; dump files, log
files, and SQL files are accessed relative to server-based directory paths.
Data Pump requires you to specify directory paths as directory
objects. A directory object maps a name to a directory path on the file system.
1. The following SQL
statements creates a user, a directory object named dpump_dir1 and grants the
permissions to the user.
SQLPLUS system/manager@TDB10G as sysdba
SQL> create user dpuser identified by dpuser;
SQL> grant connect, resource to dpuser;
SQL> CREATE DIRECTORY dpump_dir1 AS '/opt/app/oracle';
SQL> grant read, write on directory dpump_dir1 to dpuser
2. Let us see
EXCLUDE parameters can be used to limit
the load and unload of particular objects. When the
INCLUDE parameter is used, only the
objects specified by it will be included in the export. When the
EXCLUDE parameter is used, all objects
except those specified by it will be included in the export: Assume we have EMP,EMP_DETAILS
and DEPT tables owned by dpuser.
$ expdp dpuser/dpuser@TDB10G schemas=dpuser include= TABLE:\"IN (\'EMP\', \'DEPT\')\"
directory=dpump_dir1 dumpfile=dpuser.dmp log=dpuser.log
$expdp dpuser/dpuser@TDB10G schemas=dpuser exclude=TABLE:\"= \'EMP_DETAILS\'\"
directory=dpump_dir1 dumpfile=dpuser2.dmp logfile=dpuser.log
earlier, Data pump performance can be significantly improved by using the
PARALLEL parameter. This should be used
in conjunction with the "%U" wildcard in the
DUMPFILE parameter to allow multiple
dumpfiles to be created or read:
$expdp dpuser/dpuser@TDB10G schemas=dpuser
directory=dpump_dir1 parallel=4 dumpfile=dpuser_%U.dmp logfile=dpuser.log
Data Pump API:
The Data Pump API,
DBMS_DATAPUMP, provides a high-speed mechanism to move the data from one
database to another. Infact, the Data Pump Export and Data Pump Import
utilities are based on the Data Pump API. The structure used in the client
interface of this API is a job handle. Job handle can be created using the OPEN
or ATTACH function of the DBMS_DATAPUMP package. Other DBA sessions can attach
to a job to monitor and control its progress so that remote DBA can monitor the
job that was scheduled by an on-site DBA.
The following steps list the
basic activities involved in using Data Pump API.
procedure to create job.
Define parameters for the job
like adding file and filters etc.
Start the job.
Optionally monitor the job until
Optionally detach from job and
attach at later time.
Optionally, stop the job
Restart the job that was
Example of the above steps:
P_handle number; --- -- Data Pump job handle
P_last_job_state varchar2(45); ---- -- To keep track of job state
P_status ku$_Status ----- -- The status object returned by get_status
P_handle:=DBMS_DATAPUMP.OPEN ('EXPORT','SCHEMA', NULL,'EXAMPLE','LATEST');
-- Specify a single dump file for the job (using the handle just returned)
-- and a directory object, which must already be defined and accessible
-- to the user running this procedure
-- A metadata filter is used to specify the schema that will be exported.
DBMS_DATAPUMP.METADATA_FILTER (p_handle,'SCHEMA_EXPR','IN (''dpuser'')');
-- Start the job. An exception will be generated if something is not set up
----The export job should now be running.
The status of the
job can be checked by writing a separate procedure and capturing the errors and
status until it is completed. Overall job status can also be obtained by
querying SELECT * from dba_datapump_jobs.
Data Pump is a great tool for the fast movement of data between the databases
and much of this performance improvement is derived from the use of parameter parallelism.
Even when the Transportable
Tablespace feature of Oracle is used to move self-contained data between the
databases, Data Pump is still required for handling the extraction and
recreation of the metadata for that tablespace. Whenever possible, Data Pump performance is further
maximized by using Direct-Path driver. Otherwise, Data Pump accesses the data
using an External Table access driver.Data Pump provides flexibility, with the implementation of
parameters such as INCLUDE, EXCLUDE, QUERY, and TRANSFORM that gives the DBA
more control of data and objects being loaded and unloaded. With all of these
features, Data Pump is a welcome addition to DBA tools in a world that
constantly redefines the size of the large database.
See All Articles by Columnist Sreeram Surapaneni