Oracle 10g: Exploring Data Pump
Oracle 10g offers several new features, one of which is Data Pump technology for fast data movement
between databases. Most Oracle shops still use their traditional export and
import utility scripts rather than this new technology. Data Pump technology is
entirely different from the export/import utility, although they have a similar
look and feel. Data Pump runs inside the database as a job, which means jobs
are somewhat independent of the process that started the import or export. Another
advantage is that other DBAs can log in to the database and check the status of
the job. The advantages of Data Pump, along with Oracle’s plan to deprecate the traditional
import/export utilities down the road, make Data Pump a worthwhile topic for discussion.
Oracle claims Data Pump
transfers data and metadata at twice the speed of export and twenty
to thirty times the speed of the import utility that DBAs have been using for
years. Data Pump manages this speed by using multiple parallel streams of data to
achieve maximum throughput. Please note that Data Pump does not work with
utilities older than the 10g Release 1 utility.
Data Pump consists of two
components: the Data Pump Export utility, called "expdp", which exports objects
from a database, and the Data Pump Import utility, called "impdp", which loads
objects into a database. Just like the traditional export and import utilities, the DBA
can control these jobs with several parameters.
$ expdp username/password (other parameters here)
$ impdp username/password (other parameters here)
We can get a quick summary
of all parameters and commands by simply issuing
$ expdp help=y
$ impdp help=y
Similar to the export and
import utility, the Data Pump export and import utilities are extremely useful for
migrating databases, especially large ones: data can be exported from a database
on one platform and operating system and imported into a database running on a
different platform and operating system in a short amount of time.
The Oracle-supplied package,
DBMS_DATAPUMP, implements the API through which you can access the
Data Pump export and import utilities programmatically. In other words, if
you have hundreds of databases to manage, you can create a much more powerful,
custom Data Pump utility using Data Pump technology.
One of the interesting points
is how Data Pump initiates the export session. In the traditional export
utility, the user process requests the data from the server process and writes
the exported data to disk itself, as part of a regular session. The Data Pump
expdp user process instead launches a server-side job that writes data to
disks on the server node, and this job runs independently of the session
established by the expdp client. However, similar to the traditional export
utility, Data Pump writes the data into dump files in an Oracle proprietary
format that only the Data Pump import utility can understand.
New features of Data Pump that improve performance:
Below are some of the
features that differentiate Data Pump from the traditional export and import
utility. These features not only enhance the speed of the data transfer but also
help the DBA assess how a job would run before actually running it.
Parallel Threads: The PARALLEL parameter specifies the
maximum number of threads of active execution operating on behalf of the export
job. This execution set consists of a combination of worker processes and
parallel I/O server processes. Because each active worker process or I/O
server process works on one active file at a time, the DBA must specify a sufficient
number of files. Therefore, the value the DBA specifies for this parameter should
be less than or equal to the number of files in the dump file set.
This important parameter helps the DBA make a trade-off
between resource consumption and elapsed time.
Ability to restart the job: The
ability to restart a job is an extremely useful feature if the DBA is involved in
moving large amounts of data. A Data Pump job can be restarted without any
data loss or corruption after an unexpected failure, or after the DBA has stopped
the job with the STOP_JOB command.
Ability to detach from and reattach to the job: This allows other DBAs to monitor jobs from multiple locations.
A Data Pump export or import client can be attached to only one job at a time,
but multiple clients can be attached to the same job.
Support for export and import operations over the network: The NETWORK_LINK parameter initiates an export using
a database link. This means that the system to which expdp is connected contacts
the source database referenced by the source_database_link, retrieves data from
it, and writes the data to a dump file set back on the connected system.
Ability to change the name of
source datafile to a different name: The
DBA can change the name of the source datafile to a different name in all
DDL statements where the source datafile is referenced.
Support for filtering the
metadata: The DBA can filter
metadata using the "EXCLUDE" and "INCLUDE" options. If an object is excluded,
all of its dependent objects are also excluded. For example, EXCLUDE=CONSTRAINT
will exclude all constraints except "NOT NULL" and constraints needed for table
creation, whereas INCLUDE=TABLE:"IN ('EMPLOYEES', 'DEPARTMENTS')" limits the job
to the EMPLOYEES and DEPARTMENTS tables and their dependent objects.
Estimate: The DBA
can estimate how much space an export job will consume, without actually
performing the export.
QUERY Parameter: The DBA can filter data during the export
by specifying a clause for a "SELECT" statement.
CONTENT Parameter: The DBA can
specify what is exported or imported, for example, metadata only, data only, or both.
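The ESTIMATE, QUERY and CONTENT features listed above can be tried from the command line. The following is a minimal sketch, not a definitive recipe: it writes two parameter files and runs them only if expdp is on the PATH. It reuses the dpuser account, the TDB10G alias and the dpump_dir1 directory object used elsewhere in this article. The file names and the deptno column are assumptions made up for illustration. The estimate is kept as a separate job because, according to the Oracle utilities documentation, ESTIMATE_ONLY cannot be combined with DUMPFILE or QUERY.

```shell
#!/bin/sh
# Sketch only: file names and the deptno column are illustrative assumptions.

# Job 1: estimate the space a schema export would need; no dump file is written.
cat > dpuser_estimate.par <<'EOF'
SCHEMAS=dpuser
DIRECTORY=dpump_dir1
LOGFILE=dpuser_estimate.log
ESTIMATE_ONLY=Y
ESTIMATE=BLOCKS
EOF

# Job 2: export only the rows of EMP that match a WHERE clause, data only.
cat > emp_dept10.par <<'EOF'
TABLES=EMP
DIRECTORY=dpump_dir1
DUMPFILE=emp_dept10.dmp
LOGFILE=emp_dept10.log
CONTENT=DATA_ONLY
QUERY=EMP:"WHERE deptno = 10"
EOF

# Run each job if the Data Pump client is installed; otherwise show the command.
for parfile in dpuser_estimate.par emp_dept10.par; do
  if command -v expdp >/dev/null 2>&1; then
    expdp dpuser/dpuser@TDB10G parfile="$parfile"
  else
    echo "expdp not found; would run: expdp dpuser/dpuser@TDB10G parfile=$parfile"
  fi
done
```

Keeping the QUERY clause in a parameter file also avoids having to escape its quotation marks for the shell.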
Init.ora parameters that affect the performance of Data Pump:
Oracle recommends the
following settings to improve performance.
Additionally, the number
of processes and sessions allowed for the database must be set high enough to allow
for maximum parallelism.
How Data Pump loads and unloads data:
Oracle has provided a direct
path for unload (export) operations since Oracle 7.3. This method has been very
useful for DBAs who want a quick export of the database, and it has
been further enhanced in the Data Pump technology. Data Pump uses the direct path
method for loading (impdp) and unloading (expdp) when the structure of the
tables allows it. If the table is part of a cluster, or there is a global
index on a partitioned table, Data Pump instead accesses the data through a
different method called external tables. Both the
direct path and external table methods support the same external data
representation, so data that was unloaded with one method can be loaded with
the other.
As stated earlier,
Data Pump is a server-based utility, rather than client-based; dump files, log
files, and SQL files are accessed relative to server-based directory paths.
Data Pump requires you to specify directory paths as directory
objects. A directory object maps a name to a directory path on the file system.
1. The following SQL
statements create a user and a directory object named dpump_dir1, and grant the
user the permissions it needs on that directory.
$ sqlplus system/manager@TDB10G as sysdba
SQL> create user dpuser identified by dpuser;
SQL> grant connect, resource to dpuser;
SQL> CREATE DIRECTORY dpump_dir1 AS '/opt/app/oracle';
SQL> grant read, write on directory dpump_dir1 to dpuser;
2. Let us see how the INCLUDE and
EXCLUDE parameters can be used to limit
the load and unload of particular objects. When the
INCLUDE parameter is used, only the
objects specified by it will be included in the export. When the
EXCLUDE parameter is used, all objects
except those specified by it will be included in the export. Assume we have EMP, EMP_DETAILS
and DEPT tables owned by dpuser.
$ expdp dpuser/dpuser@TDB10G schemas=dpuser include=TABLE:"IN ('EMP', 'DEPT')" directory=dpump_dir1 dumpfile=dpuser.dmp logfile=dpuser.log
$ expdp dpuser/dpuser@TDB10G schemas=dpuser exclude=TABLE:"= 'EMP_DETAILS'" directory=dpump_dir1 dumpfile=dpuser2.dmp logfile=dpuser.log
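Depending on the operating system shell, the quotation marks in an INCLUDE or EXCLUDE value may need escape characters on the command line, and Oracle's documentation suggests placing such parameters in a parameter file instead. The following is a minimal sketch of the INCLUDE export above in that form. The parameter file name dpuser_include.par is an assumption chosen for illustration, and expdp is run only if it is found on the PATH.

```shell
#!/bin/sh
# Sketch only: the parameter file name is an illustrative assumption.
# Inside a parameter file the quotation marks need no shell escaping.
cat > dpuser_include.par <<'EOF'
SCHEMAS=dpuser
INCLUDE=TABLE:"IN ('EMP', 'DEPT')"
DIRECTORY=dpump_dir1
DUMPFILE=dpuser.dmp
LOGFILE=dpuser.log
EOF

# Run the export if the Data Pump client is installed; otherwise show the command.
if command -v expdp >/dev/null 2>&1; then
  expdp dpuser/dpuser@TDB10G parfile=dpuser_include.par
else
  echo "expdp not found; would run: expdp dpuser/dpuser@TDB10G parfile=dpuser_include.par"
fi
```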
3. As stated earlier, Data Pump performance can be significantly improved by using the
PARALLEL parameter. This should be used
in conjunction with the "%U" wildcard in the
DUMPFILE parameter to allow multiple
dump files to be created or read:
$expdp dpuser/dpuser@TDB10G schemas=dpuser directory=dpump_dir1 parallel=4 dumpfile=dpuser_%U.dmp logfile=dpuser.log
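The rule stated earlier, that the PARALLEL value should not exceed the number of files in the dump file set, can be checked before a job is submitted. The following is a minimal sketch of such a check. check_parallel is a hypothetical helper written for this article, not part of Data Pump, and it treats any DUMPFILE value containing the %U template as sufficient because the template expands to as many files as are needed.

```shell
#!/bin/sh
# check_parallel PARALLEL DUMPFILE_LIST
# Hypothetical helper: succeeds when the comma-separated DUMPFILE value
# names at least PARALLEL files, or uses the %U template.
check_parallel() {
  parallel=$1
  dumpfiles=$2
  case $dumpfiles in
    *%U*) return 0 ;;   # %U generates additional files on demand
  esac
  # Count the non-empty, comma-separated file names.
  count=$(printf '%s\n' "$dumpfiles" | tr ',' '\n' | grep -c .)
  [ "$count" -ge "$parallel" ]
}

check_parallel 4 'dpuser_%U.dmp' && echo "ok: template"
check_parallel 4 'a.dmp,b.dmp' || echo "too few files for parallel=4"
```

The two calls at the end print "ok: template" and "too few files for parallel=4" respectively.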
Data Pump API:
The Data Pump API,
DBMS_DATAPUMP, provides a high-speed mechanism to move data from one
database to another. In fact, the Data Pump Export and Data Pump Import
utilities are based on the Data Pump API. The structure used in the client
interface of this API is a job handle. A job handle can be created using the OPEN
or ATTACH function of the DBMS_DATAPUMP package. Other DBA sessions can attach
to a job to monitor and control its progress, so that a remote DBA can monitor a
job that was scheduled by an on-site DBA.
The following steps list the
basic activities involved in using Data Pump API.
1. Call the DBMS_DATAPUMP.OPEN procedure to create the job.
2. Define parameters for the job, such as adding files and filters.
3. Start the job.
4. Optionally, monitor the job until it completes.
5. Optionally, detach from the job and attach at a later time.
6. Optionally, stop the job.
7. Optionally, restart a job that was stopped.
Example of the above steps:
DECLARE
  p_handle         NUMBER;        -- Data Pump job handle
  p_last_job_state VARCHAR2(45);  -- To keep track of job state
  p_status         ku$_Status;    -- The status object returned by get_status
BEGIN
  p_handle := DBMS_DATAPUMP.OPEN('EXPORT', 'SCHEMA', NULL, 'EXAMPLE', 'LATEST');
  -- Specify a single dump file for the job (using the handle just returned)
  -- and a directory object, which must already be defined and accessible
  -- to the user running this procedure. The file name here is illustrative.
  DBMS_DATAPUMP.ADD_FILE(p_handle, 'example.dmp', 'DPUMP_DIR1');
  -- A metadata filter is used to specify the schema that will be exported.
  -- Unquoted user names are stored in uppercase, hence DPUSER.
  DBMS_DATAPUMP.METADATA_FILTER(p_handle, 'SCHEMA_EXPR', 'IN (''DPUSER'')');
  -- Start the job. An exception will be generated if something is not set up
  -- properly.
  DBMS_DATAPUMP.START_JOB(p_handle);
  -- The export job should now be running. Detach and let it continue.
  DBMS_DATAPUMP.DETACH(p_handle);
END;
/
The status of the
job can be checked by writing a separate procedure that captures errors and
status until the job completes. Overall job status can also be obtained by
querying the DBA_DATAPUMP_JOBS view, for example with "SELECT * FROM dba_datapump_jobs".
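A running job can also be inspected from a second terminal by attaching an expdp client to it. The following is a minimal sketch that assumes the job was created by dpuser under the name EXAMPLE, the name passed to DBMS_DATAPUMP.OPEN in the example above. If the job belongs to another schema, the ATTACH value would need the schema-qualified job name. The commands in the trailing comments are typed at the interactive Export> prompt and are not run by the script.

```shell
#!/bin/sh
# Sketch only: assumes the job was created by dpuser under the name EXAMPLE.
JOB_NAME=${JOB_NAME:-EXAMPLE}

if command -v expdp >/dev/null 2>&1; then
  # Opens an interactive session attached to the existing job.
  expdp dpuser/dpuser@TDB10G attach="$JOB_NAME"
else
  echo "expdp not found; would run: expdp dpuser/dpuser@TDB10G attach=$JOB_NAME"
fi

# At the Export> prompt, useful commands include:
#   STATUS            show the job state and the progress of each worker
#   STOP_JOB          stop the job in an orderly way so it can be restarted
#   START_JOB         restart a stopped job after attaching to it again
#   CONTINUE_CLIENT   return to logging mode
#   EXIT_CLIENT       detach this client and leave the job running
```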
Data Pump is a great tool for the fast movement of data between databases,
and much of this performance improvement is derived from the use of the PARALLEL parameter.
Even when the Transportable
Tablespace feature of Oracle is used to move self-contained data between
databases, Data Pump is still required for handling the extraction and
recreation of the metadata for that tablespace. Whenever possible, Data Pump performance is further
maximized by using the direct path driver. Otherwise, Data Pump accesses the data
using an external table access driver.
Data Pump provides flexibility through
parameters such as INCLUDE, EXCLUDE, QUERY, and TRANSFORM, which give the DBA
more control over the data and objects being loaded and unloaded. With all of these
features, Data Pump is a welcome addition to the DBA's tools in a world that
constantly redefines the size of the "large database".