Oracle 10g: Exploring Data Pump
October 12, 2006
Oracle 10g offers several new features, one of which is Data Pump technology for fast data movement between databases. Most Oracle shops still use their traditional export and import utility scripts rather than this new technology. Data Pump is entirely different from the export/import utilities, although it has a similar look and feel. Data Pump runs inside the database as a job, which means a job is largely independent of the client process that started the import or export. Another advantage is that other DBAs can log in to the database and check the status of the job. These advantages, along with Oracle's plan to deprecate the traditional import/export utilities down the road, make Data Pump a worthwhile topic for discussion.
Oracle claims Data Pump transfers data and metadata at twice the speed of the export utility and twenty to thirty times the speed of the import utility that DBAs have been using for years. Data Pump achieves this speed by using multiple parallel streams of data to maximize throughput. Please note that Data Pump does not work with releases older than 10g Release 1, and its dump files cannot be read by the traditional import utility.
Data Pump consists of two components: the Data Pump export utility, expdp, which exports objects from a database, and the Data Pump import utility, impdp, which loads objects into a database. Just as with the traditional export and import utilities, the DBA can control these jobs with several parameters.
$ expdp username/password (other parameters here)
$ impdp username/password (other parameters here)
We can get a quick summary of all parameters and commands by simply issuing:
$ expdp help=y
$ impdp help=y
As with the traditional export and import utilities, the Data Pump utilities are especially useful for migrating large databases: the data is exported from a database on one platform and imported into a database running on a different platform and operating system, in a short amount of time.
The Oracle-supplied package DBMS_DATAPUMP implements the API through which the Data Pump export and import utilities can be accessed programmatically. In other words, if you have hundreds of databases to manage, you can build a much more powerful, custom utility on top of Data Pump technology.
One interesting point is how Data Pump initiates the export session. With the traditional export utility, the user process requests the data from the server process and writes the exported data to disk itself, as part of a regular session. With Data Pump, the expdp user process launches a server-side job that writes the data to disk on the server node, and this job runs independently of the session established by the expdp client. However, as with the traditional export utility, Data Pump writes the data into dump files in an Oracle proprietary format that only the Data Pump import utility can understand.
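Because the job lives on the server rather than in the client session, a second session can connect to it later. The sketch below only prints the command that would attach to a running export job; it does not call expdp. The job name SYS_EXPORT_SCHEMA_01 is an assumed, typical system-generated name, and the connect string is a placeholder, so both should be replaced with real values (the actual job name can be looked up in DBA_DATAPUMP_JOBS).

```shell
#!/bin/sh
# Sketch: build the command used to attach to a running Data Pump export job.
# Nothing is executed against a database; the command is only printed.

# Assumed placeholders, not taken from a real system:
CONNECT="dpuser/dpuser@TDB10G"     # owner of the job
JOB_NAME="SYS_EXPORT_SCHEMA_01"    # assumed name; check DBA_DATAPUMP_JOBS

attach_command() {
    # $1 = connect string, $2 = job name (optionally owner.job_name)
    printf 'expdp %s attach=%s\n' "$1" "$2"
}

attach_command "$CONNECT" "$JOB_NAME"
```

Once attached, the client enters interactive mode, where commands such as STATUS, STOP_JOB, START_JOB, CONTINUE_CLIENT, and KILL_JOB can be typed at the Export> prompt.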
New Features of Data Pump that improve the performance of data movement:
Below are some of the features that differentiate Data Pump from the traditional export and import utilities. These features not only enhance the speed of the data transfer but also help the DBA assess how a job would run before actually running Data Pump.
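One way to assess a job before running it is the ESTIMATE_ONLY parameter of expdp, which reports the estimated space an export would need without writing a dump file. The following is a minimal sketch that only prints such a command; treating ESTIMATE_ONLY as the feature meant here is an assumption, and the connect string and schema name are placeholders.

```shell
#!/bin/sh
# Sketch: print an expdp command that estimates export size without exporting.
# With estimate_only=y no dump file is written, so dumpfile= is omitted.

estimate_command() {
    # $1 = connect string, $2 = schema to estimate
    printf 'expdp %s schemas=%s estimate_only=y estimate=blocks nologfile=y\n' "$1" "$2"
}

# Placeholder values for illustration only
estimate_command "dpuser/dpuser@TDB10G" "dpuser"
```

ESTIMATE accepts BLOCKS or STATISTICS; the latter relies on optimizer statistics being reasonably current for the tables involved.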
Init.ora parameters that affect the performance of Data Pump:
Oracle recommends the following settings to improve performance.
Additionally, the number of processes and sessions allowed for the database must be set high enough to allow for maximum parallelism.
How Data Pump loads and unloads data:
Oracle has supported direct path for unload (export) operations since Oracle 7.3. This method has been very useful for DBAs who want a quick export of the database, and it has been further enhanced in Data Pump. Data Pump uses the direct path method for both loading (impdp) and unloading (expdp) when the structure of the table allows it. If the table is part of a cluster, or a partitioned table has a global index, Data Pump instead accesses the data through a different method called external tables. Both methods support the same external data representation, so data that was unloaded with the external table method can be loaded with direct path, and vice versa.
As stated earlier, Data Pump is a server-based utility, rather than client-based; dump files, log files, and SQL files are accessed relative to server-based directory paths. Data Pump requires you to specify directory paths as directory objects. A directory object maps a name to a directory path on the file system.
1. The following SQL statements create a user and a directory object named dpump_dir1, and grant the user read and write permissions on that directory.
$ sqlplus system/manager@TDB10G as sysdba
SQL> create user dpuser identified by dpuser;
SQL> grant connect, resource to dpuser;
SQL> create directory dpump_dir1 as '/opt/app/oracle';
SQL> grant read, write on directory dpump_dir1 to dpuser;
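2. Let us see how the INCLUDE and EXCLUDE parameters can be used to limit an export to particular objects. The first command below exports only the EMP and DEPT tables; the second exports everything in the schema except the EMP_DETAILS table.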
$ expdp dpuser/dpuser@TDB10G schemas=dpuser include=TABLE:\"IN \(\'EMP\', \'DEPT\'\)\" directory=dpump_dir1 dumpfile=dpuser.dmp logfile=dpuser.log
$ expdp dpuser/dpuser@TDB10G schemas=dpuser exclude=TABLE:\"= \'EMP_DETAILS\'\" directory=dpump_dir1 dumpfile=dpuser2.dmp logfile=dpuser.log
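3. As noted earlier, Data Pump performance can be significantly improved by using the PARALLEL parameter. It is used together with the %U substitution variable in DUMPFILE, so that each parallel stream can write to its own dump file.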
$ expdp dpuser/dpuser@TDB10G schemas=dpuser directory=dpump_dir1 parallel=4 dumpfile=dpuser_%U.dmp logfile=dpuser.log
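The backslash escaping needed for INCLUDE and EXCLUDE filters on the command line is easy to get wrong. One alternative is to put the parameters in a parameter file and pass it with PARFILE, where the quotes need no shell escaping. The sketch below writes such a file and prints the command that would use it, without running expdp. The file name exp_dpuser.par and its location are hypothetical; the parameter values are the ones used in the steps above.

```shell
#!/bin/sh
# Sketch: write a Data Pump parameter file and print the expdp command for it.
# expdp itself is not invoked here.

write_parfile() {
    # $1 = path of the parameter file to create
    # The quoted 'EOF' keeps the shell from touching %U or the quotes.
    cat > "$1" <<'EOF'
schemas=dpuser
directory=dpump_dir1
dumpfile=dpuser_%U.dmp
logfile=dpuser.log
parallel=4
include=TABLE:"IN ('EMP', 'DEPT')"
EOF
}

# Hypothetical file name and location
PARFILE="${TMPDIR:-/tmp}/exp_dpuser.par"
write_parfile "$PARFILE"

cat "$PARFILE"
echo "expdp dpuser/dpuser@TDB10G parfile=$PARFILE"
```

Keeping the filter in a file also makes the job definition easy to review and reuse across databases.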
Data Pump API:
The Data Pump API, DBMS_DATAPUMP, provides a high-speed mechanism for moving data from one database to another. In fact, the Data Pump export and import utilities are themselves based on the Data Pump API. The structure used in the client interface of this API is a job handle. A job handle can be created using the OPEN or ATTACH function of the DBMS_DATAPUMP package. Other DBA sessions can attach to a job to monitor and control its progress so that, for example, a remote DBA can monitor a job that was scheduled by an on-site DBA.
The following steps list the basic activities involved in using Data Pump API.
1. Call the DBMS_DATAPUMP.OPEN function to create a job.
2. Define parameters for the job, such as adding files and filters.
3. Start the job.
4. Optionally, monitor the job until it completes.
5. Optionally, detach from the job and attach again at a later time.
6. Optionally, stop the job.
7. Optionally, restart a job that was stopped.
Example of the above steps:
DECLARE
  p_handle          NUMBER;        -- Data Pump job handle
  p_last_job_state  VARCHAR2(45);  -- To keep track of job state
  p_job_state       VARCHAR2(45);
  p_status          ku$_Status;    -- The status object returned by GET_STATUS
BEGIN
  -- Create a schema-mode export job named EXAMPLE and keep its handle.
  p_handle := DBMS_DATAPUMP.OPEN('EXPORT', 'SCHEMA', NULL, 'EXAMPLE', 'LATEST');
  -- Specify a single dump file for the job (using the handle just returned)
  -- and a directory object, which must already be defined and accessible
  -- to the user running this procedure.
  DBMS_DATAPUMP.ADD_FILE(p_handle, 'example.dmp', 'DMPDIR');
  -- A metadata filter is used to specify the schema that will be exported.
  -- The name is in upper case because that is how an unquoted user name is stored.
  DBMS_DATAPUMP.METADATA_FILTER(p_handle, 'SCHEMA_EXPR', 'IN (''DPUSER'')');
  -- Start the job. An exception will be generated if something is not set up
  -- properly.
  DBMS_DATAPUMP.START_JOB(p_handle);
  -- The export job should now be running. Detach from it so that this block
  -- can end; the job itself continues on the server.
  DBMS_DATAPUMP.DETACH(p_handle);
END;
/
The status of the job can be checked by writing a separate procedure that captures errors and status until the job completes. Overall job status can also be obtained with the query SELECT * FROM dba_datapump_jobs.
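The dictionary query mentioned above can be wrapped in a small script for routine checks. The sketch below writes a SQL*Plus script that lists jobs from DBA_DATAPUMP_JOBS and prints the sqlplus command that would run it; it does not connect to a database. The script name dp_status.sql is hypothetical, and the connect string is the placeholder used earlier in this article.

```shell
#!/bin/sh
# Sketch: generate a SQL*Plus script that lists Data Pump jobs and their state.
# Only the script is written; sqlplus is not invoked here.

write_status_sql() {
    # $1 = path of the SQL script to create
    cat > "$1" <<'EOF'
SET LINESIZE 200
COLUMN owner_name FORMAT A15
COLUMN job_name   FORMAT A25
SELECT owner_name, job_name, operation, job_mode, state, degree, attached_sessions
  FROM dba_datapump_jobs
 ORDER BY owner_name, job_name;
EXIT
EOF
}

# Hypothetical script name and location
SQLFILE="${TMPDIR:-/tmp}/dp_status.sql"
write_status_sql "$SQLFILE"

echo "sqlplus -s system/manager@TDB10G @$SQLFILE"
```

The STATE column shows values such as EXECUTING or NOT RUNNING, which is usually enough to tell whether a job is still active or has been stopped and can be restarted.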
Oracle Data Pump is a great tool for the fast movement of data between databases, and much of this performance improvement is derived from the use of the PARALLEL parameter. Even when the Transportable Tablespace feature of Oracle is used to move self-contained data between databases, Data Pump is still required for handling the extraction and recreation of the metadata for that tablespace. Whenever possible, Data Pump performance is further maximized by using the direct path driver. Otherwise, Data Pump accesses the data using the external table access driver.
Data Pump also provides flexibility through parameters such as INCLUDE, EXCLUDE, QUERY, and TRANSFORM, which give the DBA more control over the data and objects being loaded and unloaded. With all of these features, Data Pump is a welcome addition to the DBA's tools in a world that constantly redefines the size of the large database.