Oracle 10g DataPump, Part 2: Implementation
April 28, 2005
Synopsis. Oracle 10g's new DataPump is designed as the eventual replacement for the Oracle Import and Export utilities. This article - the second and final in this series - focuses on using the DataPump's features in real-world scenarios, including creation of custom DataPump jobs with DBMS_DATAPUMP.
The previous article in this series identified some of the deficiencies of the original Export and Import utilities, and then provided a broad overview of Oracle 10g's new DataPump utility. This article will concentrate on how to effectively utilize the DataPump for common import and export tasks, including how to create custom DataPump jobs that can be scheduled via the Oracle 10g Scheduler. I will start with some real-world scenarios that show how Oracle 10g's DataPump Export utility provides plenty of flexibility in handling typical export requirements.
DataPump Export Scenarios
Creating a Nightly Disaster Recovery Export Scheme. As part of my personal list of Oracle disaster recovery "best practices," I have made a habit of building a nightly disaster recovery export scheme for a new Oracle database shortly after the database's creation. In addition, whenever I create a new schema for an existing database, I make sure to add that schema to the existing nightly export scheme. At a minimum, I ensure that at least the contents of all tables are exported regularly.
Setting up a nightly export scheme via the DataPump Export utility is relatively simple. Listing 2.1 shows the DataPump parameter file used to generate the export files. Since my target database for this example is relatively small, I am using the FULL parameter to generate an export that contains the entire contents of the database including all metadata, all database objects, and of course the data itself. I am also using the STATUS parameter to tell the DataPump to write the status of the ongoing operation to the log file at regular intervals - in this case, every 30 seconds. Log 2.1 shows the resulting log file generated during this DataPump Export operation.
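Listing 2.1 is not reproduced here, but a full-database export parameter file along the lines described above might look like the following sketch (the directory object, file names, and login are illustrative assumptions, not taken from the original listing):

```
# Nightly disaster recovery export (directory and file names are illustrative)
DIRECTORY=dr_export_dir        # a directory object created with CREATE DIRECTORY
DUMPFILE=nightly_dr_full.dmp
LOGFILE=nightly_dr_full.log
FULL=Y                         # export the entire database: metadata, objects, and data
STATUS=30                      # report job status every 30 seconds
```

The export would then be launched with something like `expdp system PARFILE=nightly_dr_full.par`, typically from a script invoked each night by the Oracle 10g Scheduler or an operating system scheduler.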
Estimating an Export's Size. Export operations like the nightly disaster recovery export scheme I have just described can obviously use quite a bit of storage, especially for a large database. Thankfully, DataPump offers improved mechanisms for estimating an export's size. Setting the ESTIMATE_ONLY parameter to Y tells DataPump Export to calculate the estimated size of the export without actually performing it. The companion ESTIMATE parameter controls how that calculation is done: it accepts a value of either BLOCKS (the default), which bases the estimate on the number of physical blocks that will be accessed, or STATISTICS, which bases the estimate on stored database statistics (providing, of course, that statistics have been generated for the objects being exported).
Listing 2.2 shows the DataPump parameter file that generates the probable sizes of the export files for the nightly disaster recovery export. Note that I have set ESTIMATE to a value of STATISTICS and ESTIMATE_ONLY to Y, and that the DUMPFILE directive has been removed (in fact, it will cause an error if included during an estimate-only operation). See Log 2.2 for the resulting log file generated during this DataPump Export operation.
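As a hedged sketch of what such an estimate-only parameter file could contain (names are illustrative, not copied from Listing 2.2):

```
# Estimate-only run: sizes the export without writing any dump files
DIRECTORY=dr_export_dir
LOGFILE=nightly_dr_estimate.log
FULL=Y
ESTIMATE=STATISTICS            # base the estimate on optimizer statistics
ESTIMATE_ONLY=Y                # compute the estimate only; perform no export
# Note: no DUMPFILE line -- including one in an estimate-only run raises an error
```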
Multiple Server Processes and Dump Files During Exports. Another common request I receive regularly involves producing a subset of a production database to refresh the same tables in a development database. For example, recently one of my developers asked me to export the data from just one schema from our production database so she could import that data into another database to use it for application regression testing. In addition, the developer wanted to save the data on alternate media - in this case, standard 650MB CD-ROMs - for archival purposes.
The new DataPump Export makes quick work of this type of request. To demonstrate, I will set the FILESIZE directive to limit each file to 650 MB, and I will allow multiple export dump files to be written by including the %U variable in the DUMPFILE parameter. When %U appears in the file name, DataPump automatically substitutes an incrementing two-digit number in the range of 01 to 99 each time a new file is created.
To speed up this operation, I will enable multiple server processes by setting the PARALLEL parameter to 2, allowing two DataPump worker processes to share the load. Oracle recommends supplying at least as many dump files as the degree of parallelism specified, so that no worker process has to wait for a file to write to.
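Putting the pieces of this scenario together, a parameter file combining FILESIZE, the %U substitution variable, and PARALLEL might look like this sketch (the schema and file names are assumptions for illustration):

```
# Subset export split into CD-sized pieces, written by two workers
DIRECTORY=export_dir
DUMPFILE=hr_refresh_%U.dmp     # %U expands to 01, 02, ... as each file fills
FILESIZE=650M                  # cap each dump file at one CD-ROM's capacity
PARALLEL=2                     # two DataPump worker processes
SCHEMAS=HR                     # the single schema requested by the developer
LOGFILE=hr_refresh.log
```

With PARALLEL=2 and a %U file template, the two workers can each write to their own dump file concurrently, which is exactly why Oracle's dump-files-per-degree-of-parallelism recommendation matters.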
Exporting Selected Tables. As part of our regular development cycle, I need to refresh data held in several tables for a database used by our HR department for training Payroll application users. This database contains a mere fraction of the total transactional data in our production databases - just enough to provide examples that match our current training class documents - but any outstanding payroll transactions need to be applied to this "baseline" information at the start of each training class.
To make quick work of this task, I will use DataPump Export to dump out just the data from the required tables by specifying them as values for the TABLES parameter. Listing 2.4 shows the corresponding DataPump parameter file to perform this operation, and Log 2.4 displays the resulting log file generated.
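Listing 2.4 is not shown here, but a table-level, data-only parameter file for this kind of refresh could be sketched as follows (the table names are hypothetical stand-ins, not the actual Payroll tables):

```
# Refresh of selected training tables (table names are illustrative)
DIRECTORY=export_dir
DUMPFILE=payroll_training.dmp
LOGFILE=payroll_training.log
TABLES=HR.EMPLOYEES,HR.PAYROLL_TRANSACTIONS
CONTENT=DATA_ONLY              # dump just the rows, not the table definitions
```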
Exporting Different Versions of a Database Object. Occasionally a developer needs to see how data in a table was stored before an operation took place, or which version of a particular database object was in force at a particular date and time. DataPump Export can export an object as of a particular system change number (SCN) specified for the FLASHBACK_SCN parameter, or as of a particular time specified for the FLASHBACK_TIME parameter. (Note that these options are mutually exclusive.)
Listing 2.5 shows an example of a DataPump Export parameter file that would export all objects in the HR schema while reflecting the state of those objects as of April 10, 2005 just before midnight.
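A hedged sketch of such a point-in-time export parameter file, assuming illustrative directory and file names (the timestamp format string is one common way to express the date in question):

```
# Export the HR schema as of just before midnight on April 10, 2005
DIRECTORY=export_dir
DUMPFILE=hr_asof_20050410.dmp
LOGFILE=hr_asof_20050410.log
SCHEMAS=HR
FLASHBACK_TIME="TO_TIMESTAMP('10-04-2005 23:59:00','DD-MM-YYYY HH24:MI:SS')"
```

For this to succeed, the database's undo retention must still cover the requested point in time; otherwise the export will fail with a snapshot-too-old style error.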