Creating Dump Files in Oracle GoldenGate
February 28, 2011
In addition to performing replication, GoldenGate can also be used to dump the contents of a table into a formatted file.
Getting Oracle GoldenGate up and running in a simple configuration for unidirectional replication is fairly easy to accomplish. The simplicity of this, compared to what it takes in Oracle Streams, will make you appreciate how much easier it can be to use GoldenGate.
In addition to performing replication, GoldenGate can also be used to dump the contents of a table into a formatted file. The format can be of several types, including a simple "whatever-delimited" format, and you can even output contents into XML. So, before describing how to use the "FORMATxxx" parameter ("xxx" being either ASCII, SQL or XML), let's take a quick tour of the normal output file from a GoldenGate process.
GoldenGate operates using several processes. At the top level, there is the Manager process. As an analogy, you can think of the Manager process as being similar to the SGA/PGA in Oracle. Overall, it is the administrative framework for GoldenGate. No Manager process = no replication/not open for business. There are two other processes which you, as the replication user, control and configure. These processes are Extract and Replicat. From a Streams perspective, Extract is Capture, and Replicat is Apply.
To further refine what Extract is, you can categorize an Extract process as being either primary or secondary. A primary Extract is the worker process that gets the data of interest. The data (or more generally, the SQL) can come from one of four sources. The two main sources relative to a primary Extract process are going to be from a table or from the transaction log. "Transaction log" is the general identifier of where GoldenGate captures the SQL to be applied later on your target database. In Oracle, "transaction log" translates to the redo logs, although you can extract from the archived redo logs if that is your preference.
A secondary Extract process is referred to as a Data Pump. This is not the Oracle Data Pump utility; it unfortunately happens to share the same name. A secondary Extract process, i.e., a Data Pump, extracts data from the output generated by the primary Extract. The Data Pump then sends applied redo across the network to a Replicat process, which is where the "apply" takes place.
You do not have to use a Data Pump to send data to a Replicat, but as a best practice related to availability, Oracle recommends using a secondary Extract.
In the diagram below, there are two different, but related, workflows taking place. The INITIAL LOAD (on top) portion represents the actions you take to get the target database synchronized with the source.
In the bottom half (CHANGE SYNCHRONIZATION), you can see "Trail or file." The primary Extract generates trail files. A Trail is a GoldenGate specific file. The contents of a Trail can be viewed using the Logdump utility. Running a normal or simple Extract creates one or more Trails, which are then used by the Replicat process on the other end to apply SQL.
Coming back to the FORMATxxx topic, you can execute a special run type of Extract. This type of run does not run continuously. You run it, and when it is finished, that's it.
How does GoldenGate know the difference between a continuously running Extract process and a special run Extract? The SPECIALRUN keyword (as one option to do this) in an Extract's definition (with respect to the Manager process) is what distinguishes the two. Another way is to leave the "when" part of the definition blank.
Every process group (manager, extract and replicat) has a parameter file. Extract and Replicat groups require an "ADD" command at the command line, The ADD EXTRACT command can be broken down into the following parts:
ADD EXTRACT <group name>, <data source options>, <service options>
To add an Extract where the data source is from the database (as opposed to from the redo logs), the command is:
ADD EXTRACT EDMP01, SOURCEISTABLE
Note there is nothing (above) for the <service options>.
The name of the group process is EDMP01, where "E" is for extract, "DMP" is for dump, and "01" is a simple identifier of 01, 02, etc. The DMP part connotes the idea of dump, which is exactly what we’re trying to do: dump the contents of a table into a file.
The next component is the parameter file for the dump. An example is shown below.
The line numbers do not exist in an actual parameter file – they’re only shown here for clarity.
Line 1 identifies the Extract group name, which we said is EDMP01.
Line 2 identifies the GoldenGate database user who can query from the source tables.
Line 3 is a comment, showing some of the various options for the FORMAT type.
Line 4 shows FORMATXML being the format specification (i.e., the output file will be XML).
Line 5 is another comment
Line 6 is where you identify the output file. In this example, the output file, my.xml, will be generated in the dirdat directory. The "dirdat" directory is one of several default directories GoldenGate creates via the CREATE SUBDIRS command (which you run during the installation process). The path is actually under <install directory>/dirdat, where <install directory> is where you installed GoldenGate and from where you run ggsci, the GoldenGate Software Command Interface. The EXTFILE parameter identifies the file as a local file, and PURGE is used to purge the contents of an existing file before writing anew.
Line 7 is the source or table. In this case, the SRC schema's TCUSTMER table is being used. The TCUSTMER table is available to you within the install directory (see the demo_ora_create.sql and demo_ora_insert.sql files).
Line 8 is a variation showing how you can use a WHERE clause to filter data.
The parameter file for this Extract is no different than any other extract or replicat parameter file: it is located in the dirprm directory, and GoldenGate creates/stores it as <lowercase group name>.prm. This example would be edmp01.prm.
The final step is to run or start the process, and a simple START EXTRACT EDMP01 is all it takes. The contents of the my.xml file are shown below.
To turn the output file into a file ready for use within SQL*Loader, switch out the commented line with the FORMATXML line (and change the name of the file).
Overall, this process is pretty easy to implement. Like any other process group, you can view a report for it via "VIEW REPORT <group name>." To further enhance the output, or more specifically, what can be inserted into a table down stream in addition to the basic data, are tokens. Tokens can be used to capture environment variables and what is taking place within GoldenGate. We’ll take a look at tokens in the next article, and use the Logdump utility to view the contents of a trail file.
Steve Callan is an Oracle DBA (OCP 8i and 9i)/developer working in Denver. His Oracle experience also includes Forms and Reports, Oracle9iAS and Oracle9iDS.