Fundamentals of Oracle GoldenGate
January 12, 2011
Oracle Corp. has acquired so many companies of late that it's getting to be a job to keep up with things. GoldenGate is one of the products you'll want to stay on top of. It's still kind of under the radar, but it's going to be big. Really big.
Oracle (the corporation) has acquired so many other companies as of late that it’s getting to be job in and of itself to keep up with things. But with GoldenGate, this is one of the products you’ll want to be familiar with. This is one of those products or features where its usage and user base (at least within Oracle) is still kind of under the radar, but it’s going to be big. Really big.
We all know about data within a database, and how to put it in via a database language such as SQL. And we’re familiar with SQL reaching out to another database. But that’s something typically done ad hoc or via a job. How do you “flow” data from one database to another, hands free? That is, an insert on Table A in Database A gets propagated to Table A on Database B? Right now, Oracle Streams (named that I’m sure to reflect the idea of flowing streams of data) fulfills this requirement.
Oracle Streams comes with about, oh, 8,000 data dictionary views and DBMS built-ins, and is certainly not lacking in terms of complexity and moving parts. Additionally, a Streams implementation can include elements of Data Guard (specifically in the Downstream Capture configuration). With respect to Streams and Data Guard, GoldenGate encompasses both and can either augment or replace either tool.
So, what does it take to use GoldenGate?
First off, it takes money, as in more of it, because of how Oracle has chosen to license it. Its cost is tied to the database server licensing metric, so for one processor (perpetual), you’re looking at $17,500 for licensing and $3,850 for the first year’s support. Streams, in comparison, is included in whichever edition you use.
Second, using Linux as the host operating system, it’s a matter of downloading and uncompressing a tar file. Most everything for running GoldenGate is based out of a “ggs” directory. Running on Windows is a bit more involved because of services.
Finally, you have to get used to a new language replete with hundreds of new keywords using a not too complex syntax. The running of GoldenGate can be 100% command line interface. There is an optional add on product (Director) which offers a GUI-based centralized management feature. GoldenGate Director at the Oracle Store web site is Oracle Management Pack for Oracle GoldenGate. The documentation still shows Director.
GoldenGate can do two things for you in terms of replication. Part of replication includes having a downstream or replicated copy (think of a restored backup, however you want to get that in place) of the target. If you choose, GoldenGate can perform the initial load, although it is recommended that you use native RDBMS tools for this step.
The second (and main) task is running and managing the replication process. Replicate how? From just one database to another, or something along the lines of what Streams can do (one-way, bi-directional, N-way)? The topology diagram below should cover just about every possible way you could envision replicating data.
Another neat feature of GoldenGate is that it practically couldn’t care less which database system you’re coming from or going to. It’s almost like a Mac: everything just works. GoldenGate mines transactions from what it refers to as transaction logs (so obviously in Oracle’s case, the redo logs). The extraction process creates “trail” files GoldenGate then sends to the RMTHOST (the remote host, and congratulations if you’re new to GoldenGate: you’ve just seen your first keyword). The configuration workflow diagram shown below highlights this agnostic approach to the database systems involved.
The reference to “Data Pump” in the diagram is not the same Data Pump used in Oracle. The concept is the same though (data being moved at a relatively high speed). Similar to Streams setup, access to the remote host is much easier to configure/perform if the admin account username is the same on all systems.
Under Step 4, you see the word Replicat. That is not a typo: Replicat is the word/term/process used in and by GoldenGate. If using GoldenGate for your replication needs, Replicat will become very familiar.
This leads us to the GoldenGate Software Command Interface, otherwise known as ggsci. GGSCI is to GoldenGate as SQL*Plus is to Oracle. The table below (partial listing) gives you an idea of what GGSCI uses word-wise. Note the INFO word in the left column: plenty of online (well, at least in your session) help is available simply by typing “help <whatever>” at the command line.
One of the downsides of Streams processing, where data is typically dealt with on a row-by-row basis for a logical change record (LCR), is that bulk loads can cause performance problems. GoldenGate, and Streams to a degree, has built-in access to SQL*Loader type APIs.
Timing is everything, so how does GoldenGate keep track of transactions? Just like everyone does by using a change number (for example, Oracle’s SCN, MSSQL’s LSN). GoldenGate’s internal bookkeeping “checkpoint” is based on a CSN, or Commit Sequence Number. Knowing the CSN allows you to pick up after a stopped or interrupted process.
For learning how to use GoldenGate, you definitely need to put hands on the keyboard and bang away. Understanding the relationship of parameter and definition files – at an abstract level – is simple; the rub is getting used to GoldenGate’s language and keywords. The sample data in the demo files included with the tool is seriously lacking. For teaching this class, one of the things I incorporate is running a stream of data. I mean, you want to see this work and observe its work in progress. Simply flowing two or three records in the demo files is fine for getting things to work in the first place, but after that, let’s fire this thing up and see what it does.
For the money it costs, especially if you are considering replacing Streams, you’re going to want to see a couple of advantages or benefits over what you’re doing now: a better/easier setup and management, and data being replicated at near real time. No product will ever be exactly real time, but we can get pretty close.
In subsequent articles, I’ll cover some basic examples of using GoldenGate.