Steve Callan offers an overview of Oracle Streams and GoldenGate, comparing topology similarities and differences.
Replication in any database system, not just Oracle, provides functionality that can fill many needs. One of the main benefits of replicating data in Oracle is reducing (and practically eliminating) the load on a production server. It’s not just DML and transactions taking place on a server. The flip side of inserting, updating and deleting all that data is reporting on it. With replication, you can offload reporting to another server. Remote sites may not need to be exact replicas of the source system either, so replicating part of the source may be more efficient than duplicating it via a full-scale replication or use of a standby system.
Let’s start with a one over the world view of Streams and Data Guard. These two Oracle features are closely related.
Replication is not entirely separate from protecting data via a standby database – a logical standby, that is. Streams, for example, use some of the same architecture used to establish a logical standby. In Data Guard, you can apply SQL statements, or you can apply redo logs. With Streams, you can use pieces of Data Guard SQL Apply. SQL Apply equates to logical standby, the other being a physical standby, which is based on Redo Apply. To some degree, a limitation in each type of standby is an advantage in replication. In a physical standby, the database is closed for queries (with the exception of Active Data Guard, where the physical standby is read only). In a logical standby, the target database is open for read-write, but (and it’s a big one) not all data types are supported.
This is what Streams offers: the ability for you to make modifications on the remote or downstream database. Given that the downstream database is used for reporting purposes, being able to add indexes and materialized views, plus filter or transform data along the way (between the source and the target), your being able to do things that make queries run faster is of obvious value. What Streams captures is granulated, that is, it can capture changes made to tables, schemas, or the entire database. Coming back to that idea of filtering or transforming data along the way: think of ETL and data warehousing too.
Overall, Streams can be used for the following needs:
- Data Replication
- Date Warehouse Loading
- Database Availability (upgrade and maintenance)
- Message Queueing
- Event Management and Notification
- Data Protection
Topologies
Without getting into too much detail about Streams (and believe me, there is more than enough to make your head hurt), the key processes are Capture, Propagate, and Apply. Grab the data of interest, transport it to the remote server, and then apply it aptly describes CPA.
With CPA in mind, there are quite a few ways Streams can be architected. The simplest version is to replicate on one server. Only capture and apply matter, as there is no need to propagate over a network. However, the most common version, conceptually speaking, is shown below in Figure 1.
Figure 1 – Unidirectional system
A slightly extended version of the unidirectional system is the, well, extended unidirectional version (cascaded or directed, to use other terms). Figure 2 depicts this arrangement. The forwarding database can live on the source’s server, the target’s server, or an entirely separate server.
Figure 2 – Cascaded/directed system
Then, in no time at all, the topology gets much more complicated, and the terms N-way, Master-to-Master, Hub-and-spoke, Bi-directional and Hybrid, to name a few, come into play. And oh yeah, so far we’re only talking about Oracle to Oracle configurations. Streams supports heterogeneous configurations as well. Figure 3 is what SQL Server’s Publisher/Subscriber replication model is like (to use a simple example).
Figure 3 – Hub-and-spoke system
Figure 4 illustrates how multiple sources can feed into one target. The target could be a data warehouse, being fed by multiple data sources.
Figure 4 – Unidirectional spokes-to-hub system
Finally, an analogy for Figure 5 can be found in Active Directory, where every domain controller (DC) is a primary domain controller (in the past, there would be a primary DC and secondary DCs). In the Streams case, every database can be a database of record.
Figure 5 – N-way/Master-to-master system
How does GoldenGate compare to Streams in terms of supported topologies? Figure 6 (from the Administrator’s Guide) shows that GoldenGate is just as robust as Streams in terms of your source and target options. Some of the names or terms are different, but function-wise, they are the same.
Figure 6 – GoldenGate topologies
(from Oracle® GoldenGate Windows and UNIX Administrator’s Guide 11g Release 1 (11.1.1)
So one question about GoldenGate is this: how is data replicated? We know that data goes from source to target, but how is the data being applied? Is it more like SQL Apply (logical) or SQL Redo (physical)? If not physical, then can we assume that GoldenGate is also constrained by data types?
GoldenGate operates much like what takes place in SQL Apply, so GoldenGate target databases are akin to logical standby databases (assuming you’re doing database level replication; otherwise, tables could be of interest). And yes, there are data type limitations, but they’re not that bad or extensive.
GoldenGate Non-supported Datatypes |
|
The main difference, aside from the extra licensing costs associated with GoldenGate, is the ease of setup, installation and configuration. GoldenGate is geared more towards a heterogeneous environment, so it doesn’t depend on Oracle-specific items. Well, it does in one sense in terms of applying your own SQL statements in a call to SQLEXEC(). However, as far as relying on being able to access online redo logs, standby redo logs, or archived redo logs, no big deal. GoldenGate uses its own proprietary trail files (one set on the source and one on the target). It doesn’t care about Oracle Net Services either when it comes time to transport data from the source to the target.
GoldenGate is not without other limitations. There are situations where things go bump in the middle of the night, but what it does have is an easier interface (100% command line if you prefer) in which to explore what took place. There is no need to decipher scores of V$STREAMS_WHATEVER data dictionary views.
Another way to compare Streams and GoldenGate is to think of coupling. Obviously, Streams would be considered to be tightly coupled with Oracle (duh, where else it is used but within Oracle?). GoldenGate, on the other hand, would be considered as being loosely coupled. Given that GoldenGate is Oracle Corporation’s strategic solution for at least replication and data integration, you can be sure that whatever is loose today won’t be nearly as much in the future.
In the next article on GoldenGate, we’ll take a look at setting up a simple unidirectional replicated system.