Change Data Capture Implementation in Oracle Data Warehouses - Part 1
February 20, 2003
Data warehouse environments typically involve extraction, transformation, and loading data into the data warehouse from the source systems. Very often, the requirement is to capture the incrementally changed data (delta) from the source system with respect to the previous extract and load into the data warehouse.
Oracle database provides multiple options and features to implement this requirement, which include:
Oracle9i extends this list of features with two more options. They are:
This is the first of the 3 part series in which we will discuss an overview of "Oracle streams", its implementation and advantages, and the Change Data Capture Framework. Oracle Streams implementation will be presented in the 2nd part of this series.
Oracle Streams enables information sharing within a database, or between databases, including sharing information between Oracle and non-Oracle databases. This information includes capturing and managing events such as DML (data) and DDL (changes in datatypes, tables, procedures etc) and other message types. Users control the information flow and decide what information has to be propagated through the data stream, how the data stream flows or is routed from node to node, what happens to the events as they flow through each node, and how the stream terminates.
The Streams information flow:
The Oracle Streams architecture consists of three basic elements or processes as represented in the above diagram:
The capture process involves identifying the information that needs to be captured for propagation or configuring the Rules for the capture process. The rules determine which DML statements or DDL statements will be extracted from the redo log and published to the staging area. You can configure these rules implicitly (Oracle's default rules for specific environments such as replication, which enables automatic change capture), or explicitly for custom configuration (which requires the user to create additional procedures to initiate change capture).
How does the Change capture process work?
The capture process is essentially an oracle background process that reads the rule specified DML and DDL from the redo log and places them in the staging area. Briefly, it does the following during its phase of processing:
However, note that the capture process cannot extract changes made in the SYS and SYSTEM schemas and cannot extract some DDL and DML, e.g. ALTER DATABASE, ALTER SESSION, etc.
What are rules?
Rules are database objects that are SQL-like expressions and evaluate to a Boolean result (TRUE or FALSE or unknown). For example, the expression customer_id=1010 would evaluate to TRUE if the column customer_id is 1010. Rules follow all the guidelines for SQL statements and are similar to the condition in the WHERE clause of a SQL query.
Rules are evaluated by Oracle9i's built-in rules engine that can be used by user created applications as well as Oracle9i features such as Oracle Streams (the capture, staging and the apply process). Rules can be defined at different levels such as table level, schema level and global level (the database level).
In order to be evaluated by the rules engine, rules have to be a part of a named collection called rule sets.
In Oracle Streams, all 3 phases, the capture, staging and apply elements can be clients of the rules engine and evaluate rules for filtering the events. Each element in Oracle Streams can be associated with at the most one rule set.
Rules can be configured from using the simplest configuration method (Oracle provided defaults for Replication environments) to the more flexible and custom oriented configuration (which involves more steps in the process). The following table lists the different packages that can be used to configure the rules in Oracle Streams.
Part 2 of the series will discuss how to use the PL/SQL packages to setup and configure a simple Oracle Streams environment.
What are Logical Change Records (LCRs)?
The capture process extracts the DML/DDL statements evaluated by the rules engine from the redo log and formats them into events also called Logical Changed Records. These LCRs will then be placed in the queue or the staging area.
As we know, Oracle Streams presents us with control over what information needs to be shared, how it is propagated and to which subscriber (databases). This control can be exercised by tagging each LCR with a tag that identifies in which database the change originated (whether a local database or a remote database, especially in a N-tiered database network), for tracking the information flow or for specifying the set of destination databases that can use the information.
The capture process formats two types of LCRs, the DDL LCR and the row LCR.
The row LCR describes changes made to a single row of a table modified with a single DML statement. Thus, a single DML statement that updates 100 rows in the table will generate a 100 row LCR and a single transaction containing (say) two DML statements, each updating 100 rows will generate 200 LCRs (100 x 2 LCRs).
The DDL LCR, on the other hand, describes the changes in the database objects (such as a DDL statement issued to create, alter or drop a database object).
The following figure shows the Oracle Streams architecture.
As can be seen in the diagram, the events captured in the capture process are stored in the staging area or the queue in the database's shared memory. This queue is of the type SYS.AnyData. Events stored in queues can be propagated (source queue) or consumed (destination queue) by user applications or by another queue in a destination database or a default consumption process that has subscribed to the queue. Subscribers can apply rules to evaluate which events can be propagated or consumed from the queues.
If the subscriber is a non-oracle source database, then a user application will be required to read the source database changes, format the changes to LCRs and explicitly place the LCRs in the destination Oracle database queue, where they will be consumed.
In order to propagate events from a queue, the user should own the queue and should have the relevant privileges on the source and the destination queues. In addition, if it is a remote database, the user should also have access to the database link used by the propagation job.
Oracle Streams queue can propagate events to any number of destination queues. However, a particular source queue can propagate only one event to a particular destination queue. While a particular destination queue may receive events from multiple source queues. In addition, a particular queue can function as a source queue and a destination queue at the same time.
Again, you can schedule propagation jobs implicitly or explicitly using DBMS_STREAMS_ADM or the DBMS_PROPOGATION_ADM packages.
The consumption event or the apply process involves applying the changes to the destination directly (default) or passing it to a user defined procedure (Custom apply process) for further processing and then applying it to the destination database. The captured LCRs or the user provided LCRs or the enqueued messages represent the changes applied to the destination database.
In default apply processing, the LCRs are directly applied to the destination database successfully. If a conflict is detected, then the apply engine tries to resolve the conflict error with a conflict handler or user-specified error handlers.
In custom apply processing, the LCRs are passed as parameters to user-defined procedures, which will process the LCRs in a customized way. If the user-defined procedures process row LCRs (DML), then they are known DML handlers, while if they process DDL LCRs, then they are known as DDL handlers. Handlers that process enqueued messages are known as message handlers.
One of the key points to note is that a particular apply process, (you can have any number of apply processes for a particular database), can have multiple DML handlers but only one DDL handler, which will handle all of the DDL LCRs in the process. In addition to this, apply process cannot apply DDL LCRs' to a non-Oracle database.
The apply process can be created implicitly by using DBMS_STREAMS_ADM or DBMS_APPLY_ADM for greater control and flexibility. As in the capture and the staging process, the DBMS_STREAMS_ADM package helps you to automatically configure rules for the apply process, to evaluate which events have to be processed and applied to the destination database.
Oracle Streams capabilities
Oracle Streams provides many capabilities and the user can implement any or all of the capabilities without affecting the other capabilities. Two of the more interesting capabilities that are outlined in this article are:
Supplemental logging is a new feature that Oracle9i has included in LogMiner. LogMiner is a server utility that allows you to read information contained in online and archived redo logs, based on selection criteria. Prior to Oracle9i, LogMiner only extracted those columns which were changed (by the DML statement) and the change was identified with a rowid in the WHERE clause.
SQL> SELECT employee_id, salary FROM employee; EMPLOYEE_ID SALARY ----------------- ---------- 10 100000
Then, for the statement UPDATE employee SET sal= 150000; the LogMiner would present the information:
update "TEST"."EMPLOYEE" set "SALARY" = '150000' where "SALARY" = '100000' and ROWID ='AAABOaAABAAANZ/AAA';
Supplemental logging allows you to specify logging of additional columns in the redo log. With supplemental logging whenever a DML is performed, any columns that you specify to be logged but are not necessarily involved in the DML are also logged.
Referring to the same example again, notice that the column EMPLOYEE_ID was not logged into the redo log. If you turn on the supplemental logging for EMPLOYEE_ID, then the LogMiner information will now be:
update "TEST"."EMPLOYEE" set "SALARY" = '150000' where "EMPLOYEE_ID" = '10' and "SALARY" ='100000' and ROWID = 'AAABOaAABAAANZ/AAA';
By default, to apply changes at the destination database, Oracle Streams expects the target table to have primary/unique keys to identify the rows to be changed.
Supplemental logging can prove to be very useful in data warehousing environments where the DW table may not have a primary key, or some other column is to be used for row identification. In such cases, Oracle Streams can be configured to use "Substitute keys" at the destination database (The DW) by using the supplemental logging feature. The substitute key columns at the DW table should also be available at the source table, but may or may not be primary keys. However, if the substitute keys are not primary keys at the same source table, then they should be enabled for supplemental logging in the source database.
A transformation is a change made to the events in the capture, staging or consumption process. Transformations can include changing the datatype representation of a particular column in a table at a particular site, or renaming a column to a table at a particular site, or even changing the data values. A transformation is represented by a PL/SQL function that takes the source data type as input and returns an object of the target data type. Consider a scenario in which a column at the source table is of datatype NUMBER and the same column for the same table in the destination database is of datatype VARCHAR2. In such case we could use a rule based transformation.
This part of the article addressed Oracle Streams architecture and some of its capabilities. There is much more to Oracle Streams and detailed information can be obtained on the Oracle TechNet site. Part 2 of the series will focus on a simple implementation of Oracle Streams followed by part 3, which will describe the Change Data Capture framework of Oracle.