Change Data Capture Implementation in Oracle Data Warehouses – Part 1

Data warehouse environments typically involve extraction, transformation,
and loading data into the data warehouse from the source systems. Very often,
the requirement is to capture the incrementally changed data (delta) from the
source system with respect to the previous extract and load into the data
warehouse.

Oracle database provides multiple options and features to
implement this requirement, which include:

  1. Writing SQLs and PL/SQL procedures to implement the change capture.

  2. Materialized Views

Oracle9i extends this list of features with two more
options. They are:

  1. Oracle Streams

  2. Change Data Capture framework

This is the first of the 3
part series in which we will discuss an overview of “Oracle streams”, its
implementation and advantages, and the Change Data Capture Framework. Oracle
Streams implementation will be presented in the 2nd part of this
series.

Oracle Streams

Oracle Streams enables information sharing within a database,
or between databases, including sharing information between Oracle and
non-Oracle databases. This information includes capturing and managing events
such as DML (data) and DDL (changes in datatypes, tables, procedures etc) and
other message types. Users control the information flow and decide what
information has to be propagated through the data stream, how the data stream
flows or is routed from node to node, what happens to the events as they flow
through each node, and how the stream terminates.

The Streams information flow:


The Oracle Streams architecture consists of three basic
elements or processes as represented in the above diagram:

  1. Capture

  2. Staging

  3. Apply or Consume

Capture Process

The capture process involves identifying the information that
needs to be captured for propagation or configuring the Rules for the capture
process. The rules determine which DML statements or DDL statements will be
extracted from the redo log and published to the staging area. You can
configure these rules implicitly (Oracle’s default rules for specific
environments such as replication, which enables automatic change capture), or
explicitly for custom configuration (which requires the user to create
additional procedures to initiate change capture).

How does the Change capture
process work?

The capture process is essentially an oracle background
process that reads the rule specified DML and DDL from the redo log and places
them in the staging area. Briefly, it does the following during its phase of
processing:

  1. Extracts the DDL/DML specified by the rules you have configured
  2. Formats the statements into events called Logical Change Records (LCR).
  3. Places them in the queue or what is called the staging
    area for further processing.

However, note that the capture
process cannot extract changes made in the SYS and SYSTEM schemas and
cannot extract some DDL and DML, e.g. ALTER
DATABASE, ALTER SESSION, etc.

What are rules?

Rules are database objects that are SQL-like expressions and evaluate to
a Boolean result (TRUE or FALSE or unknown). For example, the expression
customer_id=1010 would evaluate to TRUE
if the column customer_id is 1010. Rules follow all the guidelines for SQL
statements and are similar to the condition in the WHERE clause of a SQL query.

Rules are evaluated by Oracle9i’s built-in rules engine that can
be used by user created applications as well as Oracle9i features such as
Oracle Streams (the capture, staging and the apply process). Rules can be
defined at different levels such as table level, schema level and global level
(the database level).

In order to be evaluated by the rules engine, rules have to be a part of
a named collection called rule sets.

  • A rule can be a part of many rule sets and a rule set can have many rules in it.

  • You can add /subtract rules to/from a rule set.


In Oracle Streams, all 3 phases, the capture, staging and apply elements
can be clients of the rules engine and evaluate rules for filtering the events.
Each element in Oracle Streams can be associated with at the most one
rule set.

Rules can be configured from using the simplest configuration method
(Oracle provided defaults for Replication environments) to the more flexible
and custom oriented configuration (which involves more steps in the process).
The following table lists the different packages that can be used to configure
the rules in Oracle Streams.

DBMS_STREAMS_ADM

The PL/SQL package for the simplest
configuration
of replication rules or setting up the Capture
process. Rules are added automatically by using this package to configure the
capture process.

DBMS_RULES_ADM

This PL/SQL package provides
the administrative interface for creating and managing custom rules, rule sets, and rule
evaluation contexts.

Part 2 of the series will discuss how to use the PL/SQL packages to setup and
configure a simple Oracle Streams environment.


What are Logical Change
Records (LCRs)?

The capture process extracts the DML/DDL statements evaluated by the
rules engine from the redo log and formats them into events also called Logical
Changed Records. These LCRs will then be placed in the queue or the staging
area.

As we know, Oracle Streams presents us with control over what
information needs to be shared, how it is propagated and to which subscriber
(databases). This control can be exercised by tagging each LCR with a tag that
identifies in which database the change originated (whether a local database or
a remote database, especially in a N-tiered database network), for tracking the
information flow or for specifying the set of destination databases that can
use the information.

The capture process formats two types of LCRs, the DDL LCR and the row
LCR.

The row LCR describes changes made to a single row of a table modified
with a single DML statement. Thus, a single DML statement that updates 100 rows
in the table will generate a 100 row LCR and a single transaction containing
(say) two DML statements, each updating 100 rows will generate 200 LCRs (100 x
2 LCRs).

The DDL LCR, on the other hand, describes the changes in the database
objects (such as a DDL statement issued to create, alter or drop a database
object).

The following figure shows the Oracle Streams architecture.

/* DML updates only 1 row */
UPDATE employee
SET job=’VP’
WHERE employee_id=11234;

/* DML updates (say) 10 rows */
UPDATE employee
SET comm=0.05*salary
WHERE department=’SALES’;

/* DML updates only 1 row */
UPDATE employee
SET job=’VP’
WHERE employee_id=11234;

/* DML updates (say) 10 rows */
UPDATE employee
SET comm=0.05*salary
WHERE department=’SALES’;

Staging Process

As can be seen in the diagram, the events captured in the
capture process are stored in the staging area or the queue in the database’s
shared memory. This queue is of the type SYS.AnyData. Events stored in queues
can be propagated (source queue) or consumed (destination queue) by user
applications or by another queue in a destination database or a default
consumption process that has subscribed to the queue. Subscribers can apply
rules to evaluate which events can be propagated or consumed from the queues.

If the subscriber is a
non-oracle source database,
then a user application will be
required to read the source database changes, format the changes to LCRs and
explicitly place the LCRs in the destination Oracle database queue, where they
will be consumed.

In order to propagate events from a queue, the user should
own the queue and should have the relevant privileges on the source and the
destination queues. In addition, if it is a remote database, the user should
also have access to the database link used by the propagation job.

Oracle Streams queue can propagate events to any number of
destination queues. However, a particular source queue can propagate only one
event to a particular destination queue. While a particular destination queue
may receive events from multiple source queues. In addition, a particular queue
can function as a source queue and a destination queue at the same time.

Again, you can schedule propagation jobs implicitly or
explicitly using DBMS_STREAMS_ADM or the DBMS_PROPOGATION_ADM packages.

Consumption Process

The consumption event or the apply process involves applying
the changes to the destination directly (default) or passing it to a user
defined procedure (Custom apply process) for further processing and then
applying it to the destination database. The captured LCRs or the user provided
LCRs or the enqueued messages represent the changes applied to the destination
database.

In default apply processing, the LCRs are directly
applied to the destination database successfully. If a conflict is detected,
then the apply engine tries to resolve the conflict error with a conflict
handler or user-specified error handlers.

In custom apply processing, the LCRs are passed as
parameters to user-defined procedures, which will process the LCRs in a
customized way. If the user-defined procedures process row LCRs (DML), then
they are known DML handlers, while if they process DDL LCRs, then they are
known as DDL handlers. Handlers that process enqueued messages are known as
message handlers.

One of the key points to note is that a particular apply
process, (you can have any number of apply processes for a particular database),
can have multiple DML handlers but only one DDL handler, which will handle all
of the DDL LCRs in the process. In addition to this, apply process cannot apply DDL LCRs’ to a non-Oracle database.

The apply process can be created implicitly by using
DBMS_STREAMS_ADM or DBMS_APPLY_ADM for greater control and flexibility. As in
the capture and the staging process, the DBMS_STREAMS_ADM package helps you to
automatically configure rules for the apply process, to evaluate which events
have to be processed and applied to the destination database.

Oracle Streams capabilities

Oracle Streams provides many capabilities and the user can
implement any or all of the capabilities without affecting the other
capabilities. Two of the more interesting capabilities that are outlined in this
article are:

  1. Supplemental logging

  2. Transformations

Supplemental logging

Supplemental logging is a new feature that Oracle9i has
included in LogMiner. LogMiner is a server utility that allows you to read
information contained in online and archived redo logs, based on selection
criteria. Prior to Oracle9i, LogMiner only extracted those columns which were
changed (by the DML statement) and the change was identified with a rowid in
the WHERE clause.

For example:


SQL> SELECT employee_id, salary FROM employee;

EMPLOYEE_ID SALARY
—————– ———-
10 100000

Then, for the statement UPDATE
employee SET sal= 150000;
the LogMiner would present the information:


update "TEST"."EMPLOYEE" set "SALARY" = ‘150000’ where "SALARY" = ‘100000’ and
ROWID =’AAABOaAABAAANZ/AAA’;

Supplemental logging allows you to specify logging of
additional columns in the redo log. With supplemental logging whenever a DML is
performed, any columns that you specify to be logged but are not necessarily
involved in the DML are also logged.

Referring to the same example again, notice that the column
EMPLOYEE_ID was not logged into the redo log. If you turn on the supplemental
logging for EMPLOYEE_ID, then the LogMiner information will now be:


update "TEST"."EMPLOYEE" set "SALARY" = ‘150000’ where "EMPLOYEE_ID" = ’10’ and
"SALARY" =’100000′ and ROWID = ‘AAABOaAABAAANZ/AAA’;

By default, to apply changes at the destination database,
Oracle Streams expects the target table to have primary/unique keys to identify
the rows to be changed.

Supplemental logging can prove to be very useful in data
warehousing environments where the DW table may not have a primary key,
or some other column is to be used for row identification. In such cases,
Oracle Streams can be configured to use “Substitute keys” at the destination
database (The DW) by using the supplemental logging feature. The substitute key
columns at the DW table should also be available at the source table, but may
or may not be primary keys. However, if the substitute keys are not primary
keys at the same source table, then they should be enabled for supplemental
logging in the source database.

Transformations

A transformation is a change made to the events in the
capture, staging or consumption process. Transformations can include changing
the datatype representation of a particular column in a table at a particular
site, or renaming a column to a table at a particular site, or even changing
the data values. A transformation is represented by a PL/SQL function that
takes the source data type as input and returns an object of the target data
type. Consider a scenario in which a column at the source table is of datatype
NUMBER and the same column for the same table in the destination database is of
datatype VARCHAR2. In such case we could use a rule based transformation.

Conclusion

This part of the article addressed Oracle Streams
architecture and some of its capabilities. There is much more to Oracle Streams
and detailed information can be obtained on the Oracle
TechNet
site. Part 2 of the series will focus on a simple implementation of
Oracle Streams followed by part 3, which will describe the Change Data Capture
framework of Oracle.

Latest Articles