Datawarehouse environments face the challenge of
exchanging, integrating and consolidating huge amounts
of data over many systems to provide a unified information
base for Enterprise Intelligence. This process in
general is refererred to as ETL ( Extract, Transform
and Load ). Extract process is very often involved
with the extraction of the data from many source
systems. For example a content network like internet.com
would be required to consolidate the logs that it
generates across hundreds of sites that area part
of its network to create intelligence that can be
used for managing the visitor information. The most
time consuming part of this process is the Transformation
and Loading process where multiple filters are applied
and the data is validated against an set of known
information and loaded into the current enterprise
warehouse system. This has to be done in a very
quick and scaleable manner without affecting the
source and Warehouse systems.
Most ETL processes today are done
outside the database using 3rd party software that
can provide specific capabilities to access disparate
source systems before the data is transformed. Once
the data is transformed, current Oracle Warehouses
use parallel insertion or loading techniques. The
primary role of the database at this point are to
manage the rows, indexes and constraints. Some ETL
processes use a serial approach of Extract Load
and Transform, thereby using more of the database
capabilities to perform the transformation of the
data. In this approach the data is extracted and
loaded into a staging area using 3rd party tools
and then PL/SQL or Java is used within the oracle
database system to transform the data before its
posted to the final tables. The inherent disadvantages
with this approach are many including the inability
of the ETL software to scale, No control over recovery
and restartability in case of a failed ETL process
and the ever present nightmare of managing home
grown processes.
Oracle 9i introduces the new paradigm
of transform-while-loading replacing the age old
serial transform-then-load and load-then-transform
processes. In the new approach the database becomes
an integral part of the ETL process and some of
the necessary steps become obsolete while others
can be remodeled to enhance the process. The database
now offers a toolkit that enables the most appropriate
ETL process flow for a specific customer need and
not dictate or constrain it from a technical perspective.
9i implements the following new functionality to
help the process be faster and more efficient.
Oracle Change Data Capture Framework
can be used
to optimize the extraction portion of ETL process
and build the basic maintenance framework for the
repetitive scheduled execution of the complete ETL
process. OCDC has the capability to capture changed
data from oracle data sources. Using the proven
Replication Framework and the Logminer technologies
oracle can perform either synchronous or asynchronous
change data capture when the source systems are
using Oracle. An extensible API is provided for
3rd party vendors to integrate external data drivers
for non-Oracle sources into the Framework.
External Tables
feature allows external data sources
like flat files to be exposed to users like any
other data residing in a regular table. The only
limitation is that this table is read only and acts
as a virtual table that can be used in SQL, PL/SQL
and Java operations without having to first load
the data into the database. This enables the merging
of the loading process with the transformation process
by reducing any interruption of data streaming and
staging the data inside the database.
Multi Table Insert
is a cool new feature that every
dba and developer can appreciate. Until Oracle 8i,
you had to run throught the data multiple times
or write procedural code to perform inserts into
multiple tables in one pass. 9i’s Multi Table Insert
feature provides a feature to insert data into multiple
tables in one pass.
Other notable features are Upsert,
which is a sql extension that provides the functionality
to conditionally update or insert a row into the
database. More enhancements, allows a database to
have different block sizes when transporting tablespaces
between databases. Starting with 9i, the source
and target systems do not have to be the same block
size.
With the addition of these
new features Oracle enhances its capability as a
data-processing platform in a data warehousing environment.