Oracle Warehouse Builder Automated ETL Processing
May 14, 2008
One of Oracle's most powerful but somewhat under-the-radar tools is Warehouse Builder. Like several of Oracle's non-RDBMS products, it didn't get off to the best possible start with respect to being user friendly, bug free, easy to install and use, and well documented. Well, you get the idea. However, in later versions, Warehouse Builder has evolved into an extremely feature-rich, highly functional application that enables users to do some pretty amazing things without getting wrapped up in all of the gory details.
This article assumes you already have OWB installed and are trying to figure out how to build an automated ETL process. What are the step-by-step details to make this happen?
Warehouse Builder, frequently referred to as OWB, can automate the loading of flat files into a database. Many DBAs are intimately familiar with using a combination of SQL*Loader and shell scripting, with some cron action tossed in here and there. OWB does that (and much more) as well, abstracting the process behind a wizard-driven GUI with lots of point-and-click functionality. Through its Design Center and Control Center interfaces, a user can design and deploy the ETL process (we'll just focus on the loading part, that is, what it takes to get the contents of a delimited flat file loaded into a table, with no changes to the data along the way). And deployment is not limited to the server you're working on at the time: OWB enables you to design a process on one server and then deploy the steps to another server, or to more than one server if you so desire.
What is the procedural concept behind this operation? Enumerating the steps to make this happen helps to provide a framework.
1. Identify the source file, including the location and the nature of the data within.
2. Create an external table, which serves as the data dictionary definition of the file's contents.
3. Identify, and if necessary, create the real table in the database.
4. Make all of this happen on a scheduled basis, or as a one time deal.
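Under the covers, the "location" in step 1 corresponds to an Oracle directory object pointing at where the flat file lives. A minimal sketch of the DDL OWB effectively deploys for you, using a hypothetical directory name, path, and schema:

```sql
-- Hypothetical names throughout; OWB generates the equivalent DDL
-- when you deploy a file location to a target schema.
CREATE DIRECTORY src_file_dir AS '/data/incoming';
GRANT READ, WRITE ON DIRECTORY src_file_dir TO etl_user;
```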
OWB's approach to this process is to use the metadata about these objects and tie them together via a mapping and a workflow. The workflow can, in fact, be created as a visual artifact: OWB produces a workflow diagram, which, interestingly enough, is exactly what OWB uses behind the scenes, namely Oracle Workflow.
Think of each piece in this process as being an object: the file, the location of the file, the external table, the table, the mapping of the flat file to the external table, the job to run this at the end, and so on. Everything is an object, and objects can be tied together via hierarchies and dependencies. Every object type lives in a module. Since this tool is Java-based, the object-oriented design makes sense; generally speaking, each object is an instance of some class.
A typical ETL scenario could involve uploading a flat file on a recurring basis (you'll also be able to execute just the mapping piece by itself). If you break the process down into several steps involving different parts of the project tree, the overall task is easy to understand. As a point of reference, we'll start at the bottom and work our way up the hierarchy, with one exception: within a project, to get started, you will need to have a module created under Oracle.
As a tip, remember that almost every category involves the same two steps: create a module and import metadata. Also note that the examples are not always based on the same table (tables related to customers and countries were used).
An expanded project tree is shown below.
The areas of interest in the project tree, in the order in which we want to build the ETL process are:
Once the mapping step is complete, you need to drop down to Process Flows and Schedules.
Create a new module under Files and identify the location of the file.
Finish the Create Module step and launch the Import Metadata wizard. This is where you tell OWB about the contents of the flat file, which in turn launches the Flat File Sample wizard. Don't forget to add a date mask for date fields. After the file has been identified and sampled, you're ready to create the metadata for an external table. Step 1 of the Flat File Sample wizard is shown below.
When identifying the field delimiter, you can manually enter something other than what the drop-down shows (e.g., a pipe). Once the external table definition is complete, you could deploy the external table right now and have the table created in the target schema.
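For reference, what gets deployed is ordinary ORACLE_LOADER external table DDL. A minimal sketch, assuming hypothetical table, column, and directory names, with a pipe delimiter and a date mask on the date field:

```sql
-- Hypothetical names; OWB derives the equivalent definition from
-- the Flat File Sample wizard's metadata.
CREATE TABLE customers_ext (
  customer_id   NUMBER,
  country_code  VARCHAR2(2),
  signup_date   DATE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY src_file_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'      -- the manually entered pipe delimiter
    (
      customer_id,
      country_code,
      signup_date CHAR(10) DATE_FORMAT DATE MASK "YYYY-MM-DD"
    )
  )
  LOCATION ('customers.dat')
)
REJECT LIMIT UNLIMITED;
```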
In the next step, we need the real table. If the table has to be created from scratch, open the Data Object Editor and design it, ensuring the column definitions match those of the corresponding external table. As a tip, deploy the external table first, and then in SQL*Plus create the table via CTAS from the external table (just the table definition, no data).
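The CTAS tip amounts to a one-liner in SQL*Plus. Using the same hypothetical table names, the `WHERE 1 = 0` predicate copies the column definitions but no rows:

```sql
-- Hypothetical names; definition only, no data is read from the file.
CREATE TABLE customers AS
  SELECT * FROM customers_ext
  WHERE 1 = 0;
```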
The next step is to map the contents of the external table into the real table. Create a new mapping and map the columns, as the picture below suggests.
Deploy the mapping; if the deployment succeeds, it is at this point that you could manually run (Start) the load of data from the external table into the real table. You can verify the load by entering the Control Center and looking at the jobs that have run.
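The PL/SQL that OWB generates for the mapping is considerably more elaborate, but for a straight load with no transformations it is logically equivalent to an INSERT ... SELECT (hypothetical table and column names again):

```sql
-- What the mapping boils down to for a straight, no-transformation load.
INSERT INTO customers (customer_id, country_code, signup_date)
  SELECT customer_id, country_code, signup_date
  FROM   customers_ext;
COMMIT;
```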
To put this into an automated workflow, what's needed is a new process flow module. Create the process flow module, a package, and a process flow, and then launch the Process Editor. Add in the mapping and the end states, with a result like this:
Create a new schedule once the process flow diagram is finished. The schedule is generic, that is, it is not tied to anything; you have to go back to the mapping and associate it with the schedule. Once scheduled, the loading of the stage table from a flat file is automated.
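If you were wiring the recurring run up by hand instead of through OWB's schedule objects, a comparable nightly job in DBMS_SCHEDULER might look like the following sketch (hypothetical job name, load statement, and run time):

```sql
-- Hypothetical sketch of a nightly 2:00 a.m. load job.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'LOAD_CUSTOMERS_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          INSERT INTO customers SELECT * FROM customers_ext;
                          COMMIT;
                        END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2',
    enabled         => TRUE);
END;
/
```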
Although this was a somewhat high-level overview of what it takes to automate a load, it did cover each and every project item you need to address for this to work. For a one-time file, and even for the same file or set of files day in and day out, it may be easier to shell script this workflow and schedule it via a cron job. On the other hand, once this process is configured, it can be deployed from a development environment to QA or production. And don't let the "warehouse" part of this tool's name lead you to believe it is only for data warehouses. If you need to map, design, and schedule an ETL load in a different type of database, Warehouse Builder will work just fine there as well.
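For completeness, the shell-script-plus-cron alternative mentioned above might look something like this sketch, where all paths, credentials, and file names are hypothetical and the SQL*Loader control file plays the role of OWB's flat file metadata:

```sh
#!/bin/sh
# Hypothetical nightly load via SQL*Loader; the control file describes
# the delimiter, columns, and date mask, much like OWB's file metadata.
sqlldr userid=etl_user/secret \
       control=/etc/etl/customers.ctl \
       data=/data/incoming/customers.dat \
       log=/var/log/etl/customers.log

# Example crontab entry to run the load nightly at 2:00 a.m.:
# 0 2 * * * /usr/local/bin/load_customers.sh
```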