Configuring Oracle as a Data Source for SQL Server
February 8, 2010
Discover what happens within SQL Server during and after configuring Oracle as a data source. Quite a few objects are created, including a system-level database, numerous jobs running under the SQL Server agent, and flat files created on the file system. Read on to learn more.
In this final article on using Oracle as a data source, well take a look at what happens within SQL Server during and after the configuration process. Quite a few objects are created, including a system-level database, numerous jobs running under the SQL Server agent, and flat files created on the file system. Setting up Oracle as a Data Source for SQL Server discusses the obstacles you'll run into when setting up Oracle as a data source for SQL Server and offers a work-around that will allow you to continue the setup.
The normal set of system-level databases within MSSQL includes master, model, msdb and tempdb. Upon completion of the steps provided by Microsoft, a new database named distribution is created.
The distribution database contains numerous tables related to replication activity. Without a detailed map of what the various tables contain, youre pretty much left on your own to dig through these tables for information that may be relevant for or to you.
As a simple example, issue an update against the EMP table and commit (set sal = 900 where sal = 800 affects only one row and is easy to undo later). Within the MSrepl_transactions table, you can see information about the transaction. In the example below, the only relevant piece of information is the entry_time.
The IHpublishers table shows information about publishers. In this scenario, we know we used Oracle and a database name ORCL.
We dont see the db_name here, but at least we can confirm ORACLE and the version (which fits since I used 10gR2 via the 10.2.0.4.0 download for 64-bit Server 2008 at Oracle Technology Network).
You can go through more distribution database tables on your own, and what should become fairly obvious is that the contents of the database are more for MSSQL than for you. Overall, this is in line with the purpose of other MSSQL-oriented databases (as opposed to user databases). The publisher_guid shown above is useless information for you, and it looks a whole lot like the ID value shown in a ReportServer database for the ID of a published report. But nonetheless, you can glean information from here if need be.
Moving down the Object Explorer tree to jobs under SQL Server Agent in SQL Server Management Studio, we see that quite a few jobs are created (ten in all, with one not enabled).
The main job of interest with respect to scheduling is the one with MyOraclePub in the name (MyOraclePub was the name chosen for new publication as one of the last steps in the New Publication Wizard dialog windows). Go into the properties of this job and drill down to Schedule. The default scheduling is to have the agent run replication once per hour on the hour.
Other scheduled jobs are related to maintenance activities. The first one in the list runs every ten minutes, with its mission being to remove replication agent history from the distribution database. The last one in the list runs once a day (at 2am), and is different from all the rest in that the steps to create this job do not include a description or category.
On the file system, based on the article (i.e., table) chosen, MSSQL scripts out what is essentially a create table script and a data file (the table data is in the file). For one table (EMP), three files are created: a data file, an index file, and a create table (with other steps) file. There may be more than one location, based on when the setup is run, so look for a date time group folder. One set of files in my setup is in:
SQL Server \MSSQL10.MSSQLSERVER\MSSQL\repldata\unc
Ignore the last digit and read off the date/time as YYYYMMDDHHMI (with HH in 24-hour format). The SCH file is the schema file, and contains the T-SQL version of Oracles CREATE TABLE statement.
drop Table [dbo].[EMP] go CREATE TABLE [dbo].[EMP]( [EMPNO] [numeric] (4, 0) NOT NULL, [ENAME] [varchar] (10) NULL, [JOB] [varchar] (9) NULL, [MGR] [numeric] (4, 0) NULL, [SAL] [numeric] (7, 2) NULL, [COMM] [numeric] (7, 2) NULL, [DEPTNO] [numeric] (2, 0) NULL , CONSTRAINT [MSHREPL_1_PK] PRIMARY KEY ( [EMPNO] ) ) go
Note that all of the numeric columns have precision and scale, whereas the description for EMP in SQL*Plus shows EMPNO, MGR and DEPTNO as NUMBER 4, 4 and 2. The primary key constraint is picked up as well, and the naming suggests Microsoft Heterogeneous Replication via MSHREPL.
The data file, usable via the bcp utility in MSSQL, looks XML-ish, but is not. Records are separated by <,@g>, columns or changes in field by <x$3>, and null values by a blank line (somewhat of a departure where whitespace is typically ignored, but in these files, it matters).
There are several other objects worth examining, and the curious reader is left to do that upon his or her own accord. With everything all said and done, how do you see what takes place? One interface into that is the Replication Monitor. The entry and setup of this monitor is quite similar to how the setup for Database Mirroring Monitor is performed.
Right-click Replication and select Launch Replication Monitor.
Select Add Publisher after right-clicking My Publishers to add the Oracle publisher (or use the Action menu if so desired). The ensuing steps are fairly self-explanatory, and the end result is that the MyOraclePub distribution is added to the list of publishers. Once enabled, select it and view the activity (there may be none) under the various tabs.
Of interest to us is the Agents tab.
Right-click>View Details the first line (completion of the Snapshot Agent) and view the history of how your selected article was replicated via a snapshot.
For a 14 row table, with both RDBMSs running on the same server, the time to snapshot was about ten seconds. Obviously, a very small example in terms of data quantity, but you may want to consider testing the amount of time it takes to snapshot a significantly sized table.
All of this may seem like a lot of work to replicate data between Oracle and SQL Server, and quite frankly, it is. There is lots of room for error, and even with all steps being faithfully followed via wizards and the like, you can still wind up with errors related to data access (MSSQL forums have an abundance of questions related to where a publisher or distributor test errors out because of a data access issue, not to mention what it takes to clean up an aborted or failed replication installation).
Within Oracle, we have other options, and perhaps the simplest is to go back to what was mentioned as the framework concept in Oracle as a Data Source, that is, use a materialized view. An even simpler approach would be the use of an auditing table and a job.
We dont care about the change history in the replicated set of data, but rather, we care about the current state of data being present. If EMPNO 7369 exists in the audit table, then we are either inserting into the replication table or updating the existing record. Once all records are processed in the audit table, it is flushed and initialized for the next set of transactions to be replicated. Whats even better about this simpler approach is that the target data store SQL Server in this example does not need to be Enterprise Edition level. Standard Edition will suffice quite nicely, along with its much lower cost. We can dodge the big bucks for bells and whistles paid for otherwise that can be emulated by much simpler means.
A more formal comparison between replication methods within Oracle and within SQL Server can be found in an October 2008 Oracle white paper (Technical Comparison of Oracle Database 11g versus SQL Server 2008). Oracle Corporation, never bashful in its claims, states right up front than in all areas, Oracle continues to provide a more robust and flexible solution.
The paper compares Oracles features against those of SQL Servers, and does not go into the pros and cons of using one RDBMS as the data source for the other. Overall, there is more than one way to publish Oracle data into SQL Server. It may be best for Oracle to take care of the transactional updates and snapshots, and then use already updated materialized views as the Oracle data source for SQL Server.