Discover what happens
within SQL Server during and after configuring Oracle as a data source. Quite a
few objects are created, including a system-level database, numerous jobs
running under the SQL Server agent, and flat files created on the file system.
Read on to learn more.
In this final article on
using Oracle as a data source, we’ll take a look at what happens within SQL
Server during and after the configuration process. Quite a few objects are
created, including a system-level database, numerous jobs running under the SQL
Server agent, and flat files created on the file system. Setting
up Oracle as a Data Source for SQL Server discusses the obstacles you’ll
run into when setting up Oracle as a data source for SQL Server and offers a
work-around that will allow you to continue the setup.
The “normal” set of
system-level databases within MSSQL includes master, model, msdb and tempdb.
Upon completion of the steps provided by Microsoft, a new database – named
distribution – is created.
The distribution database contains
numerous tables related to replication activity. Without a detailed map of what
the various tables contain, you’re pretty much left on your own to dig through
these tables for information that may be relevant for or to you.
As a simple example, issue
an update against the EMP table and commit (set sal = 900 where sal = 800
affects only one row and is easy to undo later). Within the MSrepl_transactions
table, you can see information about the transaction. In the example below, the
only relevant piece of information is the entry_time.
The IHpublishers table shows
information about publishers. In this scenario, we know we used Oracle and a
database name ORCL.
We don’t see the db_name
here, but at least we can confirm ORACLE and the version (which fits since I
used 10gR2 via the 10.2.0.4.0 download for 64-bit Server 2008 at Oracle
Technology Network).
You can go through more
distribution database tables on your own, and what should become fairly obvious
is that the contents of the database are more for MSSQL than for you. Overall,
this is in line with the purpose of other MSSQL-oriented databases (as opposed
to user databases). The publisher_guid shown above is useless information for
you, and it looks a whole lot like the ID value shown in a ReportServer
database for the ID of a published report. But nonetheless, you can glean
information from here if need be.
Moving down the Object
Explorer tree to jobs under SQL Server Agent in SQL Server Management Studio,
we see that quite a few jobs are created (ten in all, with one not enabled).
The main job of interest
with respect to scheduling is the one with “MyOraclePub” in the name
(MyOraclePub was the name chosen for new publication as one of the last steps
in the New Publication Wizard dialog windows). Go into the properties of this
job and drill down to Schedule. The default scheduling is to have the agent run
replication once per hour on the hour.
Other scheduled jobs are
related to maintenance activities. The first one in the list runs every ten
minutes, with its mission being to remove replication agent history from the
distribution database. The last one in the list runs once a day (at 2am), and
is different from all the rest in that the steps to create this job do not
include a description or category.
On the file system, based on
the article (i.e., table) chosen, MSSQL scripts out what is essentially a
create table script and a data file (the table data is in the file). For one
table (EMP), three files are created: a data file, an index file, and a create
table (with other steps) file. There may be more than one location, based on
when the setup is run, so look for a date time group folder. One set of files
in my setup is in:
C:\Program Files\Microsoft
SQL Server \MSSQL10.MSSQLSERVER\MSSQL\repldata\unc
\ORCL_DISTRIBUTION_MYORACLEPUB\20100201190007
Ignore the last digit and
read off the date/time as YYYYMMDDHHMI (with HH in 24-hour format). The “SCH”
file is the schema file, and contains the T-SQL version of Oracle’s CREATE
TABLE statement.
drop Table [dbo].[EMP] go CREATE TABLE [dbo].[EMP]( [EMPNO] [numeric] (4, 0) NOT NULL, [ENAME] [varchar] (10) NULL, [JOB] [varchar] (9) NULL, [MGR] [numeric] (4, 0) NULL, [SAL] [numeric] (7, 2) NULL, [COMM] [numeric] (7, 2) NULL, [DEPTNO] [numeric] (2, 0) NULL , CONSTRAINT [MSHREPL_1_PK] PRIMARY KEY ( [EMPNO] ) ) go
Note that all of the numeric
columns have precision and scale, whereas the description for EMP in SQL*Plus
shows EMPNO, MGR and DEPTNO as NUMBER 4, 4 and 2. The primary key constraint is
picked up as well, and the naming suggests Microsoft Heterogeneous Replication
via MSHREPL.
The data file, usable via
the bcp
utility in MSSQL, looks XML-ish, but is not. Records are separated by
<,@g>, columns or changes in field by <x$3>, and null values by a
blank line (somewhat of a departure where whitespace is typically ignored, but
in these files, it matters).
There are several other
objects worth examining, and the curious reader is left to do that upon his or
her own accord. With everything all said and done, how do you see what takes
place? One interface into that is the Replication Monitor. The entry and setup
of this monitor is quite similar to how the setup for Database Mirroring
Monitor is performed.
Right-click Replication and
select Launch Replication Monitor.
Select Add Publisher after
right-clicking My Publishers to add the Oracle publisher (or use the Action
menu if so desired). The ensuing steps are fairly self-explanatory, and the end
result is that the MyOraclePub distribution is added to the list of publishers.
Once enabled, select it and view the activity (there may be none) under the
various tabs.
Of interest to us is the
Agents tab.
Right-click>View Details
the first line (completion of the Snapshot Agent) and view the history of how
your selected article was replicated via a snapshot.
For a 14 row table, with
both RDBMSs running on the same server, the time to snapshot was about ten
seconds. Obviously, a very small example in terms of data quantity, but you may
want to consider testing the amount of time it takes to snapshot a
significantly sized table.
All of this may seem like a
lot of work to replicate data between Oracle and SQL Server, and quite frankly,
it is. There is lots of room for error, and even with all steps being
faithfully followed via wizards and the like, you can still wind up with errors
related to data access (MSSQL forums have an abundance of questions related to where
a publisher or distributor test errors out because of a data access issue, not
to mention what it takes to clean up an aborted or failed replication
installation).
Within Oracle, we have other
options, and perhaps the simplest is to go back to what was mentioned as the
framework concept in Oracle
as a Data Source, that is, use a materialized view. An even simpler
approach would be the use of an auditing table and a job.
We don’t care about the
change history in the replicated set of data, but rather, we care about the
current state of data being present. If EMPNO 7369 exists in the audit table,
then we are either inserting into the replication table or updating the
existing record. Once all records are processed in the audit table, it is
flushed and initialized for the next set of transactions to be replicated.
What’s even better about this simpler approach is that the target data store –
SQL Server in this example – does not need to be Enterprise Edition level.
Standard Edition will suffice quite nicely, along with its much lower cost. We
can dodge the big bucks for bells and whistles paid for otherwise that can be
emulated by much simpler means.
In Closing
A more formal comparison
between replication methods within Oracle and within SQL Server can be found in
an October 2008 Oracle white paper (“Technical Comparison of Oracle Database 11g
versus SQL Server 2008”). Oracle Corporation, never bashful in its claims,
states right up front than in all areas, “Oracle continues to provide a more
robust and flexible solution.”
The paper compares Oracle’s
features against those of SQL Server’s, and does not go into the pros and cons
of using one RDBMS as the data source for the other. Overall, there is more
than one way to publish Oracle data into SQL Server. It may be best for Oracle
to take care of the transactional updates and snapshots, and then use already
updated materialized views as the Oracle data source for SQL Server.