In our previous two articles of this series, we discussed auditing and
security related aspects of SQL Server 2000 Data Transformation Services. Now,
we will combine these topics by presenting DTS Meta Data Services, which
provides the ability to audit creation, execution, and modification of DTS
packages as well as track associated data changes. In particular, you can track
package version history or data lineage. This information is stored according
to the rules of DTS Information Model, which can be accessed and manipulated by
both SQL Server Enterprise Manager and programmatic methods (for review of SQL
Server 2000 DTS, refer to its coverage
on the DatabaseJournal Web site.
As we mentioned in our previous article, the ability to save packages to
Meta Data Services is disabled by default. This is intentional, since its use
is associated with an increase in resource utilization and affects package
execution speed. In order to alter the default setting, in the Package
Properties dialog box, which appears after selecting the Properties entry from
context sensitive menu of the Data Transformation Services node in SQL Server
Enterprise Manager, you need to mark the checkbox labeled "Enable Save To
Meta Data Services". This will make available a new option "Meta Data
Services" in the Location listbox in the Save DTS Package dialog box,
displayed when saving packages from the DTS Designer interface. When you choose
this option, you will also have a chance to specify the target server where the
package meta data information will be stored, authentication information for
connecting to it, and scanning settings. These settings determine whether and
to what extent information about databases and their objects referenced by DTS
packages will be scanned and written into equivalent meta data structures. The
main check box determines if such meta data should be created to begin with.
Once you enable it, you also need to decide whether you intend to:
-
rescan all information about data stores referenced in the
package into the Meta Data Services even if such information is already present
from previous scans (corresponding to the "Scan all referenced catalogs
into repository" option) or use existing information (corresponding to
"Use scanned catalogs if already present in repository"), which saves
time and resources but should be used only if data structure has not changed
between scans. -
perform the activity defined by the previous selection for all
data structures referenced in the package (corresponding to "Scan catalog
always") or limit it only to the ones which have not yet been scanned
(corresponding to "Scan catalog if not already present in Meta Data
Services").
You also have an ability to import meta data for a particular database
(rather than for all databases referenced by a DTS package). This can be
accomplished by selecting the "Import Metadata…" item from the
context sensitive menu of Meta Data Services Packages and Meta Data nodes under
the Data Transformation Services folder in the SQL Server Enterprise Manager.
Importing databases will allow you to view their information by clicking on the
Meta Data node. The display is divided into three categories, grouped under
Browse, Lineage, and Package headings:
-
Browse section provides a listing of Database Meta Data,
including database (its name, SQL Server instance name and version), schema (in
the case of SQL Server, this refers to the database owner), tables, their
columns and characteristics for each (datatype, length, scale, precision, and nullability).
For each table and column, you can find DTS packages that use it as their data
store connection, as well as related tables and columns that function as a
source or a destination for its data. This is done by selecting Package and
Source/Destination items from the context sensitive menu of the icon
representing individual tables and columns. For this type of statistic to be
available, you need to import relevant package information into the meta data
as well (which we will discuss next). -
The second section deals with lineage, which provides the ability
to track changes to data. In general, SQL Server 2000 Meta Data Services offers
two types of lineage:-
row-level data lineage – identifies the source and destination
for a particular piece of data (a table row) as well as all transformations
that were applied to it (including DTS package execution information). This is
accomplished by assigning a globally unique identifier (GUID) to each value,
which is used for tracking purposes. -
column-level data lineage – maintains a record of columns
involved in transformations (without tracking individual values in each). This
allows correlating references between packages, which might be using the same
columns as source or destination for their transformations. It also provides
package revision and execution history, informing who, where, and when creates,
runs, and modifies packages.
In order to help you understand this concept, we
will demonstrate implementation of both types of lineage using a sample DTS
package based on a template called Template Demo.dtt, stored in Program
FilesMicrosoft SQL Server80ToolsTemplatesDTS folder. To create a package,
right click on the Data Transformation Services node in the SQL Server
Enterprise Manager and select the All Tasks -> Open Template option from its
context sensitive menu. Point to the target file in the Select File dialog box,
click on the Open command button, and choose the New Package entry in the
Select Package dialog box. This will present you with the DTS Designer
interface, containing a fairly rudimentary sample package, using a single,
undefined Transform Data Task and two OLE DB Provider SQL Server connections,
one pointing to Northwind database as the source and the other referencing tempdb
as the destination. To define transformation properties, double click on it,
and from the Properties dialog box, select the Shippers table as its source (on
the Source tab). You can click on the Preview button to verify that the source
table exists and has some data in it (three rows by default). Next, switch to
the Destination tab. This will generate a Package Error informing you that the
table does not exist at the destination, which makes sense, since we have not
created it yet. Acknowledge the error, click on the Create button and verify
that T-SQL code presented there will create the Shippers table at the
destination. Switch to the Transformations tab and note that, without any
customization, Transform Data Task simply copies data from Source to
Destination. Click on OK to close the Properties dialog box. If you execute the
package at this point, you should receive confirmation of its successful
completion. You can also easily verify that tempdb will contain the Shippers
table, with its content identical to Shippers in the Northwind database.Note, however, that information about the package
execution has not been recorded into Meta Data Services and no lineage
statistics are available. In order to be able to track row-level data lineage,
we will need to implement the following steps:1. Import the
source and target databases into the Meta Data Services.2.
Switch to the DTS Designer interface and select the Properties
option from the Package top-level menu. On the Advanced tab in the DTS Package
Properties dialog box, mark the "Show lineage variables as source
columns" and "Write lineage to repository" checkboxes, click on
OK to confirm your choice, and return to the DTS Designer.3.
Double click on the Transform Data task arrow to display the
Transform Data Task Properties dialog box. Switch to the Transformations tab.
You will notice that the list of columns in the Source changed and includes DTSLineage_Full
and DTSLineage_Short. The first entry will store the globally unique identifier
(GUID) representing the package version, the second one is intended for version
checksum (unlike with GUID, its uniqueness is not guaranteed – although still
very likely).4.
Now, you need to create identical columns at the destination. To
accomplish this, switch to the Destination tab, and click on the Create command
button. You will notice that the T-SQL statement now contains two additional
fields Lineage_Full (of type UNIQUEIDENTIFIER) and Lineage_Short (of type INT).
Rename the table identifier in the CREATE TABLE statement of the Create
Destination Table dialog box to ShippersMDS and click on the OK button. This
will create an appropriately structured table at the destination (Note that we
could have done this at the very beginning as long as the appropriate options
were selected on the Advanced tab of the package Properties dialog box – we
went through initial steps purely to demonstrate changes that take place as the
result of activating these options).5.
Now you need to modify transform properties from the
Transformations tab. To change the mapping between source and destination
columns, first click on Select All and then on Delete All. This will remove
existing mappings. To create a new one, make sure that all Source and
Destination columns are highlighted and click on the New button. From the
"Create New Transformation" dialog box, select Copy Column entry, and
click on OK twice to return to the Transformations tab.6.
Click on OK to close the Transform Data Task Properties dialog
box and return to the DTS Designer window.7.
Assign to the package a meaningful name and save it to Meta Data
Services using the Save As item from the Package menu. Ensure you select the
Meta Data Services as the target location.8.
Execute the package.At this point, all relevant row-level lineage
information should be available. For example, in the Lineage section of the
Meta Data node of SQL Server Enterprise Manager, you can type either Lineage
Long or Lineage Short values (you can extract them from ShippersMDS table in
the tempdb) in the corresponding text boxes, and click on the Find Lineage
command button. This will display information about the Package responsible for
creation of the corresponding table entry. Similarly, you can obtain
information about its source using the Source/Destination context sensitive
menu option..To implement column-level data lineage, all that is
necessary is to ensure that you mark the "Resolve package references to
scanned catalog meta data" checkbox in the Scanning Options dialog box
when saving the package into Meta Data Services. In our case, you can choose
"Scan all referenced catalogs into repository" and "Scan catalog
always" options. -
-
Package section displays information about packages saved to Meta
Data, including such information as, for example, their name, version, author,
creation time, lineage, and execution statistics (account used to launch it and
launch time). You should be able at this point to see reference to the package
we just saved and executed.
Note that when saving to Meta Data Services, you lose the ability to assign
owner and user passwords. While, this makes your packages less secure, you have
the ability to audit their use. In addition, keep in mind that this
functionality is used primarily for troubleshooting more complex data
warehousing scenarios, where keeping track of transformations via DTS Packages
would be otherwise fairly difficult.