The second section deals with lineage, which provides the ability
to track changes to data. In general, SQL Server 2000 Meta Data Services offers
two types of lineage:
-
row-level data lineage – identifies the source and destination
for a particular piece of data (a table row) as well as all transformations
that were applied to it (including DTS package execution information). This is
accomplished by assigning a globally unique identifier (GUID) to each value,
which is used for tracking purposes.
-
column-level data lineage – maintains a record of columns
involved in transformations (without tracking individual values in each). This
allows correlating references between packages, which might be using the same
columns as source or destination for their transformations. It also provides
package revision and execution history, informing who, where, and when creates,
runs, and modifies packages.
In order to help you understand this concept, we
will demonstrate implementation of both types of lineage using a sample DTS
package based on a template called Template Demo.dtt, stored in Program
FilesMicrosoft SQL Server80ToolsTemplatesDTS folder. To create a package,
right click on the Data Transformation Services node in the SQL Server
Enterprise Manager and select the All Tasks -> Open Template option from its
context sensitive menu. Point to the target file in the Select File dialog box,
click on the Open command button, and choose the New Package entry in the
Select Package dialog box. This will present you with the DTS Designer
interface, containing a fairly rudimentary sample package, using a single,
undefined Transform Data Task and two OLE DB Provider SQL Server connections,
one pointing to Northwind database as the source and the other referencing tempdb
as the destination. To define transformation properties, double click on it,
and from the Properties dialog box, select the Shippers table as its source (on
the Source tab). You can click on the Preview button to verify that the source
table exists and has some data in it (three rows by default). Next, switch to
the Destination tab. This will generate a Package Error informing you that the
table does not exist at the destination, which makes sense, since we have not
created it yet. Acknowledge the error, click on the Create button and verify
that T-SQL code presented there will create the Shippers table at the
destination. Switch to the Transformations tab and note that, without any
customization, Transform Data Task simply copies data from Source to
Destination. Click on OK to close the Properties dialog box. If you execute the
package at this point, you should receive confirmation of its successful
completion. You can also easily verify that tempdb will contain the Shippers
table, with its content identical to Shippers in the Northwind database.
Note, however, that information about the package
execution has not been recorded into Meta Data Services and no lineage
statistics are available. In order to be able to track row-level data lineage,
we will need to implement the following steps:
1. Import the
source and target databases into the Meta Data Services.
2.
Switch to the DTS Designer interface and select the Properties
option from the Package top-level menu. On the Advanced tab in the DTS Package
Properties dialog box, mark the "Show lineage variables as source
columns" and "Write lineage to repository" checkboxes, click on
OK to confirm your choice, and return to the DTS Designer.
3.
Double click on the Transform Data task arrow to display the
Transform Data Task Properties dialog box. Switch to the Transformations tab.
You will notice that the list of columns in the Source changed and includes DTSLineage_Full
and DTSLineage_Short. The first entry will store the globally unique identifier
(GUID) representing the package version, the second one is intended for version
checksum (unlike with GUID, its uniqueness is not guaranteed – although still
very likely).
4.
Now, you need to create identical columns at the destination. To
accomplish this, switch to the Destination tab, and click on the Create command
button. You will notice that the T-SQL statement now contains two additional
fields Lineage_Full (of type UNIQUEIDENTIFIER) and Lineage_Short (of type INT).
Rename the table identifier in the CREATE TABLE statement of the Create
Destination Table dialog box to ShippersMDS and click on the OK button. This
will create an appropriately structured table at the destination (Note that we
could have done this at the very beginning as long as the appropriate options
were selected on the Advanced tab of the package Properties dialog box – we
went through initial steps purely to demonstrate changes that take place as the
result of activating these options).
5.
Now you need to modify transform properties from the
Transformations tab. To change the mapping between source and destination
columns, first click on Select All and then on Delete All. This will remove
existing mappings. To create a new one, make sure that all Source and
Destination columns are highlighted and click on the New button. From the
"Create New Transformation" dialog box, select Copy Column entry, and
click on OK twice to return to the Transformations tab.
6.
Click on OK to close the Transform Data Task Properties dialog
box and return to the DTS Designer window.
7.
Assign to the package a meaningful name and save it to Meta Data
Services using the Save As item from the Package menu. Ensure you select the
Meta Data Services as the target location.
8.
Execute the package.
At this point, all relevant row-level lineage
information should be available. For example, in the Lineage section of the
Meta Data node of SQL Server Enterprise Manager, you can type either Lineage
Long or Lineage Short values (you can extract them from ShippersMDS table in
the tempdb) in the corresponding text boxes, and click on the Find Lineage
command button. This will display information about the Package responsible for
creation of the corresponding table entry. Similarly, you can obtain
information about its source using the Source/Destination context sensitive
menu option..
To implement column-level data lineage, all that is
necessary is to ensure that you mark the "Resolve package references to
scanned catalog meta data" checkbox in the Scanning Options dialog box
when saving the package into Meta Data Services. In our case, you can choose
"Scan all referenced catalogs into repository" and "Scan catalog
always" options.