As a database administrator, besides manipulation of the data residing in
SQL Server 2000 databases, you are frequently faced with tasks that deal with
transfer of data between diverse data sources. Frequently, such data needs to
be also transformed or altered during the transfer. In the past (in SQL 6.5 and
earlier), such issues presented a significant challenge and required
substantial investment in development time. Data Transformation Services (DTS),
introduced in SQL Server 7.0 and further improved in SQL Server 2000
drastically changed this situation. Our goal in this series of articles will be
presenting the most relevant concepts of SQL 2000 DTS.
Data Transformation Services is a technology providing the ability to
exchange and modify data among any OLE DB compliant data sources. SQL Server
DTS is implemented as a set of programmable objects (forming DTS object model)
accessible through programming and scripting, and through a much simpler to use
graphical interface. Regardless of the way of interfacing with the DTS object
model, activities involved in transition and transformation of data are
organized into units called packages. Every DTS package functions as a
container for four types of components:
-
connections – represent data source or target and are implemented
using OLE DB providers. SQL 2000 DTS has built-in connections for SQL Server,
Access, Excel, Visual FoxPro, text and html files, as well as third party
databases such as Oracle, Paradox, and dBase. Others can be installed using
additional OLE DB providers. -
tasks – represent individual actions such as data transition or
transformation. Tasks are also used to provide status notifications. -
precedence constraints – determine conditions necessary for a
task to execute, creating additional logic in the package workflow. -
global variables – values or sets of values (such as entire rowsets)
that can be shared among components of the same package.
There are different ways of creating DTS packages. The simplest one is based
on available Wizards (DTS Export Wizard and DTS Import Wizard listed in the
Tools -> Wizards -> Data Transformation Services in the SQL Enterprise
Manager), which take you through a sequence of steps leading to creation of a new
package. The trade-off for this simplicity is the limited number of
configuration options. This also is the case when using Copy Database Wizard
(listed in the Tools -> Wizards -> Management node). The recommended tool
that provides balance between ease of use and range of available configuration
options is DTS Designer (you can launch it by selecting the New Package option
from the context sensitive menu of the Data Transformation Services node in the
Enterprise Manager window). This is the most common way of creating complex
packages, which you can also use to refine wizard-generated packages further.
Finally, the most flexibility is available by accessing the DTS COM
programmable interface; however, this requires familiarity with scripting or
programming languages and DTS object model and takes the most time and effort.
Even though you do not have to save a package before you execute it,
typically it is a good idea. This not only gives you a chance to reconfigure or
rerun it later (if the data transformation process takes place multiple times),
but also allows you to schedule it for execution. Packages can be stored in
several types of locations, each one with a number of benefits and drawbacks:
-
sysdtspackages table of the MSDB database, which corresponds to
the Local Packages node under Data Transformation Services folder of SQL
Enterprise Manager. The main benefit of this option is support for versioning –
for any local package you can access and edit each of its previously saved
versions, not just the most current one. In addition, during save, you can
assign user and owner passwords, required, respectively, in order to execute
and edit this package. -
Microsoft Repository, which corresponds to the Meta Data Services
Packages node under Data Transformation Services folder of SQL Enterprise
Manager. This option allows you to keep track of package history, including
version number (providing access to previous versions, just as with Local
Packages) and lineage information. Lineage provides the means for keeping track
of changes that take place during package execution. In particular, you can
collect lineage information on a per-row or per-column level. Row-level lineage
keeps track of changes to individual rows as well as package execution details
(user name, server name, as well as date and time). Column-level lineage
maintains a listing of tables and their columns (for both source and
destination) referenced in each package. Such information is typically used for
auditing purposes. Both options are disabled by default. Enabling column-level
lineage is possible once the package is saved in the Repository; row-level
lineage requires that you create an additional column in the data source,
containing row identifier. As you can imagine, overhead associated with keeping
track of changes during package execution affects its speed. In fact, packages
stored in the Microsoft Repository are the slowest, when compared with other
storage types.
Unlike with the Local Packages option, it is not possible to provide individual
security settings for a Repository-resident package. -
structured storage files (with extension .DTS) are ideal for
copying packages between SQL servers. You can store any existing package (or a
number of packages) within a .DTS file and use traditional file transfer
methods (FTP, e-mail, or Windows copy and paste) to deliver it to a target SQL
server. Packages within structured storage files can be secured with user and
owner passwords. They can also be executed directly (without importing them
back into SQL Server) using the DTSRUN.EXE command line utility (DTSRUNUI.EXE
GUI applet provides a friendly way of configuring all DTSRUN parameters). -
Visual Basic module files (with extension .BAS) are intended for
Visual Basic developers, who want to be able to use programming methods of DTS
object model to modify existing packages. Note that such packages are not
intended to be re-imported into DTS Designer or Repository.
Packages can be run either interactively or as scheduled jobs. Interactive
execution can be triggered from the DTS Designer window (e.g. by selecting the
Execute option from the Package menu) or from the Command Prompt by running the
DTSRUN utility. You can also use for this purpose xp_cmdshell extended stored
procedure using the syntax:
master..xp_cmdshell 'DTSRUN /S "servername" /U "username" /P "password" /N "DTSPackageName"'
The Schedule Package option is available from the context
sensitive menu of packages listed under the Data Transformation Services node
in the Enterprise Manager. Alternatively, you can schedule DTSRUN command with
appropriate parameters using Windows Scheduled Tasks (available from the
Programs -> Accessories -> System Tools menu) or the DTSRUNUI utility.
When scheduling packages with Enterprise Manager or DTSRUNUI, make sure that
the SQL Server Agent service is configured properly and running. Finally, you
can also execute a package from within another package with Execute Package
task, which we will cover in one of our future articles (in this case, the way
the package runs will depend on the execution mode of its parent).
When scheduling packages, it is recommended to enable logging (configurable
from the Logging tab of the Package properties dialog box). This greatly
simplifies troubleshooting in case of a job failure (since error messages in
the job history are fairly generic). Logging provides status and progress
information, as well as error messages for each step. Logging can be recorded
into a text file or MSDB database tables (sysdtspackagelog and sysdtssteplog).
You should remember to periodically clear them (by either deleting files or
truncating tables), since new logging information is always appended to
existing logs.
The way a package is launched also affects its security context. During
interactive execution, a package operates using the same account as the
currently logged-on user (the one who launched it). The only exception is
running DTSRUN utility from within the xp_cmdshell extended stored procedure
(as illustrated above), in which case, the outcome depends on settings on the
Job System tab of the SQL Server Agent Properties dialog box (accessible from
the Management folder in the Enterprise Manager window). Using this tab, you
can either limit the ability to run xp_cmdshell to users with System Admininistrator
privileges, in which case xp_cmdshell executes using SQL Server Agent service
account or, otherwise, specify a proxy account that will be used instead.
Scheduled packages, on the other hand, operate always as the SQL Server Agent
service account, unless you use the DTSRUN utility with Windows Scheduled
Tasks, where you can arbitrarily assign user account information.
Note also that SQL Server Agent scheduled jobs run on the SQL Server, while
interactive ones execute on the same system on which they were launched. This
might have significant implications if a package includes such potentially data
and processor intensive tasks as ActiveX Scripts. In order to take advantage of
the processing power of the SQL Server system, either use scheduling or connect
to the server via Terminal Services. In any case, you should try avoiding
ActiveX Script tasks, as they are notorious for negative impact on package
execution speed.
In our first article of the series dedicated to the Data Transformation
Services, we provided an overview of the main concepts of this technology. This
will serve as the starting point to the discussion on DTS wizards, which will
be presented in the next article.