SQL Server 2005 – SQL Server Integration Services – Part 3

This is the third article of our series discussing SQL Server 2005
Integration Services (SSIS), which provides Extraction, Transformation, and
Loading features, replacing Data Transformation Services (DTS) available in SQL
Server 2000 and 7.0. We have already presented an overview of the basic
concepts necessary to design and implement SSIS-based projects. We have also stepped
through the creation of one such project using Business Intelligence
Development Studio. Our sample SSIS package, described in our previous article,
delivered the basic functionality we needed (running an external process and
loading the outcome of its execution stored temporarily in a text file into a
database), however, it had one major shortcoming – a lack of support for
reusability. If the external process, temporary file, or destination table in
the target database had to be changed, it would be necessary to directly modify
relevant components in the package. This is cumbersome as well as error prone,
and makes code maintenance or versioning difficult. In this article, we will
begin discussion of SSIS features, which simplify package maintenance and
increase their flexibility.

Before we start looking into these features, let’s briefly summarize the
ones covered so far, pointing out the differences between SSIS and its
predecessor – Data Transformation Services. The most significant one, from the
package design point of view, is separation of control flow and data flow, as
well as decoupling of the connection information from both. Control flow
defines the sequence of package processing, with additional logic dictated by the
outcome of execution of its individual components and configuration of precedence
constraints. Data flow determines movement (and most typically transformation)
of data between its source and destination. Every SSIS package consists of a
single control flow, which can contain any number of Data Flow units.
Connections, implemented via connection managers, represent various data stores
and can be used simultaneously by multiple control and data flow elements.
Variables provide runtime location for values utilized by SSIS components, such
as containers, tasks, event handlers and precedence constraints, as well as the
packages themselves. They facilitate the use of properties which might change
from one package execution to another, eliminate the need for using file system
or databases for temporary storage, serve as loop counters, parameters of
stored procedures, or parts of more complex expressions (used for example by
Derived Column Transformation).

In general, there are two types of variables – System and User (which occupy
System and User namespaces, respectively). System variables are predefined and
their list cannot be modified; user variables, on the other hand, can be
created on as needed basis and are modifiable. Both types are limited to the
scope of an SSIS component, which hosts their definition (the scope includes
automatically all of its subcomponents). This means that while System and User
variables on the package level are accessible from any of its components, the ones
that have been created within container, task, event handler, or precedence
constraint, are not visible outside of it. In cases where variable names
conflict, the variabale with more localized scope takes precedence. You can
minimize the possibility of such occurrences by introducing custom namespaces
(other than the System and User), but only for User variables.

The flexibility, which would allow you to change the properties of the
components described above, in order to accommodate changes in external
conditions, (e.g. when switching from a development to production environment
or when deploying the same package against multiple data stores) can be
accomplished though a number of different techniques. However, we will start
with the most straightforward technique, based on the package configurations.
Configurations are to some extent similar to the Dynamic
Properties Task
in SQL Server 2000 Data Transformation Services, since both
are capable of modifying package components using external data sources (such
as text files or environment variables). The same mechanism can be used to
assign values to package variables, further increasing the degree of
flexibility. Note, however, that unlike the Dynamic Properties Tasks, SSIS
configurations are applied prior to package invocation, so their dynamic
changes have no immediate impact (they affect only subsequent package
executions).

Configurations are implemented using Business Intelligence Development
Studio. With the Package Designer workplace open, select Configurations… item
from the DTS menu or from the context sensitive menu of the Control Flow area.
This will trigger the Package Configuration Organizer, where you can enable
package configurations by selecting the checkbox in its upper left corner.
Clicking on the Add… command button will launch the DTS Configuration Wizard,
guiding you through the creation of configurations. On the Select Configuration
Type page, you need to decide which configuration type to use (XML
Configuration File, Environment Variable, Registry Entry, or Parent Package
Variable). Your selection determines other parameters necessary to complete
this step (e.g. in case of XML configuration file, registry entry, or parent
package variables, you need to provide either their location and name directly,
or use an indirect approach and specify the name of an environment variable
where this information will be stored). File or registry entry is a convenient
choice in case the location remains the same across all package deployments,
otherwise, the use of environment variable would be a more appropriate option.
Storing configuration parameters of properties and variables outside of a
package allows you to modify them without the need to use Business Intelligence
Development Studio.

Once you choose which configuration type you intend to use, the DTS
Configuration Wizard presents you with a list of package components and their
manageable properties (its content is the same, regardless of your choice) on
the Select Objects to Export page. The list is extensive, covering all
Package-level Variables and Properties, its Executables (such as Execute
Process Task and Data Flow Task in our example) with their Variables and Properties,
Connections with their Properties – all provided in an easy to work with
graphical interface. To demonstrate this functionality, we will increase the flexibility
of our package by selecting the following (obviously, the selection could be
considerably larger, including other properties and variables):

  • Executable and Arguments properties of the Execute Process
    Task (residing under PackageExecutables node), which determine the
    external process to be executed and its arguments,
  • ConnectionString property of the Flat File Connection
    (residing under PackageConnections node), which determine the location of
    the file where input data resides,
  • ConnectionString, Initial Catalog, and ServerName
    properties of the OLE DB Connection (residing under PackageConnections
    node), which determine the location of the destination table where output
    data gets loaded.

After this step is completed, finish the wizard by assigning a descriptive
configuration name. You can review the results by analyzing the content of an
XML file, registry entry, or environment variable you designated as the
configuration store. A package can have multiple configurations, which are
loaded in the order listed in the Package Configuration Organizer dialog box
(i.e. entries lower in the list take precedence over the ones above them – in
case they contain conflicting entries). In order to change the configuration,
you can edit its content (using appropriate editor, depending on storage type)
and alter values of properties or variables according to your preferences. For
example, if you need to execute the same package in development, test, and
production environments, you can create three XML-based configuration files,
each containing different values for server, database, and security settings,
and copy each to the same local file system location on development, test, and
production computers. This way, your package will execute in the manner
matching its intended environment, without having any changes applied to it. In
addition, the same configuration file can be shared across multiple packages,
further simplifying their maintenance.

Another alternative providing the desired flexibility involves package
execution using the DTExec command line utility with appropriate switches or using
its graphical interface equivalent DTExecUI (both executables reside, by
default, in the Program FilesMicrosoft SQL Server90DTSBinn folder). DTExec
is a replacement for the Data Transformation Services-based DTSRun, providing the
ability to execute packages directly from the Command Prompt interface. To
access packages stored in the file system, you need to use the /FILE switch
(/SQL option is required when dealing with packages residing in SQL Server
databases). The /SET option is intended for assigning values to properties of
package components and variables at the time of package invocation. Both
properties and variables are identifiable by their paths, formed according to
specific syntax rules. In particular, the notation starts with the Package
prefix followed by additional elements, pointing either to the target variable
or property to be set. The backslash character (”) is a container separator,
period (‘.’) delimits properties, and members of collections (such as
Variables) can be determined by their index, enclosed in square brackets ([]).
Finally, the semicolon is placed between the property path and its value. For
example, in order to assign the value of "tempdb" to the InitialCatalog
property of our OLE DB Connection to the target database, you would need to
type the following on the command line (assuming that our package has been
saved in the file called SamplePackage.dtsx in the current directory):

DTExec /FILE SamplePackage.dtsx /SET Package.Connections[ConnectionID].InitialCatalog;tempdb

The ConnectionID represents the GUID value identifying the OLE DB
Connection component from the moment of its creation (you can find this value
by checking the content of the Properties Window of the connection). When
dealing with variables, the syntax of the path parameter will depend on their
scope. For example, if your intention were to assign the value "Thievery
Corporation" to a User variable called "Group" defined on the
package level and residing in the file Electronica.dtsx, you would run the
following from the Command Prompt:

DTExec /FILE Electronica.dtsx /SET Package.Variables[Group].Value;"Thievery Corporation"

When setting a variable on a component level, the path would need to be
modified in order to reflect its location within the package hierarchy. For
example, assuming that a User type variable named Album, which existed within
the Execute Process Task called RunMe in the same package, should be set to "The
Mirror Conspiracy," you would be able to accomplish it by altering the
path property and value as below:

DTExec /FILE Electronica.dtsx /SET PackageRunMe.Variables[Album].Value;"The Mirror Conspiracy"

In our next article, we will discuss in more detail the use of variables as
well as present another method of modifying package properties through Property
Expressions.

»


See All Articles by Columnist
Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles