A database administrator can create ad-hoc packages to implement ETL via SSIS or use a method based on software programming principles. Ad-hoc packages may seem advantageous but can lead to duplication of effort. Learn how to implement the second of these options by taking advantage of package variables.
When implementing Extraction, Transformation, and Loading tasks
via SQL Server Integration Services, a database administrator generally has two options. One of
them involves creating ad-hoc packages without much concern for their structure
or future use (frequently relying on Import and Export Wizard) and the other is based
on software programming principles, with emphasis on modularity and
reusability. While the first approach may seem advantageous due to its
simplicity, it is bound to lead to duplication of effort in the long run and is
suited primarily for occasional use of SSIS. In this article, we will
demonstrate how to implement the second of these options by taking advantage of
package variables.
In our sample scenario, we will
step through implementation of a package, which exports a range of rows
(selected according to arbitrarily assigned criteria) from vEmployee view in
the AdventureWorks
database into a text file. Our intention is to show how to use variables to
replace hard-coded values that determine row selection or identify data source
and destination. This way, a request to change these values in the future can
be carried out by simply modifying respective variables, eliminating the need
to search for their references throughout the entire package. In our
demonstration, we will first create a package with specific values assigned
directly to relevant components. Afterwards, we will describe the process of
replacing them with variables.
Start by creating a new project
based on the Integration Services Project template in the Business
Intelligence Development Studio. This will form the required folder
structure within the location you specify, including a generic Package.dtsx file.
Switch to the Data Flow tab of the Designer window and follow the hyper-linked instructions appearing in its
center stating, "No Data Flow tasks have been added to this package. Click here to
add a new Data Flow task". With the new Data Flow Task in
place, drag and drop OLE DB Source from the Data Flow Sources section of the Toolbox onto
the Designer
interface, right-click on it and select Edit… from its context
sensitive menu. In the resulting OLE DB Source Editor, use
the New… command
button next to the OLE DB connection manager listbox to display the Configure OLE DB
Connection Manager window. Click on New… located in its lower portion
to activate the Connection Manager window. Type in the name of the server
hosting the source SQL Server instance in the Server name
textbox, assign appropriate authentication method in the Log on to the
server section, and specify AdventureWorks in the Connect to a
database section underneath (you can use the Test Connection command
button to verify that the resulting configuration is correct). Return to the OLE DB Source
Editor window, select Table or view entry in the Data access mode
listbox, followed by [HumanResources].[vEmployee] in the Name of the
table or the view listbox. Switch to the Columns tab and choose the
columns you intend to export (we will narrow down our selection to EmployeeID, FirstName, LastName, City, StateProvinceName, PostalCode, and CountryRegionName). Click
on the OK command button to return to the Designer window.
As we mentioned earlier, we
will limit rows copied to the target text file to those matching arbitrary
criteria. For the sake of simplicity, we will use the content of the City column
for this purpose, filtering out entries that are not equal to Seattle. To
accomplish this, drag Conditional Split from the Data Flow Transformations section
of the Toolbox, position it directly underneath OLE DB Source, and
connect the two via Data Flow Path (represented by an arrow extending from
the lower edge of the top component). Right click on the Conditional
Split Transformation and use the Edit… option from its context
sensitive menu to display the Conditional Split Transformation Editor window.
In its lower portion, type a new Output Name (we
will call it Local Employees, but the naming convention is purely
arbitrary and simply represents the intended use) and define the condition that
will determine whether an entry satisfies our criterion (which in this case
should be set to City == "Seattle"). As the result, all rows with
the City column
set to Seattle will be redirected to the newly created output. Confirm your
choices by clicking on the OK command button and return to the Designer
interface.
To complete implementation of
the first version of our package, drag Flat File Destination from
the Toolbox and
drop it directly underneath Conditional Split Transformation. Extend
the arrow representing the Local Employees Data Flow Path to link
them together. Use the Edit… option in the context sensitive menu of Flat File
Destination to launch its Editor window. Click on the New… command
button next to the Flat File connection manager listbox. You will be prompted
to choose the format of the destination file (among Delimited, Fixed width, Fixed width with
row delimiters, and Ragged right) first, followed by the file name and path (in the Flat File
Connection Manager window). Once you have provided all necessary information, click
on OK and
return to the Designer window.
The procedure we just completed
yields a fully functional package. You can test it by simply pressing F5, which
will compile and execute its code, generating a text file (whose name and
location you specified) containing data of AdventureWorks employees residing in
Seattle. However, if we wanted to produce equivalent output for Redmond
residents and store it in a different file (or reference another SQL Server
instance), we would need to modify the properties of corresponding package
components. This maintenance overhead can be to some extent mitigated by
employing variables to store values of component properties that are subject to
change.
In order to provide this extra
flexibility, we will use variables to configure both connection managers (OLE DB and Flat File) as
well as Conditional Split Transformation. In particular, srcServer and srcDatabase defined
on the Package level will provide parameters necessary to establish OLE DB Source-based
connection to vEmployee view. City scoped at the Data Flow Task level will serve as the
comparison criteria in the Conditional Split Transformation.
Package-wide dstFilePath will contain the file name and path of the Flat File
Destination. Each of them will be part of the default User
namespace (although it is certainly possible to define a custom namespace for
this purpose).
To create Package-level
variables, switch to the Control Flow tab of the Designer window, right click on any part of its unoccupied surface, and
choose Variables from the context sensitive menu. (Alternatively, select the Variables entry
in the SSIS top-level menu). In the resulting Variables window,
click on the left-most toolbar icon and enter relevant data in the Name, Data Type (String in our
case), and Value columns. To create component-level variables, select Data Flow Task on the Designer surface
(or switch to the Data Flow tab) and repeat the same sequence of steps described above.
Now, we need to replace
hard-coded values with references to the newly defined variables. In the case
of connection managers, this is accomplished by using their Expressions
property (appearing in their respective Properties windows, which you can
activate by pressing the F4 key or by selecting the Properties Window option
in the View top-level menu). In the Expressions entry,
click on the ellipsis (…) button on its right hand side to display Property
Expression Editor. Select ConnectionString in its Property column and use the
ellipsis button again (this time in the Expression column) to activate Expression
Builder. Note the Variables node in its upper left corner. Once you expand it, you should be
able to find all of the package level variables, including those you just
defined (which are easily identifiable by the User: prefix,
designating their namespace). You can define the desired expression by either
typing it directly in the Expression text box or by dragging its
individual elements (such as variables, functions, or operators) from the top
window panes. Assuming that you followed our naming convention, the expression
representing the database connection string takes the form (wrapped here for
better readability):
"Provider=SQLOLEDB;Data Source=" + @[User::srcServer] + ";Integrated Security=SSPI;" + "database=" + @[User::srcDatabase]
You would need to repeat the
same sequence of steps for the Flat File Connection Manager,
although in this case, the expression for its connection string should be set
to @[User::dstFilePath]. In the
case of Conditional Split Transformation, once you activate its
editor window, replace the existing entry in the Condition column
with UPPER(City)
== UPPER(@[User::City]). (We are leveraging the UPPER function in order to make our
comparison case-insensitive). As a result, our package remains fully
functional, but changing its behavior requires only modification of variable
values, rather than altering content or properties of its components.
Additional Resources
SQL Server 2008 R2 Index
MSDN Using Variables in Packages
MSDN Integration Services Variables