Data Flow Script Component
In our previous
article of this series, we presented a quick introduction to Data Flow
Script Component of SQL Server 2005 Integration Services and followed with more
in-depth discussion about its debugging features. However, our coverage would
not be complete without spending some time on exploring its versatility, which
gives it the unique ability to function as a data source, destination, or
transformation. Our goal is to review these three types of configuration as
well as provide examples of their implementation.
As we mentioned before, you are prompted to choose one of the Script Component types as
soon as you add the Script Component to the Data Flow area of the SSIS Designer
interface (via Select Script Component Type dialog box). Your decision affects
settings displayed in the Script Transformation Editor dialog box (primarily,
the type of entries available in the Inputs and Outputs section and presence of
Input Columns). In addition, the choice has impact on automatically generated
Visual Basic .NET code, which you can view via the Script section (and
obviously determines the content of custom code you will create).
We will start our discussion with the description and an example of the
Source Script Component (which corresponds to the first option available in the
Select Script Component Type dialog box). As expected, this selection yields
configuration that does not require any inputs (Add Input command button is not
present on the Inputs and Outputs page) and, consequently, eliminates the need
to specify Input Columns (this section is missing from the Script Transformation
Editor). The remaining two sections – Connection Managers and Script – allow
you to specify the location of a data source and define a way to extract its
content (and, potentially, perform some processing tasks), respectively.
Customization of the component is carried out in two stages. During the first
one, called Metadata Design mode, you define connections, identify SSIS
variables to be available from within the code (by filling out the ReadOnlyVariables
and ReadWriteVariables entries in the Script section of the Script
Transformation Editor), as well as configure outputs and their columns –
including assigning a descriptive name and data type to each. The second stage
– known as the Code Design mode – involves programming (using Microsoft Visual
Studio for Applications), resulting in code responsible for retrieval and
manipulation of data to be passed to a next component in data path (in this
code, you reference connections and output columns using names you specified in
the first stage).
Source Script Component is useful in scenarios, where data source has a
difficult to parse format (other than standard ones, such as delimited,
tabular, fixed length, XML-based, or supported by OLE DB and .NET data
providers). A fairly common example involves text files containing a number of
multi-line records (equivalent to rows), with varying number of lines
(corresponding to columns) per each. In this article, we will present an
implementation of the Source Script Component, which processes such files. In
our case, we will generate the file using WMI Data Reader Task, by querying
Win32_Product Windows Management Instrumentation class, which contains the
listing of all Windows Installer applications on a target computer. Keep in
mind that in order to be able to execute this query on a Windows 2003 Server
system, you need to ensure that the WMI Windows Installer Provider is installed.
(Since this is one of the Windows components, this is accomplished from the
Management and Monitoring Tools dialog box, accesible via Add/Remove Windows
Components section of Add or Remove Programs Control Panel applet).
Launch SQL Server 2005 Business Intelligence Development Studio and create a
new Integration Services project. Place a WMI Data Reader Task on the Control
Flow area of the SSIS Designer (by dragging it from the Toolbox) and display its
Editor (by selecting the Edit… option from its context sensitive menu). Set
the WMI options according to the following listing:
-
WmiConnection – handled via WMI Connection Manager, pointing to
the rootcimv2 namespace. Server name and authentication settings will depend
on the system you target and credentials you use when connecting to it – but if
you are running the package locally, you can simply rely on the defaults. - WmiQuerySourceType – Direct input
- WqlQuerySource – SELECT * FROM Win32_Product
- OutputType – Property name and value
- OverwriteDestination – Overwrite destination
- DestinationType – File connection
-
Destination – use the File Connection Manager (we will refer to
it as WMIProduct throughout this article – but the choice of name is obviously
arbitrary), pointing to a local file (e.g. C:tempWMIProduct.txt) with the
"Create file" option as the "Usage type" parameter.
You can test the functionality of the task using the "Execute
Task" context-sensitive menu option to ensure that the WMIProduct.txt file
is populated with appropriate content. A single sample data record (there
should be one for every Windows Installer application installed on the target
computer) should have the following format:
Caption, Microsoft SQL Server 2005 Tools Description, Microsoft SQL Server 2005 Tools IdentifyingNumber, {12345678-ABCD-4321-ABCD-123456789ABC} InstallDate, 20051115 InstallDate2, 20051115000000.000000-000 InstallLocation, InstallState, 5 Name, Microsoft SQL Server 2005 Tools PackageCache, C:WINDOWSInstaller5c059.msi SKUNumber, Vendor, Microsoft Corporation Version, 9.00.1399.06
Next, add the Data Flow Task to the Control Flow tab of the SSIS Designer
interface and double click on it to activate the Data Flow area. Drag the
Script Component icon from the Toolbox onto it, pick the Source option from the
Select Script Component Type dialog box, and display the Script Transformation
Editor (by choosing the Edit… context sensitive menu entry) with three
entries on its left hand side, named Inputs and Outputs, Script, and Connection
Managers. Select the first one and rename Output 0 to ProductOutput (or
something else, more meaningful than the default). Expand the node, drilling
down to Output Columns level and click on the Add Column command button. Assign
a name Caption to the newly created column and ensure that its DataType
property is set to string[DT_STR]. While you could repeat the same process for the
remaining attributes of the Win32_Product class, based on the sample data
record above, in our example (for the sake of simplicity), we will limit the
selection to Name, Vendor, and Version only. Switch to the Connection Managers
section and click on the Add… command button. Choose the WMIProduct entry
from the drop-down list and specify a name via which you will reference this
connection in your code (we decided to use WMIProduct). Highlight the Script
entry on the left hand side and activate the Microsoft Visual Studio for
Applications with Design Script… command button.
At this point, the code window will contain only automatically generated
stub of the ScriptMain class, including the CreateNewOutputRows method (which
purpose is clearly defined by its name). While there are several other methods,
which can be used to further customize processing of the output (for their list
refer to BufferWrapper and ComponentWrapper modules in Project Explorer of
Visual Studio for Applications), we will deal exclusively with CreateNewOutputRows.
Note default references to Microsoft.SqlServer.Dts.Pipeline.Wrapper
and Microsoft.SqlServer.Dts.Runtime.Wrapper
namespaces (rather than Dts.Runtime
,
used in the Script Task) in the Imports section. Since we intend to work with a
File Connection Manager, add to them the Imports
statement (to simplify notation required to access
System.IO
objects and methods in this namespace). Modify the remaining code by copying
the following:
Imports System Imports System.Data Imports System.Math Imports System.IO Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper Imports Microsoft.SqlServer.Dts.Runtime.Wrapper Public Class ScriptMain Inherits UserComponent Public Overrides Sub CreateNewOutputRows() Dim oFile As File Dim oStreamReader As StreamReader Dim sLine As String Try oStreamReader = oFile.OpenText(Me.Connections.WMIProduct.ConnectionString) sLine = oStreamReader.ReadLine() Do While sLine IsNot Nothing If sLine.Split(Chr(44))(0) = "Name" Then ProductOutputBuffer.AddRow() ProductOutputBuffer.Name = sLine.Substring(6) ElseIf sLine.Split(Chr(44))(0) = "Vendor" Then ProductOutputBuffer.Vendor = sLine.Substring(8) ElseIf sLine.Split(Chr(44))(0) = "Version" Then ProductOutputBuffer.Version = sLine.Substring(9) End If sLine = oStreamReader.ReadLine() Loop ProductOutputBuffer.SetEndOfRowset() Catch e As Exception Me.ComponentMetaData.FireError(1, "Script Component", e.Message, "", 0, True) Finally oStreamReader.Close() End Try End Sub End Class
As part of the CreateNewOutputRows method, we read the text file identified
by the WMIProduct Connection Manager (line by line, in a Do While loop) until
we reach its end. For each line, we check whether its first part (up to the
first comma – which corresponds to the character 44 in ASCII – hence the Chr(44)
reference and use of the Split method) matches the string "Name",
"Vendor", or "Version". Whenever a new line containing the
"Name" entry is detected (assuming that this constitutes the beginning
of a new record), we invoke the AddRow() method. The row is populated with
name, version, and vendor values until the next line starting with the
"Name" string is encountered.
The use of Try… Catch… Finally structure ensures proper error handling
and is a standard programming practice. It also takes care of releasing all
resources involved in code execution, once the package completes (gracefully or
as the result of a failure). To verify results, you can redirect the output to
any of the destination components – the outcome should contain listing of
names, vendors, and version numbers for all applications installed on a target
computer.