SQL Server 2005 Integration Services – Part 16

Data Flow Script Component

In our previous
article
of this series, we presented a quick introduction to Data Flow
Script Component of SQL Server 2005 Integration Services and followed with more
in-depth discussion about its debugging features. However, our coverage would
not be complete without spending some time on exploring its versatility, which
gives it the unique ability to function as a data source, destination, or
transformation. Our goal is to review these three types of configuration as
well as provide examples of their implementation.

As we mentioned before, you are prompted to choose one of the Script Component types as
soon as you add the Script Component to the Data Flow area of the SSIS Designer
interface (via Select Script Component Type dialog box). Your decision affects
settings displayed in the Script Transformation Editor dialog box (primarily,
the type of entries available in the Inputs and Outputs section and presence of
Input Columns). In addition, the choice has impact on automatically generated
Visual Basic .NET code, which you can view via the Script section (and
obviously determines the content of custom code you will create).

We will start our discussion with the description and an example of the
Source Script Component (which corresponds to the first option available in the
Select Script Component Type dialog box). As expected, this selection yields
configuration that does not require any inputs (Add Input command button is not
present on the Inputs and Outputs page) and, consequently, eliminates the need
to specify Input Columns (this section is missing from the Script Transformation
Editor). The remaining two sections – Connection Managers and Script – allow
you to specify the location of a data source and define a way to extract its
content (and, potentially, perform some processing tasks), respectively.
Customization of the component is carried out in two stages. During the first
one, called Metadata Design mode, you define connections, identify SSIS
variables to be available from within the code (by filling out the ReadOnlyVariables
and ReadWriteVariables entries in the Script section of the Script
Transformation Editor), as well as configure outputs and their columns –
including assigning a descriptive name and data type to each. The second stage
– known as the Code Design mode – involves programming (using Microsoft Visual
Studio for Applications), resulting in code responsible for retrieval and
manipulation of data to be passed to a next component in data path (in this
code, you reference connections and output columns using names you specified in
the first stage).

Source Script Component is useful in scenarios, where data source has a
difficult to parse format (other than standard ones, such as delimited,
tabular, fixed length, XML-based, or supported by OLE DB and .NET data
providers). A fairly common example involves text files containing a number of
multi-line records (equivalent to rows), with varying number of lines
(corresponding to columns) per each. In this article, we will present an
implementation of the Source Script Component, which processes such files. In
our case, we will generate the file using WMI Data Reader Task, by querying
Win32_Product Windows Management Instrumentation class, which contains the
listing of all Windows Installer applications on a target computer. Keep in
mind that in order to be able to execute this query on a Windows 2003 Server
system, you need to ensure that the WMI Windows Installer Provider is installed.
(Since this is one of the Windows components, this is accomplished from the
Management and Monitoring Tools dialog box, accesible via Add/Remove Windows
Components section of Add or Remove Programs Control Panel applet).

Launch SQL Server 2005 Business Intelligence Development Studio and create a
new Integration Services project. Place a WMI Data Reader Task on the Control
Flow area of the SSIS Designer (by dragging it from the Toolbox) and display its
Editor (by selecting the Edit… option from its context sensitive menu). Set
the WMI options according to the following listing:

  • WmiConnection – handled via WMI Connection Manager, pointing to
    the rootcimv2 namespace. Server name and authentication settings will depend
    on the system you target and credentials you use when connecting to it – but if
    you are running the package locally, you can simply rely on the defaults.

  • WmiQuerySourceType – Direct input

  • WqlQuerySource – SELECT * FROM Win32_Product

  • OutputType – Property name and value

  • OverwriteDestination – Overwrite destination

  • DestinationType – File connection

  • Destination – use the File Connection Manager (we will refer to
    it as WMIProduct throughout this article – but the choice of name is obviously
    arbitrary), pointing to a local file (e.g. C:tempWMIProduct.txt) with the
    "Create file" option as the "Usage type" parameter.

You can test the functionality of the task using the "Execute
Task" context-sensitive menu option to ensure that the WMIProduct.txt file
is populated with appropriate content. A single sample data record (there
should be one for every Windows Installer application installed on the target
computer) should have the following format:

Caption, Microsoft SQL Server 2005 Tools
Description, Microsoft SQL Server 2005 Tools
IdentifyingNumber, {12345678-ABCD-4321-ABCD-123456789ABC}
InstallDate, 20051115
InstallDate2, 20051115000000.000000-000
InstallLocation, 
InstallState, 5
Name, Microsoft SQL Server 2005 Tools
PackageCache, C:WINDOWSInstaller5c059.msi
SKUNumber, 
Vendor, Microsoft Corporation
Version, 9.00.1399.06

Next, add the Data Flow Task to the Control Flow tab of the SSIS Designer
interface and double click on it to activate the Data Flow area. Drag the
Script Component icon from the Toolbox onto it, pick the Source option from the
Select Script Component Type dialog box, and display the Script Transformation
Editor (by choosing the Edit… context sensitive menu entry) with three
entries on its left hand side, named Inputs and Outputs, Script, and Connection
Managers. Select the first one and rename Output 0 to ProductOutput (or
something else, more meaningful than the default). Expand the node, drilling
down to Output Columns level and click on the Add Column command button. Assign
a name Caption to the newly created column and ensure that its DataType
property is set to string[DT_STR]. While you could repeat the same process for the
remaining attributes of the Win32_Product class, based on the sample data
record above, in our example (for the sake of simplicity), we will limit the
selection to Name, Vendor, and Version only. Switch to the Connection Managers
section and click on the Add… command button. Choose the WMIProduct entry
from the drop-down list and specify a name via which you will reference this
connection in your code (we decided to use WMIProduct). Highlight the Script
entry on the left hand side and activate the Microsoft Visual Studio for
Applications with Design Script… command button.

At this point, the code window will contain only automatically generated
stub of the ScriptMain class, including the CreateNewOutputRows method (which
purpose is clearly defined by its name). While there are several other methods,
which can be used to further customize processing of the output (for their list
refer to BufferWrapper and ComponentWrapper modules in Project Explorer of
Visual Studio for Applications), we will deal exclusively with CreateNewOutputRows.
Note default references to Microsoft.SqlServer.Dts.Pipeline.Wrapper
and Microsoft.SqlServer.Dts.Runtime.Wrapper
namespaces (rather than Dts.Runtime,
used in the Script Task) in the Imports section. Since we intend to work with a
File Connection Manager, add to them the Imports
System.IO
statement (to simplify notation required to access
objects and methods in this namespace). Modify the remaining code by copying
the following:

Imports System
Imports System.Data
Imports System.Math
Imports System.IO
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain
    Inherits UserComponent
    Public Overrides Sub CreateNewOutputRows()
        Dim oFile As File
        Dim oStreamReader As StreamReader
        Dim sLine As String
        Try
            oStreamReader = oFile.OpenText(Me.Connections.WMIProduct.ConnectionString)
            sLine = oStreamReader.ReadLine()
            Do While sLine IsNot Nothing
                If sLine.Split(Chr(44))(0) = "Name" Then
                    ProductOutputBuffer.AddRow()
                    ProductOutputBuffer.Name = sLine.Substring(6)
                ElseIf sLine.Split(Chr(44))(0) = "Vendor" Then
                    ProductOutputBuffer.Vendor = sLine.Substring(8)
                ElseIf sLine.Split(Chr(44))(0) = "Version" Then
                    ProductOutputBuffer.Version = sLine.Substring(9)
                End If
                sLine = oStreamReader.ReadLine()
            Loop
            ProductOutputBuffer.SetEndOfRowset()
        Catch e As Exception
            Me.ComponentMetaData.FireError(1, "Script Component", e.Message, "", 0, True)
        Finally
            oStreamReader.Close()
        End Try
    End Sub
End Class

As part of the CreateNewOutputRows method, we read the text file identified
by the WMIProduct Connection Manager (line by line, in a Do While loop) until
we reach its end. For each line, we check whether its first part (up to the
first comma – which corresponds to the character 44 in ASCII – hence the Chr(44)
reference and use of the Split method) matches the string "Name",
"Vendor", or "Version". Whenever a new line containing the
"Name" entry is detected (assuming that this constitutes the beginning
of a new record), we invoke the AddRow() method. The row is populated with
name, version, and vendor values until the next line starting with the
"Name" string is encountered.

The use of Try… Catch… Finally structure ensures proper error handling
and is a standard programming practice. It also takes care of releasing all
resources involved in code execution, once the package completes (gracefully or
as the result of a failure). To verify results, you can redirect the output to
any of the destination components – the outcome should contain listing of
names, vendors, and version numbers for all applications installed on a target
computer.

»


See All Articles by Columnist
Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles