SQL Server 2005 Integration Services - Part 17
January 23, 2006
Script Transformation Component
Continuing the topic of the Script Component in SQL Server 2005 Integration Services, to which we have dedicated our previous two articles (covering its debugging capabilities and its role as a Data Flow Source), we will now take a closer look its usefulness as a Data Flow Transformation. This functionality comes in handy, even though Business Intelligence Development Studio offers a significant number of built-in components, which deliver a variety of data manipulation features, ranging from combining data from several sources (such as Merge, Merge Join, or Union All), reversing this process by splitting it across multiple outputs (such as Multicast or Conditional Split), performing lookups (standard, Term, or Fuzzy), through operations geared towards Business Intelligence (such as Data Mining Query or Term Extraction), to a variety of conversions, aggregates, and sorts. In general, these transformations can be categorized as either synchronous (such as character mappings or data conversions), for which modification are performed "inline" (i.e. each individual input row is processed as soon as it is available and produces typically one output row) or asynchronous (such as aggregates or sorts) where generation of an output requires the availability of all input rows and where numbers of input and output rows more commonly differ. Both of these categories can be implemented using the Script Transformation Component.
Let's start with the synchronous type. Launch the SQL Server 2005 Business Intelligence Studio and create a new Integration Services project. Add a Data Flow Task to the Control Flow area of the Designer interface and double click on it to activate it. Drag the Script Component icon from the Toolbox and drop it on the Data Flow area. At this point, you will be presented with the Select Script Component Type dialog box. Choose the Transformation option and click on OK to complete the step of adding the component to your package.
As mentioned in one of our previous articles, Script Component, designated as a transformation, supports one input and multiple outputs (configurable from the Inputs and Outputs page of the Script Transformation Editor window). For this reason, you also have the ability to specify input columns (from the Input Columns page) to be available from within your custom code (in read only or read/write manner, depending on the content of Usage Type entry that you set individually for each input column). In our case, we will use as the input [HumanResources].[Employee] table from the AdventureWorks database and its EmployeeID, LoginID, HireDate, SalariedFlag, and VacationHours as the input columns. For all salaried employees (who are identified by the True value of SalariedFlag field), we will increase the value of VacationHours by 10 (obviously there are numerous ways of producing the same result - but this simple example will serve the purpose of demonstrating the functionality of synchronous transformation). To accomplish this, add the OLE DB Source icon from the Toolbox to the Data Flow tab of the Designer interface and select Edit from its context sensitive menu. In the Connection Manager area of OLE DB Source Editor, click on New... command button to create OLE DB Connection Manager, pointing to the AdventureWorks database on your target server and supply the appropriate authentication information. Ensure that "Table or view" entry appears in the Data access mode drop down list and select [HumanResources].[Employee] as the table name. Switch to the Columns page within the OLE DB Source Editor and select EmployeeID, LoginID, HireDate, SalariedFlag, and VacationHours from the list of Available External Columns. Once you have confirmed your selection and returned to the Data Flow area, connect the output of the OLE DB Data Source (green arrow) to the input of the Script Component.
Use the Edit option from the context sensitive menu of the Script Component, to display the Script Transformation Editor. In the Input Columns section, select all of the Available Input Columns and set the Usage Type of the VacationHours to ReadWrite. After switching to the Inputs and Outputs section, you will notice that there already exists one input, called Input 0, populated according to the column selections you have made so far. At the same rate, there is also one pre-created output, called Output 0, although without predefined output columns. Keep in mind that it is not required to specify them as long as the SynchronousInputID property of the output matches the ID of the input (which is the default configuration). You should consider assigning more meaningful names to the input and output (such as, EmployeeInput and EmployeeOutput), although keep in mind that this will be reflected by the naming convention used in the component's auto-generated Visual Basic .NET code. To view it, switch to the Script section and click on the Design Script... command button. In the Microsoft Visual Studio for Applications interface, you will find ScriptMain class with a stub of ProcessInputRow sub, which gets invoked for every input row of data. In our case, we simply want to add 10 hours to the number of VacationHours for every salaried employee. This can be done with the following code:
Public Class ScriptMain Inherits UserComponent Public Overrides Sub EmployeeInput_ProcessInputRow(ByVal Row As EmployeeInputBuffer) If Row.SalariedFlag = True Then Row.VacationHours = Row.VacationHours + CType(10, Short) End If End Sub End Class
We will store the results in a text file using Flat File Destination. Drag its icon from the Toolbox onto the Data Flow area of the Designer interface and connect the output of the Script Component to it. Select Edit from its context sensitive menu to display the Flat File Connection Editor and create a Flat File Connection Manager (by clicking on New... command button on the Connection Manager page). Pick the preferred Flat File format, provide the location of the target file along with the format options in the Flat File Connection Manager Editor dialog box, and click on the OK button to return to the Flat File Destination Editor. On the Mappings page, ensure that the Available Input Columns properly match the Available Destination Columns. Executing the package should produce the desired results, generating output rows with the value of VacationHours increased by 10, for every entry with True value in the SalariedFlag column (which you could review by checking content of the target text file).
Creating asynchronous transformation with the Script Component is in many aspects similar to the process we just described. The most important differences involve the logic applied to generating output columns (as we mentioned before, asynchronous processing deals with multiple input rows) as well as the implications of the fact that in this case you need to directly manipulate the content of the output buffer. (As you have seen, this is handled transparently when using synchronous approach, where incoming columns are automatically copied from the input to ouput buffer - providing that the value of SynchoronousInputID property is correctly set). We also need some type of mechanism to detect when the end of input rowset is reached.
To demonstrate this, we will alter our existing package by adding another output (with a single column EmpCount) yielding a total number of rows with the SalariedFlag set to True. Producing this aggregate can be accomplished by creating a Visual Basic .NET integer variable (we will call it simply iEmpCount) that will be incremented every time an input row with the appropriate value in the designated column is encountered. In addition, we need to modify the Script Component code to take care of populating the new output (this is done using custom CreateNewOutputRows sub, which calls AddRow() method and ProcessInput sub which sets the output column to the final value of iEmpCount once the last input row is processed) and set SynchronousInputID for our new output to 0.
Start by displaying the Script Transformation Editor (by selecting Edit entry from the context-sensitive menu of the Script Component). In the Inputs and Outputs section, click on the Add Output button. Assign a descriptive name to the new output (e.g. EmployeeSumOutput) and set its SynchronousInputID to 0. Choose its Output Columns node and use the Add Column command button to create a single column (we will call it EmpCount). Switch to the Script section of the Editor, click on the Design Script... command button, and modify the code so it looks as follows:
Public Class ScriptMain Inherits UserComponent Dim iEmpCount As Integer Public Overrides Sub CreateNewOutputRows() EmployeeSumOutputBuffer.AddRow() End Sub Public Overrides Sub EmployeeInput_ProcessInput(ByVal Buffer As EmployeeInputBuffer) While Buffer.NextRow() EmployeeInput_ProcessInputRow(Buffer) End While If Buffer.EndOfRowset Then EmployeeSumOutputBuffer.EmpCount = iEmpCount EmployeeSumOutputBuffer.SetEndOfRowset() End If End Sub Public Overrides Sub EmployeeInput_ProcessInputRow(ByVal Row As EmployeeInputBuffer) If Row.SalariedFlag = True Then Row.VacationHours = Row.VacationHours + CType(10, Short) iEmpCount = iEmpCount + 1 End If End Sub End Class
Finally, create a new Flat File Connection Manager (by specifying a target file name and the single column to store EmpCount value) and a corresponding Flat File Destination. Once you complete execution of the package, the total number of rows with the SalariedFlag column set to True will be stored in the designated file.