SQL Server 2005 Integration Services - Part 37 - Derived Column Transformation

November 27, 2006

Continuing our coverage of SQL Server 2005 Integration Services Data Flow components, we will present Derived Column transformation, which, while serving primarily an auxiliary role, should not be ignored, since it offers considerable flexibility and versatility combined with extreme ease of use. Its functionality is based on its ability to generate output data rows through custom-built expressions (constructed with a wide range of built-in functions and operators), which, besides referencing input columns might also leverage values stored in package variables. Depending on your preferences, each expression can be used either to replace data in an input column (as it is being passed to the output) or to form an extra output column (which requires that you designate a unique name, data type, and its related characteristics, such as length, precision, scale, and, for strings, also correct code page). These changes are typically made via its Editor interface, which offers easy access to variables and columns (displayed in the upper left section of its window), functions and operators (located in the listing on the right hand side and consisting of such items as Mathematical, String, Date/Time, and NULL functions, Type Casts, as well as Operators), and (in the area at the bottom) table-formatted structure showing derived columns with each of their properties.

In order to examine these configuration options more closely, create a new project of Integration Services type in the Business Intelligence Development Studio. Once you are presented with the Control Flow tab SSIS Designer, add a new Data Flow task to it and double click on it (which will allow you to edit it directly). Drop OLE DB Source onto it (after dragging it from the Toolbox) and display its Editor by selecting the appropriate option from its context sensitive menu. In the Editor window, create a new OLE DB connection manager, pointing to the AdventureWorks database on the server hosting your SQL Server 2005 installation, and specify a valid authentication option. In the Data access mode listbox, select the "Table or view" option and locate [Sales].[vSalesPerson] next to the "Name of the table or the view" entry below. Ensure that all of its columns are included (by checking the content of the Columns section) and close the Editor window by clicking on the OK button.

Next, add Derived Column transformation (by dragging its icon from the Toolbox onto the Data Flow tab area) and connect its input with the output of the OLE DB Source. Right click on it and select the Edit option to display its Editor window. Review its layout, noting the listing of variables and columns, as well as the available functions and operators. Each of the input columns can be simply copied to the output (use the Input Columns tab in the Advanced Editor window in order to designate them) or replaced with values obtained by applying expressions specified in the Expression column within the Editor window. These expressions can also be used to produce the content of new columns (with arbitrary names assigned through either of the two Editor windows). To capture outcome of the transformation, connect its output to a suitable Data Flow destination.

Despite its straightforward interface, Derived Column delivers quite a diverse and powerful set of features, which is best explored by testing its ability to process different types of data. The rich selection of mathematical functions and operators allow you to perform a variety of calculations and logical manipulations (when using them, keep in mind their precedence and associativity rules, which are described in detail in the Books Online). Among the most useful ones is the conditional operator, which takes the form expression1 ? expression2 : expression3 (equivalent to the IF expression1 THEN expression2 ELSE expression3 construct common to practically all programming languages). The operator evaluates the Boolean value of the expression1 and based on its outcome (True or False) returns a result of expression2 or expression3 (respectively). For example, let's assume that we want to motivate salespeople, based on their sales figures, by comparing content SalesYTD and SalesQuota fields for each. This can be done by adding a new derived column (call it Motivation) and entering the following formula in the Expression field:

SalesYTD > SalesQuota ? "Carrot" : "Stick"

and ensuring that the designated Data Type matches both expression2 and expression3 (in this case, string[DT_STR] of length 6 and 1252 Code Page will suffice). If you try to execute our sample package, you will notice that the results are not complete, due to the fact that some of the SalesQuota entries are NULL. To resolve this issue, we will modify expression for the derived column to account for these exceptions (length of the data type will be automatically adjusted to accommodate the change):

ISNULL(SalesQuota) ? "Electrode" : (SalesYTD > SalesQuota ? "Carrot" : "Stick")

Another common scenario where Derived Column transformation provides a convenient, easy to implement solution, involves concatenating entries from several columns (such as putting together a full address by combining its individual components). As before, you can verify whether each field contains a value by applying the ISNULL function to it and, depending on the outcome, substitute it with an empty string or include its content in the final result. In our sample package, with input columns named AddressLine1, AddressLine2, City, PostalCode, StateProvinceName, and CountryRegionName, the relevant expression takes the form (note that we are adding an extra space to each entry, except for the last one, in order to provide separation between them):

(ISNULL(AddressLine1) ? "" : AddressLine1 + " ") + (ISNULL(AddressLine2) ? "" : AddressLine2 + " ") + 
 (ISNULL(City) ? "" : City + " ") + (ISNULL(PostalCode) ? "" : PostalCode + " ") + 
 (ISNULL(StateProvinceName) ? "" : StateProvinceName + " ") + (ISNULL(CountryRegionName) ? "" : CountryRegionName)

The ability to process null values gracefully (by replacing them with arbitrary defaults) is helpful in implementations that involve Lookup Transformations (for more information on this subject, refer to our earlier article of this series). Entries without data that fail lookup can be simply passed through to the output and replaced afterwards with appropriate values by a Derived Column transformation, located downstream in the Data Flow.

The majority of less complex string operations can be performed by directly applying functions built into the Derived Column feature set. For example, to trim leading or trailing blank spaces, simply execute LTRIM or RTRIM (respectively). Changing case is easily done with UPPER or LOWER. RIGHT allows you to retrieve the trailing portion of a character expression of a specified length (to extract the leading string of desired length use the SUBSTRING function, setting its first parameter to 1 and the second one to the required number of characters). You can also handle more involved string manipulations, such as parsing, leveraging FINDSTRING (which finds the starting position of one string within another) or the just mentioned SUBSTRING (extracting designated portion of a string). With more complex cases, it might be more sensible to use the Script Component, since editing lengthy Derived Column expressions can be rather cumbersome.

Derived Column is also useful for the purpose of data conversion (mainly by employing type casting operations). If you expect any records in your input to contain values for which conversion will fail, you can address this issue by directing Error Output into another Derived Column transformation, which will replace the "illegal" entries with some valid default (choosing an appropriate default would depend on the desired data type). The output of the second transform could then be combined (using Union All Data Flow component) with the output of the first to yield a complete data set.

Similarly, with the help of expressions available within Derived Column transformation, it is fairly straightforward to convert between data formats. For example, the DATEPART function (essentially identical to its T-SQL counterpart) provides the ability to extract an integer representing a portion of a specific date. Alternatively, this can be accomplished with DAY, MONTH, and YEAR functions. This way, converting the current date from the default US format of MMDDYYYY to YYYY/MM/DD is possible with either of these two statements:

(DT_STR,4,1252)DATEPART("yyyy",GETDATE()) + "/" + (DT_STR,2,1252)DATEPART("mm",GETDATE()) + 
 "/" + (DT_STR,2,1252)DATEPART("dd",GETDATE())
(DT_STR,4,1252)(YEAR(GETDATE())) + "/" + (DT_STR,2,1252)(MONTH(GETDATE())) + 
 "/" + (DT_STR,2,1252)(DAY(GETDATE())) + "/"

As you can see, Derived Column transformations offer significant flexibility in the areas of logical and math operations, string processing, and data conversion. I hope that this article, intended as a brief overview of some of its capabilities will help you leverage its usefulness in other scenarios.

» See All Articles by Columnist Marcin Policht

The Network for Technology Professionals


About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers