Exploring Derived Column Transformation in SQL Server Integration Services
May 24, 2010
Marcin Policht examines SQL Server Integration Services' component, Derived Column Transformation,and how its usefulness is enhanced by its ability to implement fairly elaborate mathematical, logical, and string operations.
As we have explained in our most recent article, Derived Column Transformation can be leveraged in order to resolve data type conversion issues that tend to surface when importing the content of Excel spreadsheets into SQL Server database tables. While that clearly demonstrated one of the important benefits of this Integration Services component, it merely hinted at the breadth of its full potential. Now we will try to give it more appropriate exposure by extending the scope of our discussion and presenting a number of other scenarios where its capabilities are equally useful.
In our most recent article, we have described how Derived Column Transformation can be leveraged in order to resolve data type conversion issues that tend to surface when importing the content of Excel spreadsheets into SQL Server database tables. While that clearly demonstrated one of the important benefits of this Integration Services component, it merely hinted at the breadth of its full potentials. Now we will try to give it more appropriate exposure by extending the scope of our discussion and presenting a number of other scenarios where its capabilities are equally useful.
Before we look more closely into possible applications of Derived Column Transformation, let's first review its most relevant characteristics. As we already know, its primary purpose is modifying both metadata and values of input columns by applying string, date and time, mathematical and logical functions or operators, as well as type casts, including conversion of NULLs. (Any of these calculations might involve the use of package variables, which we will be covering in more detail in one of our upcoming articles). An outcome is available (in synchronous manner) for further processing or final loading via one or more output columns. Depending on the requirements and types of changes, results can either replace the content of existing columns or populate new ones (we used the latter approach when performing data type conversion described in our previous article).
In general, you can control the behavior of an arbitrary SSIS data flow component in several ways. The most readily available approach utilizes its Properties window, which in the case of Derived Column Transformation gives you an option to configure the following attributes:
However, you will likely be spending most of your time working with settings exposed via Editor and Advanced Editor interfaces. Somewhat counterintuitively, the latter facilitates access to properties using generic layout and is available for the majority of Data Flow components (with the exception of multi-input transformations), while the former is implemented via a custom dialog box, geared towards its specific role. On the other hand (as expected), Advanced Editor gives you insight into more detailed configuration options.
While previously we focused on type casting functionality of Derived Column Transformation, you will find that its usefulness is further enhanced by its ability to implement fairly elaborate mathematical, logical, and string operations. In particular, one of its more commonly utilized features is the conditional operator, taking the form of expression1 ? expression2 : expression3 (which translates into IF expression1 THEN expression2 ELSE expression3 pseudocode construct). It's rather obvious purpose is to determine the Boolean value of the expression1 and, depending on its outcome (which evaluates to either TRUE or FALSE), yield either expression2 or expression3.
For example, let's consider a rather straightforward scenario with a fictitious oil company that wants to determine its strategy in regard to extending offshore exploration in different geographical areas depending on current environmental damage in each. This can be accomplished by creating a Derived Column Transformation that includes one input column (we will call it Spilled[gallons]), which designates the number of gallons of oil spilled by rigs or tankers and a new derived column (which, in our case, we will be automatically recognized as [DT_WSTR], although you can use explicit type casting to implement an arbitrary conversion), outlining the corresponding recommended action. The formula used in this case could take the form Spilled[gallons] < 10,000,000 ? "pay off" : "repudiate/lobby", which would need to be entered in the Expression field of the Advanced Editor window. In order to account for situations in which the value of Spilled[gallons] is NULL, the formula could be changed to (note that nesting of conditional expressions is emphasized by enclosing the inner one into a set of parenthesis, which improves readibility).
ISNULL(Spilled[gallons]) ? "drill baby drill" : (Spilled[gallons] < 10,000,000 ? "pay off" : "repudiate/lobby")
Another common situation involving Derived Column Transformation deals with string manipulation. Let's assume that our input contains four columns, named Salutation, FirstName, MiddleName, and LastName. If we want to create their concatenated listing consisting of the salutation, followed by first name, middle name initial, and last name (all capitalized), this could be accomplished by employing the following expression (to eliminate the possibility of leading or trailing spaces, you could also apply TRIM function):
(ISNULL(Salutation) ? "" : UPPER(Salutation) + " ") + (ISNULL(FirstName) ? "" : UPPER(FirstName) + " ") + (ISNULL(MiddleName) ? "" :
Similarly, it is possible to parse character-based data in order to extract an arbitrarily chosen substring. For example, let's say you want to reverse the process we just described and identify the last name of each person on the concatenated list. In this case, our input column would take the form of a sequence of capitalized entries representing salutation, first name, middle name (all three being optional), followed by (mandatory) last name. In order to accomplish our goal, we would need to determine the position of the last blank space and capture all characters to its right. If the blank space cannot be found, then, we can safely conclude that the entire field contains the last name. To provide the last name in the original format, we would convert all characters except for the first one into lower case. Assuming that our input column is called FullName, the corresponding expression would take the form (note that in this case, for the sake of simplicity, we do not account for the possibility of having NULL entries):
FINDSTRING(REVERSE(FullName), " ", 1) == 0 ? UPPER(SUBSTRING(FullName, 1, 1)) + LOWER(SUBSTRING(FullName, 2, LEN(FullName))) :
Another data type that frequently becomes a subject of Derived Column Transformation-based operations is date and time. To a large extent, this is due to the availability of built-in functions that considerably simplify such operations as identifying the year, month, day, or day of a week corresponding to the content of a datetime column. More specifically, this can be accomplished by taking advantage of the YEAR(DateColumnName), MONTH(DateColumnName), DAY(DateColumnName), and DATEPART("dw", DateColumnName) functions (although keep in mind that, depending on a data source and destination, you might need to apply additional type-cast operators in order to comply with explicit SSIS conversion rules). If you want to further explore features incorporated into Derived Column Transformation, refer to Functions (SSIS Expressions) and Advanced Integration Services Expressions articles of the SQL Server 2008 Books Online. Despite the flexibility and wide range of capabilities offered by Derived Column Transformation, it is important to note that when dealing with complex operations, it is typically more efficient and less error prone to use the Script Component instead. As you probably noticed by reading through our examples above, the syntax of SSIS expressions is fairly difficult to analyze and troubleshoot (especially when viewing them directly in the Advanced Editor interface, where expressions appear in a single-line, scrollable textbox).