SQL Server Integration Services 2008: Importing Excel Data Using Derived Column Transformation

May 10, 2010

The complexity involved in transferring data between Excel and SQL Server results from different and sometimes incompatible data types. The Import and Export wizard mitigates potential issues introduced by these incompatibilities by taking advantage of Data Conversion Transformation. Marcin Policht describes another approach that produces an equivalent outcome by employing Derived Column Transformation instead.

In our most recent articles, we have explained how SQL Server 2008 Integration Services Import and Export Wizard facilitate the transfer of data between an Excel spreadsheet and a table residing in a SQL Server database. The complexity involved in performing this seemingly straightforward task results from different and sometimes incompatible data types that are used by these two data stores. As we have demonstrated, the wizard mitigates potential issues introduced by such incompatibilities by taking advantage of Data Conversion Transformation (which is automatically added to the Data Flow task of the resulting SSIS package). However, another approach produces an equivalent outcome by employing Derived Column Transformation instead. This article describes its characteristics and demonstrates its usage.

Derived Column Transformation is one of the more popular components of Data Flow Task. Despite its simplicity (as far as the ease of use is concerned), it is quite powerful and flexible, primarily due to the SSIS expression language, which is exposed through its interface. In order to become familiar with its basic functionality, let's step through an example that will mirror the scenario covered in our previous article (which we handled at that time by leveraging Data Conversion Transformation). To start, create a sample spreadsheet that will serve as a source of data in our import operation. As before, we will limit it to two columns with strings shorter than 255 characters each (represented as Unicode String DT_WSTR data type when viewed in Business Intelligence Development Studio) containing the following entries:

Fear and Loathing in Las Vegas Terry Gilliam
The Saragossa Manuscript       Wojciech Hass

Our destination will consist of a target table in an arbitrarily chosen user database to which you have dbo privileges. Its first column, Title, will have varchar data type, while the second one, named Director, will be defined as nvarchar. The length of both will be limited to 100 characters (respectively, non-Unicode and Unicode). This can be accomplished by executing the following T-SQL statement:

CREATE TABLE [dbo].[TwoThumbsUp](
        [Title][varchar](100) NOT NULL,
        [Director][nvarchar](100) NOT NULL
)

This time, instead of relying on the wizard to guide us through the automatic generation of a package, we will use a manual process that involves creating an Integration Services project within Business Intelligence Development Studio. Once you have launched it (via its shortcut in the Start -> All Programs), select New -> Project... from the File menu. In the resulting dialog box, choose the Integration Services Project entry in the Templates section, as well as specify an arbitrary Name and Location. This will yield a new single project solution containing an SSIS package Package.dtsx, which you can easily identify by examining the content of the Solution Explorer window. We will modify its content by using the Package Designer and Connection Managers windows.

Right-click on the latter to add connections to our two data stores. In the context-sensitive menu, select the following:

  • New Connection... item, which will trigger the display of the Add SSIS Connection Manager dialog box.
  • Choose the EXCEL entry corresponding to the Connection manager for Excel files.
  • Click on the Add... command button.
  • Point to the location of our sample spreadsheet.
  • Select the appropriate Excel version,
  • Uncheck the First row has column names checkbox.
  • To facilitate connectivity to the TwoThumbsUp table we created earlier, select the New OLE DB Connection... entry from the same context-sensitive menu of the Connection Managers window.
  • Provide the target Server name, corresponding authentication method, and name of the database where the table resides (use Test Connection to verify accuracy of the information you provided).

At this point, we are ready to populate our package with components that will carry out data extraction, transformation, and loading. We already know (based on our earlier analysis of the package generated by Import and Export Wizard) that these actions can be implemented using a single Data Flow Task, consisting of a source and a destination component, as well as a transformation that will handle data conversion. To create such a task, simply click on the link appearing in the center of the Data Flow tab of the designer window. Next, drag the Excel Source icon from the Data Flow Sources section of the Toolbox onto its empty surface. Use the Edit option in the context-sensitive menu of the newly added component to display its editor window. Confirm that the OLE DB Connection Manager listbox contains the Excel Connection Manager entry, the Data access mode is set to Table or view, and the Name of the Excel sheet is configured as Sheet1$ (use the Preview command button to verify that you see content of both sample rows).

Now it is time to add our package Derived Column Transformation. Drag it from the Data Flow Transformations section of the Toolbox onto the designer's surface directly under Excel Source. Connect the two by extending the green arrow originating from the bottom edge of the top one (representing its standard output) until it reaches the upper edge of the one underneath it. Activate Derived Column Transformation Editor by selecting the Edit... option from its context sensitive menu. Note that the resulting window is divided into three sections. The upper left one gives you access to package variables (which we have not yet introduced) and input columns (consisting, in our case, of F1 and F2 and representing content of our sample spreadsheet). To the right, you will find operations that can be applied in order to generate the desired output, including following:

  • Mathematical Functions (ABS, CEILING, EXP, FLOOR, LN, LOG, POWER, ROUND, SIGN, SQUARE, and SQRT)
  • String Functions (CODEPOINT, FINDSTRING, HEX, LEN, LOWER, LTRIM, REPLACE, REPLICATE, REVERSE, RIGHT, RTRIM, SUBSTRING, TRIM, and UPPER)
  • Date and Time Functions (DATEADD, DATEDIFF, DATEPART, DAY, GETDATE(), GETUTCDATE(), MONTH, and YEAR)
  • NULL Functions (ISNULL and NULL for each of the SSIS data type)
  • Mathematical and logical Operators
  • Type Casts. That last category is of particular importance to us, since it will allow us to implement data type mapping in a manner that facilitates SSIS data conversion rules.

To illustrate the need for such conversion, let's pipe output columns from the newly added transformation directly to the target table (without any modifications). Drag OLE DB Destination from the Data Flow Destinations section of the Toolbox and place it below the Derived Column component on the designer's surface. Connect the output of the latter (represented by a green arrow originating from its bottom edge) to the input of the former. Launch its editor by selecting the Edit... item from its context-sensitive menu. Ensure that its entries point to the previously created OLE DB connection manager and Data access mode is listed as Table or view - fast load, as well as setting Name of the table or the view entry to [dbo].[TwoThumbsUp]. Switch to the Mappings section in the editor interface and connect the input columns (F1 and F2) to their respective destination columns (Title and Director). Once you click on the OK command button to confirm your selection, you will notice a red cross appearing on the OLE DB Destination, with a tooltip informing you that Columns "F1" and "Title" cannot convert between unicode and non-unicode string data types.

To remediate this issue, open the Derived Column Transformation Editor. In the lower portion of its window, select the <add as a new column> entry in the Derived Column listbox. Provide a descriptive name (e.g. Derived Column F1) in the Derived Column Name column and type in the typecasting expression that will convert the unicode string ([DT_WSTR]) of 255 characters in length into non-unicode ([DT_STR]) 100 characters long (using 1252 West European Latin code page). This takes the form of (DT_STR, 100, 1252)(F1), which you can also put together by dragging relevant functions and column names from their respective panes into the Expression column in the lower part of the window. To verify that the resulting output has the desired characteristics, examine the Data Type and Length columns (which should contain at this point string [DT_STR] and 100, respectively). Confirm your choices by clicking on the OK command button and launch OLE DB Destination Editor. Switch to its Mappings section, delete the link between the F1 and Title columns and create one between Derived Column F1 and Title instead.

Once you are back in the designer interface, you will notice that the red cross has been replaced by a yellow exclamation mark, which means that our original error condition has been resolved. As the tooltip indicates, at this point we need to be aware that Truncation may occur due to inserting data from data flow column 'F2' with a length of 255 to database column 'Director' with a length of 100 (you can also view this warning by opening the Error List window from the View menu). To remediate this, return to the Derived Column Transformation Editor, create another entry in the Derived Column Name (we will call it Derived Column F2) with the <add as a new column> option in the Derived Column listbox and (DT_WSTR, 100)(F2) as the expression. Once completed, launch the OLE DB Destination Editor again, switch to its Mappngs section, and replace the mapping between F2 and Director with another one between Derived Column F2 and Director.

The resulting package demonstrates our intended goal, by utilizing Derived Column Transformation to facilitate conversion between incompatible data types. In our next article, we will review other types of SSIS components that assist with resolving the most common ETL challenges.

Additional Resources

MSDN Derived Column Transformation
MSDN How to: Derive Column Values Using the Derived Column Transformation

» See All Articles by Columnist Marcin Policht








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers