SQL Server Integration Services 2008: Importing Excel Data Using Derived Column Transformation


The complexity involved in transferring data between Excel and SQL Server results from different and sometimes incompatible data types. The Import and Export wizard mitigates potential issues introduced by these incompatibilities by taking advantage of Data Conversion Transformation. Marcin Policht describes another approach that produces an equivalent outcome by employing Derived Column Transformation instead.

In our most recent articles, we have
explained how SQL Server 2008 Integration Services Import and Export Wizard facilitate
the transfer of data between an Excel spreadsheet and a table residing in a SQL
Server database. The complexity involved in performing this seemingly
straightforward task results from different and sometimes incompatible data
types that are used by these two data stores. As we have demonstrated, the
wizard mitigates potential issues introduced by such incompatibilities by
taking advantage of Data
Conversion Transformation
(which is automatically added to the Data Flow task
of the resulting SSIS package). However, another approach produces an equivalent
outcome by employing Derived
Column Transformation
instead. This article describes its characteristics and demonstrates its usage.

Derived Column Transformation is one of the more
popular components of Data
Flow Task.
Despite its simplicity (as far as the ease of use is concerned), it is quite
powerful and flexible, primarily due to the SSIS expression language, which is
exposed through its interface. In order to become familiar with its basic
functionality, let’s step through an example that will mirror the scenario
covered in our previous article (which we handled at that time by leveraging Data Conversion
Transformation).
To start, create a sample spreadsheet that will serve as a source of data in
our import operation. As before, we will limit it to two columns with strings
shorter than 255 characters each (represented as Unicode String DT_WSTR data type when viewed in Business Intelligence
Development Studio)
containing the following entries:

Fear and Loathing in Las Vegas Terry Gilliam
The Saragossa Manuscript       Wojciech Hass

Our destination will consist of a target
table in an arbitrarily chosen user database to which you have dbo privileges. Its
first column, Title, will have varchar data type,
while the second one, named Director,
will be defined as nvarchar. The length of
both will be limited to 100 characters
(respectively, non-Unicode and Unicode). This can be accomplished by executing
the following T-SQL statement:

CREATE TABLE [dbo].[TwoThumbsUp](
        [Title][varchar](100) NOT NULL,
        [Director][nvarchar](100) NOT NULL
)

This time, instead of relying on the
wizard to guide us through the automatic generation of a package, we will use a
manual process that involves creating an Integration Services project within Business Intelligence
Development Studio.
Once you have launched it (via its shortcut in the Start -> All Programs), select New -> Project… from the File menu. In the
resulting dialog box, choose the Integration Services Project entry in the Templates section, as
well as specify an arbitrary Name
and Location. This will
yield a new single project solution containing an SSIS package Package.dtsx, which you can
easily identify by examining the content of the Solution Explorer window. We will
modify its content by using the Package Designer and Connection Managers windows.

Right-click on the latter to add
connections to our two data stores. In the context-sensitive menu, select the
following:

  • New
    Connection…
    item, which will trigger the display of the Add SSIS Connection Manager dialog box.
  • Choose
    the EXCEL entry
    corresponding to the Connection
    manager for Excel files.
  • Click
    on the Add… command button.
  • Point
    to the location of our sample spreadsheet.
  • Select
    the appropriate Excel
    version,
  • Uncheck
    the First row
    has column names
    checkbox.
  • To
    facilitate connectivity to the TwoThumbsUp table we created earlier, select the New OLE DB Connection… entry from the
    same context-sensitive menu of the Connection Managers window.
  • Provide
    the target Server
    name,
    corresponding authentication method, and name of the database where the table
    resides (use Test
    Connection
    to verify accuracy of the information you provided).

At this point, we are ready to populate
our package with components that will carry out data extraction,
transformation, and loading. We already know (based on our earlier analysis of
the package generated by Import
and Export Wizard)
that these actions can be implemented using a single Data Flow Task, consisting of
a source and a destination component, as well as a transformation that will handle
data conversion. To create such a task, simply click on the link appearing in
the center of the Data
Flow
tab of the designer window. Next, drag the Excel Source icon from the Data Flow Sources section of the Toolbox onto its empty
surface. Use the Edit option in the context-sensitive
menu of the newly added component to display its editor window. Confirm that the
OLE DB Connection Manager listbox contains the Excel Connection Manager entry, the Data
access mode is set to Table or view, and the Name of the Excel sheet
is configured as Sheet1$ (use the Preview command button
to verify that you see content of both sample rows).

Now it is time to add our package Derived Column
Transformation. Drag it from the Data Flow Transformations
section of the Toolbox onto the designer’s surface directly under Excel Source. Connect the two by extending the green
arrow originating from the bottom edge of the top one (representing its
standard output) until it reaches the upper edge of the one underneath it.
Activate Derived Column Transformation Editor by selecting the Edit… option from its context sensitive menu.
Note that the resulting window is divided into three sections. The upper left
one gives you access to package variables (which we have not yet introduced)
and input columns (consisting, in our case, of F1 and F2 and representing content of our sample
spreadsheet). To the right, you will find operations that can be applied in
order to generate the desired output, including following:

  • Mathematical
    Functions
    (ABS, CEILING, EXP, FLOOR, LN, LOG, POWER, ROUND, SIGN, SQUARE, and SQRT)
  • String
    Functions
    (CODEPOINT, FINDSTRING, HEX, LEN, LOWER, LTRIM, REPLACE, REPLICATE, REVERSE, RIGHT, RTRIM, SUBSTRING, TRIM, and UPPER)
  • Date and
    Time Functions
    (DATEADD, DATEDIFF, DATEPART, DAY, GETDATE(), GETUTCDATE(), MONTH, and YEAR)
  • NULL
    Functions
    (ISNULL and NULL for each of the
    SSIS data type)
  • Mathematical and
    logical Operators
  • Type
    Casts.
    That last category is of particular importance to us, since it will allow us to
    implement data type mapping in a manner that facilitates SSIS data conversion
    rules.

To illustrate the need for such
conversion, let’s pipe output columns from the newly added transformation
directly to the target table (without any modifications). Drag OLE DB Destination from the Data Flow Destinations section of the Toolbox and place it
below the Derived
Column
component on the designer’s surface. Connect the output of the latter
(represented by a green arrow originating from its bottom edge) to the input of
the former. Launch its editor by selecting the Edit… item from its context-sensitive menu.
Ensure that its entries point to the previously created OLE DB connection manager and Data access mode is listed as Table or view – fast load, as well as setting
Name of the
table or the view
entry to [dbo].[TwoThumbsUp]. Switch to the Mappings section in the
editor interface and connect the input columns (F1 and F2) to their respective destination columns (Title and Director). Once you
click on the OK command button
to confirm your selection, you will notice a red cross appearing on the OLE DB Destination, with a tooltip
informing you that Columns
"F1" and "Title" cannot convert between unicode and
non-unicode string data types.

To remediate this issue, open the Derived Column
Transformation Editor.
In the lower portion of its window, select the <add as a new column> entry in the Derived Column listbox.
Provide a descriptive name (e.g. Derived Column F1) in the Derived Column Name column and type
in the typecasting expression that will convert the unicode string ([DT_WSTR]) of 255 characters in
length into non-unicode ([DT_STR]) 100 characters long
(using 1252
West European Latin
code page). This takes the form of (DT_STR, 100, 1252)(F1), which you can
also put together by dragging relevant functions and column names from their
respective panes into the Expression
column in the lower part of the window. To verify that the resulting output has
the desired characteristics, examine the Data Type and Length columns (which should contain at this
point string
[DT_STR]
and 100, respectively).
Confirm your choices by clicking on the OK command button and launch OLE DB Destination Editor. Switch to its Mappings section, delete
the link between the F1 and Title columns and
create one between Derived
Column F1
and Title instead.

Once you are back in the designer
interface, you will notice that the red cross has been replaced by a yellow
exclamation mark, which means that our original error condition has been
resolved. As the tooltip indicates, at this point we need to be aware that Truncation may occur due to
inserting data from data flow column ‘F2’ with a length of 255 to database
column ‘Director’ with a length of 100 (you can also view this warning by
opening the Error
List
window from the View menu). To
remediate this, return to the Derived Column Transformation Editor, create another
entry in the Derived
Column Name
(we will call it Derived
Column F2)
with the <add
as a new column>
option in the Derived
Column
listbox and (DT_WSTR,
100)(F2)
as the expression. Once completed, launch the OLE DB Destination Editor again, switch
to its Mappngs section, and
replace the mapping between F2
and Director with another
one between Derived
Column F2
and Director.

The resulting package demonstrates our
intended goal, by utilizing Derived Column Transformation to facilitate conversion between

incompatible data types. In our next article, we will review other types of
SSIS components that assist with resolving the most common ETL challenges.

Additional Resources

MSDN Derived Column Transformation

MSDN How to: Derive Column Values Using the Derived Column Transformation

»


See All Articles by Columnist

Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles