Despite its benefits, SQL Server Integration Services' Import Export Wizard has a number of limitations, resulting in part from a new set of rules that eliminate implicit data type conversion mechanisms present in Data Transformation Services. This article discusses a method that addresses such limitations, focusing in particular on importing the content of Excel spreadsheets into SQL Server.
As we have demonstrated in our
recent articles covering the most relevant features of Import
and Export Wizard (included in the SQL
Server 2008 and 2008 R2 Integration Services), its capabilities are quite
helpful in simplifying extraction, transformation, and loading tasks. On the
other hand, we also pointed out some of its limitations resulting from a new
set of rules, (on which its latest version is based) that eliminate implicit
data type conversion mechanisms present in Data Transformation Services. This
conscious design decision prevents confusion (and, more importantly, a potential
data loss) caused by unanticipated (and difficult to control) changes to data
types and their characteristics. On the other hand, restricting extent to which
the Import and Export Wizard made arbitrary decisions on your behalf is likely
to force you to take over some of its former responsibilities, effectively
making the process of creating packages more challenging. One of the most
common scenarios that pose such challenges involves importing the content of
Excel spreadsheets into SQL Server. In this article, we will discuss a method
that assists with their resolution.
The underlying reason for such
issues is the limited number of data types supported by OLE DB Provider for Jet
4.0, which provides the interface between Excel and SQL Server 2008 Integration
Services. In particular, content of individual columns of a spreadsheet
(appearing on the left) can be categorized as one of the following SSIS data
types (listed on the right):
- Boolean corresponding to Boolean (DT_BOOL),
- Currency corresponding to Currency (DT_CY),
- Date/Time corresponding to Date (DT_DATE),
- Numeric corresponding to Double-Precision Float (DT_R8),
- Memo corresponding to Unicode Text Stream (DT_NTEXT) - if the length of column values exceed 255 characters,
- String corresponding to Unicode String (DT_WSTR) - if the length of column values does not exceed 255 characters.
Note that in the last two cases,
character-based entries are always treated as Unicode, which triggers
conversion error if you try to copy them directly to a varchar column in a SQL Server-based table. Similarly, attempts to
copy String data to nvarchar
columns, which size is shorter than 255 characters will generate a
validation warning about the possibility of truncation. While you have an
option to ignore these warnings (assuming that you are confident that they
provide no basis for concern), you will not be able to do use the same approach
when it comes to errors.
One way to handle such cases gracefully
is to take advantage of Data
Conversion Transformation. This component of Data Flow Task is typically automatically added to packages created by
Import and Export Wizard when copying data stored in an Excel spreadsheet into
a SQL Server 2008-based table (which you can easily determine by saving such
packages and viewing them in the Designer interface
of Business Intelligence
Development Studio). Let's
step through an example that follows this approach.
First, create a sample spreadsheet
that will serve as a source of data in our import operation. For the sake of
simplicity, we will limit it to two columns with strings of characters shorter
than 255 characters each (which, according to our expectations, should be
represented as SSIS Unicode
String DT_WSTR data type
when viewed in Business
Intelligence Development Studio). Its
content will consist of the following entries:
Fear and Loathing in Las Vegas Terry Gilliam
The Saragossa Manuscript Wojciech Hass
Once you have saved them in the
spreadsheet, create a target table in an arbitrarily chosen user database (we
assume that you are logged on with the dbo privileges). To make our example a bit more interesting, we
will define its first column (Title) as varchar
and the second one (Director) as nvarchar,
with both limited to 100 (respectively, non-Unicode and Unicode) characters. We will
accomplish this by executing the following T-SQL statement:
CREATE TABLE [dbo].[TwoThumbsUp](
[Title][varchar](100) NOT NULL,
[Director][nvarchar](100) NOT NULL
)ON [PRIMARY]
With both the source Excel spreadsheet
(including its data) and target SQL Server-based table in place, invoke a
32-bit version of Import and
Export Data Wizard. On the Choose a Data Source page, select Microsoft Excel
from the Data
source drop-down list and specify the path
to the .xlsx file (make sure to clear First row has column names checkbox). On the Choose a Destination
page, ensure that SQL Server
Native Client 10.0 is designated as the Destination, provide the Server name and Database, and chose an appropriate Authentication
method. Accept the Copy
data from one or more tables or views
option on the Specify
Table Copy or Query page, mark checkbox next to the 'Sheet 1$' entry in the Source column and assign [dbo.].[TwoThumbsUp]
as its Destination. Once you click on Next > command
button, you will be given a chance to Review Data Type Mapping.
The lower portion of the window you are presented with displays individual
entries representing source data columns, along with an action to be carried
out during data transfer. As you can easily verify, the first one will be the
subject to conversion (as indicated by the marked checkbox in the Convert column). Double-clicking on that entry will provide Column Conversion Details, which in our case, include a
change from DT_WSTR to DT_STR SSIS data type. After confirming that this is the desired
outcome, you will be given an option to Save SSIS Package
(as well as Run it
immediately). Select it and pick an arbitrary
file system location as its store.
Executing the package should populate
the target table ([dbo].[TwoThumbsUp] with two rows of data from the Excel spreadsheet. However,
we are primarily interested in the content of our package. To view it, create a
new solution in the Business
Intelligence Development Studio. Next,
use the Add
Existing Package option in the context-sensitive
menu of the SSIS
Packages node displayed in Solution Explorer window to include in the solution the DTSX
file saved in the previous step.
Displaying the Designer interface will reveal that the package consists of one SourceConnectionExcel facilitating communication with Excel driver, one DestinationConnectionOLEDB providing connectivity to the SQL Server database hosting
the target table, and a single Data
Flow Task. If you double-click on it, you
will be presented with its content (displayed on the Data Flow tab). You will notice that in addition to the Excel Source and OLE
DB Destination components, there is also Data Conversion Transformation, which serves as intermediary between the first two. Its
purpose is to implement the data type mapping in a manner that will prevent any
potential data loss. Let's analyze more closely how
this is accomplished.
Open Advanced Editor for Source
and switch to its Input and
Output Properties tab. Under the Excel Source Output node in the Inputs and outputs
section of the window, there are two subnodes,
labeled respectively External
Columns and Output Columns.
The first of them represents metadata associated with the content of Excel
spreadsheet, while the latter is its internal SSIS representation (you can
determine mappings between them by referring to the Column Mappings tab). As you can see, both columns representing our sample
data are identified as having Unicode
string (DT_WSTR) data
type of 255 characters in length (as expected). Now launch the Data Conversion Transformation
Editor by selecting the Edit... option from its context sensitive menu (or by using the Show Editor link in its Properties window).
Note that in this case, the data type of the first input column is set to string [DT_STR] of length 100 and available as an aliased output column. This effectively
eliminates the cause of error messages due to incompatibility between Unicode
and non-Unicode characters. You can also view and modify the same setting by
launching the Advanced
Editor window and switching to its Input and Output Properties tab, where you will locate the DataType entry of the Output Column
under the Data
Conversion Output node of the Inputs and outputs section. Finally, by opening the Advanced Editor for Destination dialog box and switching to its Column Mappings tab, you will be able to verify that the output column of
the Data Conversion
Transformation (represented here as one of
available input columns) is mapped to the Title destination column.
While we have eliminated the reason
for the error, we are still receiving the warning due to a mismatch between
lengths of the other column at the source and destination. If you want to get
rid of it (assuming you are certain that truncation will not lead to data loss),
open the Data
Conversion Transformation Editor
window, mark the checkbox next to the other entry in the Available Input Columns window (which represents the second column), ensure that Unicode string [DT_WSTR] appears in the Data Type column,
and set the value of Length to 100. Close the editor by clicking on the OK button, launch the Advanced Editor for Destination, and switch to its Column Mappings
tab. Finally, delete the current mapping to the Director column
and assign it to our newly created input column (representing the additional
output column of the Data
Conversion Transformation).
In our next article, we will
demonstrate how the same result can be accomplished by using Derived Column Transformation, as well as discuss other functionality provided by this Data Flow component.
MSDN Data Conversion Transformation
MSDN Data Type Conversion in Expressions
MSDN Connectivity and SQL Server 2005 Integration Services
MSDN Convert Types without Conversion Checking (SQL Server Import and Export Wizard)
»
See All Articles by Columnist
Marcin Policht