Importing Excel data into SSIS 2008 using Data Conversion Transformation
April 26, 2010
Despite its benefits, SQL Server Integration Services' Import Export Wizard has a number of limitations, resulting in part from a new set of rules that eliminate implicit data type conversion mechanisms present in Data Transformation Services. This article discusses a method that addresses such limitations, focusing in particular on importing the content of Excel spreadsheets into SQL Server.
As we have demonstrated in our recent articles covering the most relevant features of Import and Export Wizard (included in the SQL Server 2008 and 2008 R2 Integration Services), its capabilities are quite helpful in simplifying extraction, transformation, and loading tasks. On the other hand, we also pointed out some of its limitations resulting from a new set of rules, (on which its latest version is based) that eliminate implicit data type conversion mechanisms present in Data Transformation Services. This conscious design decision prevents confusion (and, more importantly, a potential data loss) caused by unanticipated (and difficult to control) changes to data types and their characteristics. On the other hand, restricting extent to which the Import and Export Wizard made arbitrary decisions on your behalf is likely to force you to take over some of its former responsibilities, effectively making the process of creating packages more challenging. One of the most common scenarios that pose such challenges involves importing the content of Excel spreadsheets into SQL Server. In this article, we will discuss a method that assists with their resolution.
The underlying reason for such issues is the limited number of data types supported by OLE DB Provider for Jet 4.0, which provides the interface between Excel and SQL Server 2008 Integration Services. In particular, content of individual columns of a spreadsheet (appearing on the left) can be categorized as one of the following SSIS data types (listed on the right):
Note that in the last two cases, character-based entries are always treated as Unicode, which triggers conversion error if you try to copy them directly to a varchar column in a SQL Server-based table. Similarly, attempts to copy String data to nvarchar columns, which size is shorter than 255 characters will generate a validation warning about the possibility of truncation. While you have an option to ignore these warnings (assuming that you are confident that they provide no basis for concern), you will not be able to do use the same approach when it comes to errors.
One way to handle such cases gracefully is to take advantage of Data Conversion Transformation. This component of Data Flow Task is typically automatically added to packages created by Import and Export Wizard when copying data stored in an Excel spreadsheet into a SQL Server 2008-based table (which you can easily determine by saving such packages and viewing them in the Designer interface of Business Intelligence Development Studio). Let's step through an example that follows this approach.
First, create a sample spreadsheet that will serve as a source of data in our import operation. For the sake of simplicity, we will limit it to two columns with strings of characters shorter than 255 characters each (which, according to our expectations, should be represented as SSIS Unicode String DT_WSTR data type when viewed in Business Intelligence Development Studio). Its content will consist of the following entries:
Fear and Loathing in Las Vegas Terry Gilliam The Saragossa Manuscript Wojciech Hass
Once you have saved them in the spreadsheet, create a target table in an arbitrarily chosen user database (we assume that you are logged on with the dbo privileges). To make our example a bit more interesting, we will define its first column (Title) as varchar and the second one (Director) as nvarchar, with both limited to 100 (respectively, non-Unicode and Unicode) characters. We will accomplish this by executing the following T-SQL statement:
CREATE TABLE [dbo].[TwoThumbsUp]( [Title][varchar](100) NOT NULL, [Director][nvarchar](100) NOT NULL )ON [PRIMARY]
With both the source Excel spreadsheet (including its data) and target SQL Server-based table in place, invoke a 32-bit version of Import and Export Data Wizard. On the Choose a Data Source page, select Microsoft Excel from the Data source drop-down list and specify the path to the .xlsx file (make sure to clear First row has column names checkbox). On the Choose a Destination page, ensure that SQL Server Native Client 10.0 is designated as the Destination, provide the Server name and Database, and chose an appropriate Authentication method. Accept the Copy data from one or more tables or views option on the Specify Table Copy or Query page, mark checkbox next to the 'Sheet 1$' entry in the Source column and assign [dbo.].[TwoThumbsUp] as its Destination. Once you click on Next > command button, you will be given a chance to Review Data Type Mapping. The lower portion of the window you are presented with displays individual entries representing source data columns, along with an action to be carried out during data transfer. As you can easily verify, the first one will be the subject to conversion (as indicated by the marked checkbox in the Convert column). Double-clicking on that entry will provide Column Conversion Details, which in our case, include a change from DT_WSTR to DT_STR SSIS data type. After confirming that this is the desired outcome, you will be given an option to Save SSIS Package (as well as Run it immediately). Select it and pick an arbitrary file system location as its store.
Executing the package should populate the target table ([dbo].[TwoThumbsUp] with two rows of data from the Excel spreadsheet. However, we are primarily interested in the content of our package. To view it, create a new solution in the Business Intelligence Development Studio. Next, use the Add Existing Package option in the context-sensitive menu of the SSIS Packages node displayed in Solution Explorer window to include in the solution the DTSX file saved in the previous step.
Displaying the Designer interface will reveal that the package consists of one SourceConnectionExcel facilitating communication with Excel driver, one DestinationConnectionOLEDB providing connectivity to the SQL Server database hosting the target table, and a single Data Flow Task. If you double-click on it, you will be presented with its content (displayed on the Data Flow tab). You will notice that in addition to the Excel Source and OLE DB Destination components, there is also Data Conversion Transformation, which serves as intermediary between the first two. Its purpose is to implement the data type mapping in a manner that will prevent any potential data loss. Let's analyze more closely how this is accomplished.
Open Advanced Editor for Source and switch to its Input and Output Properties tab. Under the Excel Source Output node in the Inputs and outputs section of the window, there are two subnodes, labeled respectively External Columns and Output Columns. The first of them represents metadata associated with the content of Excel spreadsheet, while the latter is its internal SSIS representation (you can determine mappings between them by referring to the Column Mappings tab). As you can see, both columns representing our sample data are identified as having Unicode string (DT_WSTR) data type of 255 characters in length (as expected). Now launch the Data Conversion Transformation Editor by selecting the Edit... option from its context sensitive menu (or by using the Show Editor link in its Properties window). Note that in this case, the data type of the first input column is set to string [DT_STR] of length 100 and available as an aliased output column. This effectively eliminates the cause of error messages due to incompatibility between Unicode and non-Unicode characters. You can also view and modify the same setting by launching the Advanced Editor window and switching to its Input and Output Properties tab, where you will locate the DataType entry of the Output Column under the Data Conversion Output node of the Inputs and outputs section. Finally, by opening the Advanced Editor for Destination dialog box and switching to its Column Mappings tab, you will be able to verify that the output column of the Data Conversion Transformation (represented here as one of available input columns) is mapped to the Title destination column.
While we have eliminated the reason for the error, we are still receiving the warning due to a mismatch between lengths of the other column at the source and destination. If you want to get rid of it (assuming you are certain that truncation will not lead to data loss), open the Data Conversion Transformation Editor window, mark the checkbox next to the other entry in the Available Input Columns window (which represents the second column), ensure that Unicode string [DT_WSTR] appears in the Data Type column, and set the value of Length to 100. Close the editor by clicking on the OK button, launch the Advanced Editor for Destination, and switch to its Column Mappings tab. Finally, delete the current mapping to the Director column and assign it to our newly created input column (representing the additional output column of the Data Conversion Transformation).
In our next article, we will demonstrate how the same result can be accomplished by using Derived Column Transformation, as well as discuss other functionality provided by this Data Flow component.MSDN Data Conversion Transformation
MSDN Data Type Conversion in Expressions
MSDN Connectivity and SQL Server 2005 Integration Services
MSDN Convert Types without Conversion Checking (SQL Server Import and Export Wizard)