SSIS 2008 Import and Export Wizard and Excel-based Data


Even though the Import and Export Wizard, incorporated into the SQL Server 2008 platform, greatly simplifies the creation of SQL Server Integration Services packages, it has its limitations. This article points out the primary challenges associated with using it to copy data between SQL Server 2008 and Excel and presents methods of addressing these challenges.

The Import and Export Wizard, incorporated
into SQL Server 2008 platform, greatly simplifies
the creation of SSIS packages
, minimizing the need for editing their
content in the Business Intelligence Development Studio. However, it is worth
noting that our original example, which demonstrated this approach by exporting
the content of a view in the AdventureWorksDW database to a text file, was chosen mainly for its
exceptional simplicity, rarely experienced in real world scenarios. In this
article, we will focus on more practical circumstances, by analyzing the process
of copying data between SQL Server 2008 and Excel, pointing out primary
challenges associated with such transfers, and presenting methods of addressing
them.

To better understand potential issues
that might surface in such situations, it is important to realize that SSIS has
its own, distinct collection of data types, different from those used by data
stores that constitute sources and destinations of ETL operations. Integration
Services data types follow a consistent naming convention (each starts with the
DT_ prefix followed
by a type identifier); a comprehensive list can be found in the Integration Services Data Types article of SQL
Server 2008 Books
Online
. It is relatively straightforward to match each data type to their
counterparts in other databases, as you can determine by reviewing the table
within the Mapping
Integration Services Data Types to Database Data Types section of this
article. However, as you can see there, in a majority of the cases, their
relationships are not straightforward. These mappings are implemented by
default when performing data extraction and loading with Import and Export Wizard, however, it is
possible to modify them if the resulting conversion does not yield the desired
outcome (for example, you might want to enforce a more-space efficient data
type for columns if their data values fit within its range).

XML-formatted files, on which the
mappings are based, reside in the Program FilesMicrosoft SQL
Server100DTSMappingFiles folder. While they cover the most commonly
encountered database products and data types, you can modify them if you deal
with a less common set of conditions or if your requirements are not satisfied
by defaults employed by the Import and Export Wizard. Alternatively, you can adjust them
dynamically, while running the wizard, which, in addition, gives you the ability
to designate desired column names and some of data type properties, such as
length or code page. Keep in mind, however, that more elaborate options will
likely necessitate the use of the Designer environment in the Business Intelligence
Development Studio.

In addition to mappings between data
types implemented by disparate data stores, SSIS might be forced to carry out
internal data conversion (since its default mapping with a source might result
in a different data type than the one with its destination). The rules
governing such behavior are established based on the content of TypeConversion.xml file, located
in the Program
FilesMicrosoft SQL Server100DTSbinn folder (just as with the mapping files,
you can modify its content to implement desired functionality).

Let’s now take a look at the role of
these two mechanisms in exporting SQL Server-based data. To demonstrate this
process, we will copy the content of a table from the AdventureWorksDW database to an
Excel spreadsheet. To carry out this exercise successfully, make sure that you
are running 32-bit version of the Import and Export Wizard (since the
Excel driver is not available in 64-bit format). Once you launched it, point to
the installation of SQL Server 2008 (or 2008 R2) on the Choose a Data Source page, specify
the appropriate authentication method, and select AdventureWorksDW in the Database listbox. Next,
on the Choose a
Destination
page, highlight Microsoft
Excel
entry, set an arbitrary Excel
file path,
and assign the Excel
version
to Microsoft
Excel 2007
(note that you also have an option to designate that First row has column names). Pick Copy data from one or more
tables or views
once you reach the Specify
Table Copy or Query
page. Mark the checkbox next to the [dbo].[DimReseller] table in the Tables and views section of the Select Source Tables and
Views
page. Clicking on the Next> command button
will present you with the Review Data Type Mapping page, divided into two sections with
the upper one containing the list of tables (in our case, a single entry for [dbo].[DimReseller]) and the lower
consisting of Data
type mapping
for each of its columns. In both of them, you will find icons providing visual
clues indicating conversion status:

  • green checkmark conveys a successful
    outcome (if the conversion is guaranteed not to cause a data loss or is
    not needed altogether, which you can establish by checking the state of
    the Convert checkbox)
  • yellow exclamation point means that
    a conversion does take place and its processing should be reviewed (in
    order to determine whether a realistic possibility of data loss exists).
  • red circle with a cross is a sign
    that existing data cannot be converted (which effectively will prevent you
    from executing the resulting package until this issue is resolved).

In
addition, for each of the source and destination column pairs, you have the ability
to specify the action to be carried out (Fail or Ignore)
in response to data truncation and error condition.

Double clicking on individual entries in
the Data type
mapping
section will display the Column
Conversion Details
window, where you can find more about changes that will be applied during the
copy operation. In addition to the list of SSIS data types that have been auto
selected for source and destination mappings (along with the names of their
respective providers and mapping files that served as basis for this
selection), the information presented also contains conversion steps (if
applicable), as well as the reference to the file used to determine underlying
conversion rules (i.e. Program
FilesMicrosoft SQL Server100DTSbinnTypeConversion.xml). While it is
possible to reject these default settings directly from the Review Data Type Mapping page (by
clearing the checkbox in the Convert
field for every column that data type conversion behavior should be adjusted manually),
you will not be able to execute the package at that point (since the Run immediately entry on the Save and Run Package page will be
disabled). Instead, you will be forced to save it first, using options
presented on the Save
and Run Package
page. (Afterwards, you can edit the saved package via Business Intelligence
Development Studio
or by direct modifications to the .DTSX file).

Save the resulting SSIS package as an .DTSX
file and open it in the Designer interface of Business
Intelligence Development Studio (via File->Open-File…
menu). As you can see, its structure is a bit more complex than the one created
automatically when exporting a table from AdventureWorksDW
database to a text file that we described in our earlier article. Its Control
Flow
consists of two tasks, named Preparation SQL Task 1
(which creates DimReseller sheet in the
destination Excel workbook) and Data Flow Task 1
(handling actual data transfer and conversion). By double-clicking on the
latter, you will be automatically redirected to the Data Flow
tab. A
quick examination will reveal that, in addition to OLE DB
Source
labeled Source
– DimReseller (providing access to the [dbo].[DimReseller]
table
in AdventureWorksDW
database) and Excel Destination called Destination
– DimReseller (facilitating connection to the DimReseller
sheet in the target Excel workbook), the tab also contains Data
Conversion 0 – 0 transformation, which handles data
conversion according to rules defined in the Program FilesMicrosoft SQL
Server100DTSbinnTypeConversion.xml file.

If your intention is to modify these
rules, you can accomplish this by selecting the appropriate entries in the Data Type listboxes for
each Input
Column
and Output
Alias
combination listed in the file. In addition, for each such pair, you have the
ability to specify (wherever applicable) their Length, Precision, Scale, and Code Page values.

Unfortunately, reversing this process by
attempting to import data from Excel tends to be considerably more complex and
error prone, primarily due to a lack of metadata that would ensure consistency
of its data types. One of the more common issues surfaces in situations when an
imported column contains values that can be interpreted as either numbers or
text. OLE DB Provider for Jet 4.0 (with support of Excel ISAM driver)
determines the best match based on a rather superficial analysis of the first
few rows of data and assigns NULL
to all those that do not match its selection. Similar algorithm is applied to
text columns, which, depending on their length, are identified as containing
either String (if less than
or equal 255 Unicode characters) or Memo (otherwise) data type. Such potential problems are
flagged in the Import
and Export Wizard,
so it is up to you to decide how you want to resolve them (note that this
constitutes an improvement from earlier versions of SQL Server, where these decisions
were made implicitly by the wizard in an arbitrary manner, introducing a
possibility of data loss).

One way to address these problems is to
explicitly set the Import
Mode
in the Extended
Properties
section of the connection string of the Excel Connection Manager (by adding IMEX=1 entry enclosed
within a couple of semicolons). This can be done using its Properties window in the Designer mode of Business Intelligence
Development Studio
or by direct modification to the relevant XML element in the corresponding .DTSX file, (it is
also possible to modify connection strings from the Connection Managers node of DTSExecUI.exe utility).
Another approach involves increasing the number of rows sampled by the OLE DB
provider by assigning a larger value (by default set to 8) to TypeGuessRows entry in the HKLMSOFTWAREMicrosoftJet4.0EnginesExcel registry key.
In some cases however, these methods will not suffice, forcing you to resort to
other workarounds or to use additional Data Flow transformations (which we will be
covering in future articles). For additional factors that should be considered
when dealing with Excel-based data during SSIS import and export operations,
refer to SQL Server 2008 Books Online Excel
Source
and SQL Server 2008 Books Online Excel
Destination
articles in SQL Server 2005 Books Online.

»


See All Articles by Columnist

Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles