SQL Server 2005 Integration Services – Part 45 – Bulk Insert Task and Format Files

In the previous article of our series dealing with SQL Server 2005
Integration Services, we have presented a straightforward approach to importing
data from delimited text files into database tables and views, using Bulk
Insert Control Flow task (which, in turn, leverages T-SQL BULK INSERT
functionality) with its basic formatting options. While such an approach might
work well in some cases, its lack of flexibility is likely to become an issue
(as was the case with our sample data file). One way to address such situations
is to employ format files, which, while slightly more complex to implement,
offer additional processing options. We will cover two types of such files,
describing in more detail their characteristics and providing examples of their
implementation.

We will demonstrate the use of format files by employing samples from our previous article, with the same target table that we
have generated with Execute SQL Task containing the following SQLStatement
property. (You might want to initiate for this purpose a new project of
Integration Services type in the SQL Server 2005 Business Intelligence Studio
or simply reuse the one we created as part of the exercise accompanying the
previous article):

USE AdventureWorks;
IF EXISTS
   (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
       WHERE TABLE_SCHEMA = 'HumanResources' 
   AND TABLE_NAME = 'myTeam')
    DROP TABLE HumanResources.myTeam
GO
CREATE TABLE HumanResources.myTeam 
(EmployeeID smallint NOT NULL,
Name nvarchar(50) NOT NULL,
Title nvarchar(50) NULL,
Background nvarchar(50) NOT NULL DEFAULT ''
);
GO

Our sample data file will also have the same content as
before:

77,'Mia Doppleganger','Administrative Assistant','Microsoft Office'
49,'Hirum Mollicat','I.T. Specialist','Report Writing and Data Mining'

Which we pointed to using a Flat File Connection manager
with Text qualifier set as a single quote (although as we have previously found
out, this setting has no impact on the outcome of BULK INSERT operation).

Format files contain mapping instructions between data in a source file and
columns in a target table. They can be applied to BULK INSERT, bpc, and INSERT
… SELECT * FROM OPENROWSET(BULK… statements (the last one actually requires
their presence) in order to assist with more complex data import operations.
Such operations might involve a mismatch (in terms of order or quantity)
between data fields in a text file and columns in a target table (for example,
when loading the same data file into multiple, distinct tables) or scenarios
where delimiters differ across columns within the same data file. In general,
there are two types of format files – XML-based ones (introduced with SQL
Server 2005) and legacy ones (referred to as non-XML). Since both of them are
compatible with SSIS Bulk Insert task, we will cover them here (they are
interchangeable, although there are some exclusive benefits associated with
each, which we will briefly mention).

Traditional format files follow syntactical rules presented in more detail
in Understanding
Non-XML Format Files
article of SQL Server 2005 Books online. In
particular, each starts with two rows that designate the format version number
(9.0 indicates SQL Server 2005) and number of columns (corresponding to the
number of relevant fields in the data file). These are followed by rows defining
processing details for each of the target columns, including the position of
each field in the data file, its datatype, prefix length (set to 0 if the field
can contain NULL values), maximum data length (in bytes), field terminator,
ordinal number of the corresponding column in a target table, column name, and
its collation. While this might sound rather confusing and time-consuming to
put together, you can easily generate such files with the bcp utility, by
invoking it from the Command Prompt in the following manner (assuming that your
target table already exists):

bcp targetTable format nul -c -t , -f formatFile.FMT -T

Where targetTable
is name of the target table (AdventureWorks.HumanResources.myTeam in our case),
format nul specifies the
operation to be performed (i.e. generating a format file), -c indicates that the source file will
be using character data type (other possible values -n, -N, and -w apply to
native, widenative, and widechar data types, respectively), -T connects to SQL Server via integrated
security (otherwise you need to specify login and password with -U and -P
switches), -t , designates
comma as the field delimiter (we keep the default of newline character as the
row delimiter), and -f formatFile.FMT
points to the format file that will contain the outcome of this operation
(refer to the article on bcp Utility
for a complete list of switches). In our case, the command will take the form:

bcp AdventureWorks.HumanResources.myTeam format nul -c -t , -f myTeam.Fmt -T

Which will result in the following non-XML format file
(details might vary depending on your collation settings):

9.0
4
1 SQLCHAR 0 7 "," 1     EmployeeID ""
2 SQLCHAR 0 100 "," 2     Name SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "," 3     Title SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 100 "rn" 4     Background SQL_Latin1_General_CP1_CI_AS

If we were to execute our sample SSIS Bulk Insert task with
this format file, we would end up with the table content identical to the one
presented in our previous article. In order to get rid of single quotes in the
target table, we need to modify our format file and alter our delimiter
definitions (which will populate the table with source data fields in the
intended manner):

9.0
4
1 SQLCHAR 0 7 ",'" 1 EmployeeID ""
2 SQLCHAR 0 100 "','" 2 Name SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "','" 3 Title SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 100 "'rn" 4 Background SQL_Latin1_General_CP1_CI_AS

Save these changes into a file residing in an arbitrary location on the SQL
Server 2005 computer. To verify their correctness, add Bulk Insert task to the
Control Flow Designer interface directly underneath the earlier configured Execute
SQL task and make the former dependent on the success of the latter with a
precedence constraint (or, if you are modifying the sample package created
according to instructions in our previous article of this series, simply reuse
the one already there). After displaying the task Editor window, ensure that
its Connection subsection (within Connection section) points to the target
server, database, and table ([AdventureWorks].[HumanResources].[myTeam]). In
the Format subsection, point to the newly saved format file (with Format
property set to "Use file"). Finally, ensure that our original text
file is used as the Source Connection. Once you execute the package, the target
table will be populated with data fields from our text file, but this time without
extraneous delimiters.

The same goal can be accomplished with XML format files, which were
introduced in SQL Server 2005. In addition to features available with their
legacy counterparts, they give you the ability to identify not only data types
of imported fields in a text file but also columns in a target table. Among
their strengths are also self-documenting properties (which makes them easier
to review and modify) and extensibility. On the other hand, they lack the ability
to accommodate import scenarios, in which you need to skip one or more columns
(except for the last one) in a target table. (To work around this limitation,
you can create a view on the table and BULK INSERT into it). A detailed description
of XML format file syntax is available in Schema Syntax for
XML Format Files
article of SQL Server 2005 Books Online. Just as with
legacy format files, we can create an XML format file by leveraging the bcp
utility. This is done by simply adding -x switch to the command line invocation
we used previously and specifying myTeam.XML as the name of the format file to
be created:

bcp AdventureWorks.HumanResources.myTeam format nul -c -t , -f myTeam.XML -x -T

Which will populate it with the following content:

<?xml version="1.0"?>
<BCPFORMAT  xmlns_xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi_type="CharTerm" TERMINATOR="," MAX_LENGTH="7"/>
  <FIELD ID="2" xsi_type="CharTerm" TERMINATOR="," MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="3" xsi_type="CharTerm" TERMINATOR="," MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
  <FIELD ID="4" xsi_type="CharTerm" TERMINATOR="rn" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="EmployeeID" xsi_type="SQLSMALLINT"/>
  <COLUMN SOURCE="2" NAME="Name" xsi_type="SQLNVARCHAR"/>
  <COLUMN SOURCE="3" NAME="Title" xsi_type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="Background" xsi_type="SQLNVARCHAR"/>
 </ROW>
</BCPFORMAT>

As before, in order to eliminate superfluous single quotes, you need to
modify delimiters (referenced here with the marker TERMINATOR) to match the
changes we applied to non-XML format file. Once this is completed, simply
specify the location and name of this file as the value of the FormatFile
property in the Connection section of the Bulk Insert Task Editor window (with
Format property set to "Use File") and execute the package. For more
detailed information on this topic, refer to the set of MSDN articles in the
section titled Format
Files for Importing or Exporting Data
.

»


See All Articles by Columnist
Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles