Posted Mar 9, 2007

SQL Server 2005 Integration Services - Part 44 - Bulk Insert Task

By Marcin Policht

In the recent installments of our series dedicated to SQL Server 2005 Integration Services, we have been discussing individual Control Flow tasks, focusing on their unique functionality and characteristics. One of the few remaining items in this category is the Bulk Insert task, which will be the topic of this article.

Bulk Insert functionality has been available since SQL Server 7.0 (in the form of the BULK INSERT T-SQL statement), where it was introduced to provide a fast and straightforward method of importing data into a database (and to supplement the slower, although more versatile, bcp command-line utility). Despite its limited flexibility (we will discuss these limitations in more detail shortly), it is still commonly used due to its impressive efficiency. Bulk Insert offers superior speed, with minor exceptions that we described when covering SQL Server 2005 Integration Services performance: when importing files containing large amounts of date, time, currency, or numeric data in the most common formats (without locale-specific or special currency characters, and without missing or implied date parts), it is typically possible to minimize processing time by reading their content with the Flat File Source Data Flow component with its Fast Parse property set to True. There are, however, several caveats that you need to be aware of in order to take full advantage of its potential.

Among the most relevant recommendations (when dealing with sizeable data sources) are the following (a brief T-SQL sketch illustrating several of them follows this list):

  • Configure your target database with the Bulk-logged recovery model (from the Options section of the Database Properties dialog box in SQL Server Management Studio). This model keeps track of extent and page allocations but does not register individual record additions. Keep in mind, however, the implications regarding recoverability: perform a backup before and after the insert operation, and switch back to the Full recovery model afterwards.
  • Consider locking the target table (by executing the sp_tableoption stored procedure to enable its "table lock on bulk load" option, or by running BULK INSERT with the TABLOCK hint) in order to minimize lock contention and avoid row-level locking, which in turn improves efficiency.
  • Avoid having indexes on the target table (you might want to drop them prior to the data load, provided that the amount of data to be imported significantly exceeds the number of records already present in the table) and ensure that the table is not marked for replication.
  • If possible, copy the source file to the same server where the Bulk Insert task will be running (to avoid the delay associated with transferring it over the network during package execution). This also lets you disregard additional requirements that result from the security context in which BULK INSERT operates, which in turn depends on the way the package is launched (for more information on this subject, refer to one of the earlier articles in this series). Keep in mind that a package containing a Bulk Insert task can be executed only by members of the sysadmin fixed server role.
  • Divide the entire load into individual batches (with the BATCHSIZE parameter of BULK INSERT), which are then processed as separate transactions (unless the task joins a transaction already in progress, in which case its batches are subject to the parent's commit or rollback). This makes larger data loads more manageable and improves the potential for concurrency.
  • When replacing existing records, truncate the target table (or recreate it) rather than using the fully logged DELETE T-SQL statement.
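To make these recommendations more concrete, here is a minimal T-SQL sketch. It assumes the AdventureWorks database and the HumanResources.myTeam table used later in this article; the data file and backup paths are hypothetical placeholders, not part of the original example.

ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;  -- switch to Bulk-logged before the load
GO
EXEC sp_tableoption 'HumanResources.myTeam', 'table lock on bulk load', 1;  -- lock the table during bulk loads
GO
BULK INSERT HumanResources.myTeam
FROM 'C:\Data\myTeam.txt'
WITH (TABLOCK, BATCHSIZE = 10000);  -- hold a table lock, commit every 10,000 rows
GO
ALTER DATABASE AdventureWorks SET RECOVERY FULL;  -- switch back to Full afterwards
GO
BACKUP DATABASE AdventureWorks TO DISK = 'C:\Backups\AdventureWorks.bak';  -- and take a backup
GO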

To demonstrate the implementation of these recommendations in the context of the SSIS Bulk Insert task, we will leverage an example posted on the MSDN web site (which is based on the AdventureWorks sample database, so make sure you have it installed; you can download the relevant Windows Installer .MSI setup files from the Microsoft Download Center). Initiate a new project of the Integration Services type in SQL Server Business Intelligence Development Studio. Add an Execute SQL Task (which will create the HumanResources.myTeam table, as in the MSDN example) to the Control Flow of its Designer interface. Select the Edit... option from the task's context-sensitive menu to display its Editor window. In the General section, assign a new OLE DB Connection Manager pointing to the AdventureWorks database on the local SQL Server 2005 instance (with appropriate authentication settings). With SQLSourceType set to "Direct input", copy the following T-SQL statement as the SQLStatement property value:

USE AdventureWorks;
IF EXISTS
   (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
       WHERE TABLE_SCHEMA = 'HumanResources' 
   AND TABLE_NAME = 'myTeam')
    DROP TABLE HumanResources.myTeam
GO
CREATE TABLE HumanResources.myTeam 
(EmployeeID smallint NOT NULL,
Name nvarchar(50) NOT NULL,
Title nvarchar(50) NULL,
Background nvarchar(50) NOT NULL DEFAULT ''
);
GO

The statement creates the HumanResources.myTeam table (after first checking whether one already exists and, if so, dropping it). We will populate it with sample data consisting of the following records (save them to an arbitrarily named file on the computer hosting your SQL Server 2005 installation):

77,'Mia Doppleganger','Administrative Assistant','Microsoft Office'
49,'Hirum Mollicat','I.T. Specialist','Report Writing and Data Mining'

Once the file is saved, create a Flat File Connection Manager that points to it. Accept its default values with the exception of the Text qualifier value (in the General section of the Connection Manager Editor window), which you should change to a single quote (') to reflect the way data strings are marked. Switch to the Columns section to verify that the values are displayed properly (i.e. without the single quotes), then close the Editor window.

Next, drag the Bulk Insert Task icon from the Toolbox onto the Designer interface directly underneath the Execute SQL task and connect them with a precedence constraint (such that execution of the former depends on the successful completion of the latter). Using the Edit option in the context-sensitive menu of the Bulk Insert Task, display its Editor window. In its Connections section, select the previously defined OLE DB connection pointing to the AdventureWorks database on the local SQL Server installation, and specify its [HumanResources].[myTeam] table as the DestinationTable (note that you can also designate views as import targets). Properties grouped in the Format subsection specify the way the source data is delimited. This information can be provided directly (by assigning values to the RowDelimiter and ColumnDelimiter properties, which by default are set to {CR}{LF} and Tab, respectively) or through a format file (which we will cover in our next article). To accommodate our sample data, change the value of ColumnDelimiter from Tab to Comma {,}. Using the File property under the Source Connection subsection, choose the previously defined Flat File Connection Manager pointing to the local source data file.
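Under the hood, the Bulk Insert task issues a BULK INSERT statement, and the RowDelimiter and ColumnDelimiter properties correspond to its ROWTERMINATOR and FIELDTERMINATOR arguments. A rough, hedged equivalent of the configuration above (the source file path is a hypothetical placeholder) would be:

BULK INSERT AdventureWorks.HumanResources.myTeam
FROM 'C:\Data\myTeam.txt'
WITH (FIELDTERMINATOR = ',',   -- ColumnDelimiter: Comma {,}
      ROWTERMINATOR = '\n');   -- RowDelimiter: {CR}{LF} ('\n' covers CR/LF-terminated rows)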

Switch to the Options section in the Bulk Insert Task Editor interface. The choices available here can have a significant impact on the way the task is processed, so let's review them in more detail (a T-SQL sketch mapping them to the corresponding BULK INSERT arguments follows the list):

  • CodePage - specifies the code page of character data in the source file. The default of RAW speeds up processing by skipping translation from one code page to another (you need to determine whether this is applicable depending on the content of your source data and the code page of the target database).
  • DataFileType - can take on one of four possible values - char (default indicating standard character format), native (offering superior performance compared to the char option but applicable to native data types), widechar (similar to char, but accommodating Unicode characters), and widenative (performing better than the widechar option, allowing for Unicode characters in char, varchar, and text columns, but expecting native data types in all others).
  • BatchSize - assigns the number of rows to be processed as a single transaction (the default value of zero loads the entire data file as part of the same transaction).
  • LastRow - specifies the last row from the data file to be included.
  • FirstRow - specifies the first row from the data file to be included.
  • Options - affects the speed of loading by dictating whether to perform such actions as checking for constraints on a target table and its columns (this might have significant impact if your target table contains a clustered index), keeping nulls in columns for which data is not included in the source file, enabling identity insert, applying table lock (invoking TABLOCK hint), and firing triggers (resulting from inserting rows into target table).
  • SortedData - indicates that the source data is already sorted and specifies the corresponding ORDER BY clause (based on a column in the target table). Unless your data file is actually sorted that way, keep it at its default FALSE value.
  • MaxErrors - sets the maximum number of errors that can take place before the task execution is aborted.
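For reference, here is a hedged sketch of how these settings map onto the arguments of the underlying BULK INSERT statement; the file path, batch size, error limit, and ORDER column are illustrative assumptions rather than values required by this example:

BULK INSERT AdventureWorks.HumanResources.myTeam
FROM 'C:\Data\myTeam.txt'
WITH (
   CODEPAGE = 'RAW',          -- CodePage: skip code page translation
   DATAFILETYPE = 'char',     -- DataFileType: standard character format
   FIRSTROW = 1,              -- FirstRow
   BATCHSIZE = 10000,         -- BatchSize: rows per transaction
   MAXERRORS = 10,            -- MaxErrors: abort after this many errors
   CHECK_CONSTRAINTS,         -- Options: check constraints on the target table
   KEEPNULLS,                 -- Options: keep nulls for missing source values
   TABLOCK,                   -- Options: apply a table lock
   FIRE_TRIGGERS,             -- Options: fire insert triggers on the target
   ORDER (EmployeeID ASC)     -- SortedData: only meaningful if the file really is sorted this way
);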

With our simplified source file, you can leave all the defaults in place (you would need to adjust them for a more realistic data set, according to its size, file type, code page, etc.). Close the task Editor window and launch the package. Once the execution completes, the target table should have the following content:

77 'Mia Doppleganger' 'Administrative Assistant' 'Microsoft Office'
49 'Hirum Mollicat' 'I.T. Specialist' 'Report Writing and Data Mining'

Note that the table contains the same records as our source file, including the single quotes used to qualify text fields (despite the fact that we designated the single quote as the Text qualifier in our Flat File Connection Manager; the Bulk Insert task uses that Connection Manager only to locate the source file, so its text qualifier setting is not applied). While you can cleanse the data prior to or after the import, a more efficient option is to address this issue by employing the earlier mentioned format files, which provide additional processing capabilities. We will present this approach in our next article.
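If you prefer to cleanse the data after the import rather than wait for the format file approach, a minimal sketch, assuming the stray single quotes should simply be removed from the character columns created earlier, might look like this:

USE AdventureWorks;
GO
UPDATE HumanResources.myTeam   -- strip the single quotes carried over from the source file
SET Name = REPLACE(Name, '''', ''),
    Title = REPLACE(Title, '''', ''),
    Background = REPLACE(Background, '''', '');
GO
SELECT EmployeeID, Name, Title, Background   -- verify the cleaned-up content
FROM HumanResources.myTeam;
GO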

» See All Articles by Columnist Marcin Policht


