Loading data
into SQL Server databases is one of those tasks that DBA’s perform routinely.
When you load large amounts of data into a database, it is important to do that
efficiently. Not all load processes are created equal. It is important to
know how each method works so you can choose the most appropriate load routine
for your data load situation.
When loading
large amounts of data into SQL Server databases you need to consider how logging
information affects performance. There is minimal and full logged operations.
This white paper goes into how to minimize logging of your bulk insert process,
and discusses the myths about transactions and minimally logged operations. In
addition, it also talks about how a trace flag, which was introduced in SQL
Server 2008, can be used to further control logging of indexed tables.
This paper discusses the
following different approaches you can use to bulk load data:
·
Integration Services Data Destinations
·
BCP
·
BULK INSERT
·
INSERT … SELECT
·
SELECT INTO
There is an
explanation of each one of these load operations, and how they compare against
each other. This provides you with a way to determine which bulk load method
works good for your situation. There is also discussion on how you can improve
loading data into HEAP storage and the how loading can be handled on tables with
clustered indexes.
This paper
clarifies the difference between the BATCH_SIZE and ROWS_PER_BATCH options. The
BATCH_SIZE you select can greatly affect how long your load operation might
take. It is best to test a number of situations to determine the optimize
BATCH_SIZE setting for your situation.
To help you
determine what kind of load operation should be performed a number of different
situations are analyzed by this paper. These different situations will provide
you with guidance in determining the best approach for loading data. To further
help with the decision making process, a convenient decision tree guides you
through yes and no answers to can quickly help you determine the most
appropriate method to use for a given situation.
A deep dive
into the BULK LOAD process is examined. This section of the paper looks into
parallelism and partitioning options. Knowing how to use parallelism and
partitioning can greatly optimize your processes. Also discussed are waits
statistics and performance counters. This discussion helps to identify how you
should monitor your load processes to identify potential performance
bottlenecks. By reviewing the different wait types and understanding, the
different performance counters will provide you with the information you need to
tweak your load processing to optimize the load performance. Additionally, how
you configure your I/O subsystem and the file placement of your database and raw
data files can affect performance. This paper talked about how to optimize your
I/O subsystem.
There are many
things to think about when you bulk load large amounts of data. By looking at
different aspects of your bulk load processes, and ensuring your environment is
configured correctly, and the right loading method is used can make or break
your large data load operations. You need to make sure you understand all the
different options and/or configuration settings you can use, especially if you
are loading millions or billions of records at a time.