In the previous
installment of our series covering SQL Server 2005 Integration Services, we
reviewed the innovative design of this technology and its impact on
performance. Within the context of the dual architecture (divided into Control
and Data Flows), we presented such concepts as parallelism and memory
management. In this article, we will discuss other topics related to improving
efficiency of package execution. Throughout this discussion, keep in mind,
however, that there are very few straightforward, hard rules in this category
(and a number of dependencies for those that do exist), so if you decide to try
any of the suggestions listed below, "your mileage might vary."
Following this quick disclaimer, let’s take a closer look at dependency
between the way packages are launched and their performance characteristics. As
we explained earlier,
there are two factors that play significant role here – whether the package
runs within the confines of SQL Server Business Intelligence Development Studio
and whether the debugging feature is turned on. While both are extremely
helpful during the development stage, assisting with design and
troubleshooting, as well as providing valuable data through breakpoints or data
viewers, they also introduce significant overhead. You can minimize their
negative impact or eliminate it altogether using any of the following methods:
-
Choose the "Start Without Debugging" entry in the Debug
top-level menu (or Ctrl+F5 key combination) when running file-system based
packages in the SSIS Designer of the Business Intelligence Development Studio.
In order to experience noticeable difference, you should also close (or at
least minimize) the watch and output windows. Keep in mind, however, that
execution time will still be affected by SQL Server Business Intelligence
Development Studio (and that’s why the other options listed below are
preferable when running packages or attempting to estimate their performance). -
Invoke execution with the DTExec.exe command line utility (located
in Program FilesMicrosoft SQL Server90DTSBinn folder), which can be applied
to packages stored in file system, msdb database, or SSIS Package Store. -
If you prefer a GUI-based interface, take advantage of Execute
Package Utility DTSExecUI.exe (located in the Program FilesMicrosoft SQL
Server90ToolsBinnVSShellCommon7IDE folder) and functionally equivalent to
DTExec.exe. This program is also launched after selecting the "Run
Package" option from the context sensitive menu of nodes representing packages
saved into SSIS Package Store or SQL Server in the Object Explorer of SQL
Server Management Studio. -
Schedule and launch a package via "SQL Server Integration
Services Package" step of a SQL Server Agent-based job. -
Whenever applicable, dynamically create and execute a package
with the "SSIS Import and Export Wizard…" entry in the Project top
level menu of Business Intelligence Development Studio or from the Command
Prompt by calling DTSWizard.exe program located in the Program FilesMicrosoft
SQL Server90DTSBinn folder. (Immediate execution is possible only when you
initiate the wizard using the second of these methods).
Review your SSIS logging options and restrict them strictly to information
you are interested in (this is managed from the Configure SSIS Logs window
containing granular settings defining individual components, events, providers,
and types of data to be recorded). Startup time of packages can be shortened by
having SSIS Service running (ensure this by having its startup configured as
automatic). This way, registration information of components available via the
SQL Server Business Intelligence Development Studio can be cached and
referenced during each subsequent launch.
Optimize OLE DB Source operations by using SQL commands (as Data access mode
in its Editor window) rather than using Table or view option (this not only
ensures that you only receive data you are not interested in, but also alters
retrieval mechanism, making it more efficient). Similarly, adjust queries used
by Lookup Data Flow transformation (which by default is based on SELECT * FROM
statement against reference table) or SQL Query-based source adapters. The performance
of lookups can be further improved by using full or partial cache mode
(configurable on the Component Properties tab in the Advanced Editor window),
which determines the amount of reference data set that is read before input
processing starts (providing you have a sufficient amount of memory available).
Look for opportunities to substitute OLE DB Command Data Flow
transformations (which processes each row passing through the pipeline) with
equivalent Execute SQL Control Flow tasks to create a desired recordset. When
running packages directly on the target SQL Server, replace OLE DB Destination
with SQL Server Destination adapter, since the latter executes in-process,
introducing significantly lower overhead. If this approach is not applicable
(i.e. you access remote target SQL Server via SQL Server OLE DB driver), try
increasing the efficiency of the OLE DB Destination adapter by choosing fast
load as the Data access mode from the Connection Manager section of the OLE DB
Destination Editor. Other settings available within the same window allow you
to fine-tune performance by turning on the table lock option or adjusting the maximum
insert commit size value.
When dealing with text files containing large amounts of date, time,
currency, or numeric data in the most common formats (without locale specific and
special currency characters, or missing and implied date parts), you can reach
superior processing speed (exceeding those provided by BULK INSERT or bcp
methods) by setting the value of Fast Parse property for Flat File Source and
Data Conversion Data Flow transformations to True. This needs to be done for
each individual output column on the Input and Output Properties tab of the
Advanced Editor window of the appropriate components.
If your package contains Script Task and Script Component elements, you can
decrease their startup time by assigning True to their
PrecompileScriptIntoBinaryCode and PreCompile properties (respectively). This
will however increase the package size and prevent it from being debugged
(since the code is complied prior to run time), so you should first thoroughly
verify its functionality and integrity. Note that precompilation is required
when using these components on a 64-bit platform.
Take into consideration external factors that could improve response time. For
example, it might be beneficial to alter the SQL Server database logging
options by setting a suitable recovery model (such as Simple or Bulk-logged) or
disable relevant indexes (this new feature introduced in SQL Server 2005 is a
useful alternative to dropping an index, deactivating it and de-allocating its
storage but preserving its metadata) prior to massive insert and modification
operations (applicability of this approach would depend to large extent on the
time required to rebuild the index following the change). Whenever possible,
take advantage of existing indexes to speed up searches or sorts. Order your
data prior to reading it into the DataReader Source adapter to eliminate the
need for using fully-blocking Sort transformation. To indicate it, you need to
set to True the IsSorted property of the DataReader Output node in the Input
and Output Properties tab of Advanced Editor of the transformation. In
addition, it is necessary to designate sorted output columns by assigning
appropriate integer values to the SortKeyPosition property for each sorted
output column in the OutputColumns list (within the same Advanced Editor
window).
Keep in mind that SSIS operates independently from the SQL Server 2005
Database Engine – including their internal resource management mechanisms,
which means that, when running on the same system, both compete for memory and
processor utilization. This can be addressed to some extent by restricting the amount
of memory assigned to SQL Server (with the sp_configure ‘max server memory’
server option), upgrade to more powerful 64-bit hardware and software (with an
operating system capable of allocating to applications much more than 2 GB – or
3GB with AWE enabled – limit imposed by 32-bit environment), or resolved by installing
Integration Services and Database Engine on separate computers (although in
such a case, you need to keep in mind the impact of data traversing network
connection between them).
If a hardware upgrade is not possible, try to optimize disk access. While
you might not be able to prevent offloading memory buffers that are utilized by
the pipeline (for more information on this subject, refer to our previous
article) to the swap file, you can redirect them to faster or less utilized
disks by assigning appropriate values to BufferTempStoragePath and BLOBTempStoragePath
properties of a Data Flow task (which by default redirect the buffer content to
the location pointed to by the TEMP environment variable, within the profile
folder of the user who launches the package). The first one of them determines
disk location intended for generic buffer data and the second one is designated
specifically for storing content of columns with binary large objects (BLOB)
data. Develop a strategy (e.g. by implementing configurations) that will keep
these settings consistent (since they are configured on per Data Flow task
level) across all your packages.