SQL Server 2005 Integration Services - Performance - Part 31
August 28, 2006
In the previous installment of our series covering SQL Server 2005 Integration Services, we reviewed the innovative design of this technology and its impact on performance. Within the context of the dual architecture (divided into Control and Data Flows), we presented such concepts as parallelism and memory management. In this article, we will discuss other topics related to improving efficiency of package execution. Throughout this discussion, keep in mind, however, that there are very few straightforward, hard rules in this category (and a number of dependencies for those that do exist), so if you decide to try any of the suggestions listed below, "your mileage might vary."
Following this quick disclaimer, let's take a closer look at dependency between the way packages are launched and their performance characteristics. As we explained earlier, there are two factors that play significant role here - whether the package runs within the confines of SQL Server Business Intelligence Development Studio and whether the debugging feature is turned on. While both are extremely helpful during the development stage, assisting with design and troubleshooting, as well as providing valuable data through breakpoints or data viewers, they also introduce significant overhead. You can minimize their negative impact or eliminate it altogether using any of the following methods:
Review your SSIS logging options and restrict them strictly to information you are interested in (this is managed from the Configure SSIS Logs window containing granular settings defining individual components, events, providers, and types of data to be recorded). Startup time of packages can be shortened by having SSIS Service running (ensure this by having its startup configured as automatic). This way, registration information of components available via the SQL Server Business Intelligence Development Studio can be cached and referenced during each subsequent launch.
Optimize OLE DB Source operations by using SQL commands (as Data access mode in its Editor window) rather than using Table or view option (this not only ensures that you only receive data you are not interested in, but also alters retrieval mechanism, making it more efficient). Similarly, adjust queries used by Lookup Data Flow transformation (which by default is based on SELECT * FROM statement against reference table) or SQL Query-based source adapters. The performance of lookups can be further improved by using full or partial cache mode (configurable on the Component Properties tab in the Advanced Editor window), which determines the amount of reference data set that is read before input processing starts (providing you have a sufficient amount of memory available).
Look for opportunities to substitute OLE DB Command Data Flow transformations (which processes each row passing through the pipeline) with equivalent Execute SQL Control Flow tasks to create a desired recordset. When running packages directly on the target SQL Server, replace OLE DB Destination with SQL Server Destination adapter, since the latter executes in-process, introducing significantly lower overhead. If this approach is not applicable (i.e. you access remote target SQL Server via SQL Server OLE DB driver), try increasing the efficiency of the OLE DB Destination adapter by choosing fast load as the Data access mode from the Connection Manager section of the OLE DB Destination Editor. Other settings available within the same window allow you to fine-tune performance by turning on the table lock option or adjusting the maximum insert commit size value.
When dealing with text files containing large amounts of date, time, currency, or numeric data in the most common formats (without locale specific and special currency characters, or missing and implied date parts), you can reach superior processing speed (exceeding those provided by BULK INSERT or bcp methods) by setting the value of Fast Parse property for Flat File Source and Data Conversion Data Flow transformations to True. This needs to be done for each individual output column on the Input and Output Properties tab of the Advanced Editor window of the appropriate components.
If your package contains Script Task and Script Component elements, you can decrease their startup time by assigning True to their PrecompileScriptIntoBinaryCode and PreCompile properties (respectively). This will however increase the package size and prevent it from being debugged (since the code is complied prior to run time), so you should first thoroughly verify its functionality and integrity. Note that precompilation is required when using these components on a 64-bit platform.
Take into consideration external factors that could improve response time. For example, it might be beneficial to alter the SQL Server database logging options by setting a suitable recovery model (such as Simple or Bulk-logged) or disable relevant indexes (this new feature introduced in SQL Server 2005 is a useful alternative to dropping an index, deactivating it and de-allocating its storage but preserving its metadata) prior to massive insert and modification operations (applicability of this approach would depend to large extent on the time required to rebuild the index following the change). Whenever possible, take advantage of existing indexes to speed up searches or sorts. Order your data prior to reading it into the DataReader Source adapter to eliminate the need for using fully-blocking Sort transformation. To indicate it, you need to set to True the IsSorted property of the DataReader Output node in the Input and Output Properties tab of Advanced Editor of the transformation. In addition, it is necessary to designate sorted output columns by assigning appropriate integer values to the SortKeyPosition property for each sorted output column in the OutputColumns list (within the same Advanced Editor window).
Keep in mind that SSIS operates independently from the SQL Server 2005 Database Engine - including their internal resource management mechanisms, which means that, when running on the same system, both compete for memory and processor utilization. This can be addressed to some extent by restricting the amount of memory assigned to SQL Server (with the sp_configure 'max server memory' server option), upgrade to more powerful 64-bit hardware and software (with an operating system capable of allocating to applications much more than 2 GB - or 3GB with AWE enabled - limit imposed by 32-bit environment), or resolved by installing Integration Services and Database Engine on separate computers (although in such a case, you need to keep in mind the impact of data traversing network connection between them).
If a hardware upgrade is not possible, try to optimize disk access. While you might not be able to prevent offloading memory buffers that are utilized by the pipeline (for more information on this subject, refer to our previous article) to the swap file, you can redirect them to faster or less utilized disks by assigning appropriate values to BufferTempStoragePath and BLOBTempStoragePath properties of a Data Flow task (which by default redirect the buffer content to the location pointed to by the TEMP environment variable, within the profile folder of the user who launches the package). The first one of them determines disk location intended for generic buffer data and the second one is designated specifically for storing content of columns with binary large objects (BLOB) data. Develop a strategy (e.g. by implementing configurations) that will keep these settings consistent (since they are configured on per Data Flow task level) across all your packages.