New and Improved Functionality in SSIS 2008

March 30, 2010

While it does not bring revolutionary changes, the release of SQL Server 2008 delivers functionality, scalability, and performance improvements to SQL Server Integration Services (SSIS). Here is a comprehensive listing of new and enhanced features of SSIS with a short description of each.

SQL Server Integration Services (SSIS) is the primary Extraction, Transformation, and Loading mechanism incorporated into the Microsoft database management platform. While its origins go back as far as SQL Server 7, its first implementation named Data Transformation Services (DTS), offered only rudimentary ETL capabilities. With the release of SQL Server 2005, Microsoft delivered its considerably more powerful successor, with a range of new and enhanced features broad enough to justify rebranding the product to its current name. The version included in SQL Server 2008 does not bring equally revolutionary changes, but its functional, scalability, and performance improvements warrant at the very least this short overview. Here is their comprehensive listing with a short description of each. (You can find their more detailed coverage by referring to articles dedicated to SQL Server 2008 Integration Services-related topics published on this Web site):

  • Connections Project Wizard - a new feature, assisting with an initial setup of a package (as well as Data Flow components) when creating new Integration Services projects in the Business Intelligence Development Studio. It is triggered automatically by selecting the File->New->Project... menu option with Integration Services Connections Project as its template. It guides you through selecting and configuring data providers and their corresponding connection managers, as well as designating each as either a source or a destination (which are subsequently incorporated into an autocreated Data Flow Task).
  • Import and Export Wizard - an enhanced feature, whose superiority comes from an innovative way data type mappings and conversions are handled. Unlike in its previous version (included in SQL Server 2005 Integration Services), data type conversions are no longer implicit, but instead all changes are clearly identified (via the Review Data Type Mapping page and its Column Conversion Details popup window), giving you an option to reject them (preventing the possibility of unintentional data loss). In addition, for each pair of corresponding source and destination columns, you have the ability to specify the action to be carried out (Fail or Ignore) in response to data truncation and error condition. Data type mappings and conversion behavior are determined based on sets of rules stored in XML-formatted configuration files (located respectively in Program Files\Microsoft SQL Server\100\DTS\MappingFiles and Program Files\Microsoft SQL Server\100\DTS\binn folders). The process is further simplified by visual clues in the form of icons associated with each pair of source and destination columns:
    • green checkmark conveys successful outcome (if the conversion is guaranteed not to cause a data loss or is not needed altogether, which you can determine by viewing the state of the Convert checkbox)
    • yellow exclamation mark means that a conversion does take place and its processing should be reviewed (in order to establish whether realistic possibility of data loss exists).
    • red circle with a cross is a sign that existing data cannot be converted (which effectively will prevent you from executing the resulting package until this issue is resolved).
  • Package Upgrade Wizard - a new feature, accessible from the Project->Upgrade All Packages menu item in the Business Intelligence Development Studio, guides you through bulk upgrades to packages created using the SQL Server 2005 version of Integration Services (regardless of their storage type). While running the wizard is relatively straightforward (it involves primarily specifying the source location along with a few optional configuration steps, such as updating connection strings to account for new provider names, validating upgraded packages, or creating new package IDs), you might want to consider using Upgrade Advisor from the SQL Server 2008 distribution media in order to identify any potential issues that might surface during such upgrades. (Note that the wizard offers you an option to back up original packages as part of the upgrade process).
  • Control Flow Data Profiling Task and Data Profile Viewer - a new task provides the ability to capture data processed by the SSIS engine based on predefined sets of criteria geared towards identifying patterns in distribution of its values as well as any inconsistencies that deviate from such patterns (which frequently are the primary reasons for package runtime errors). More specifically, they allow you, for arbitrarily selected columns, to identify such characteristics as their length or value distribution, maximums and minimums, number of NULLs, or level of conformance to regular expression-based patterns), as well as inter-column dependencies (including their suitability to function as primary or foreign keys). The criteria sets are exposed in the Data Profiling Task Editor interface (as Profile Type in the Profile Requests section). Once a capture is completed (resulting in an .XML-formatted file), its content can be analyzed with the Data Profile Viewer applet (implemented as the DataProfiler.exe executable residing in the Program Files\Microsoft\SQL Server\100\DTS\Binn folder and accessible via Data Profile Viewer shortcut in the Start->All Programs->Microsoft SQL Server 2008->Integration Services menu).
  • Data Flow Lookup Transformation, Cache Transfromation, and Cache Connection Manager - a mix of new and improved features that considerably enhance the peformance benefits of caching, by extending its scope and functionality. While caching of lookup operations was possible in the previous version of SQL Server Integration Services as part of the Lookup Transformation feature set, it was limited to its internal operations. Now, lookups can take place within a separate data flow (or even package), typically by utilizing a combination of a Cache Transformation and Cache Connection Manager, persist in memory (by default) or a designated file system-based cache (which takes the form of a .CAW file), and be referenced by a Lookup Transformation operating in the full cache mode. Additional improvement comes from negative caching, which includes rows for which there are no matches in the reference dataset. It is also worth mentioning that the Lookup Transformation contains an extra output (Lookup No Match Output), which isolates failed lookups from errors (they were combined with errors in its previous version of SSIS).
  • Script Task and Script Component - scripting environment is no longer based on the Visual Studio for Applications (VSA), but instead it utilizes the Visual Studio Tools for Applications (VSTA) engine. Effectively, both Data Control's Script Task and Data Flow's Script Component (through which access to the VSTA is provided) support both C# and VB.NET programming languages (in the SQL Server 2005 implementation of Integration Services, only the latter was available). In addition, as the result of this change, the process of referencing custom .NET assemblies has been simplified (as described in the MSDN Referencing Other Assemblies in Scripting Solutions SQL Server 2008 Books Online article) compared with a fairly cumbersome workaround required to accomplish the same goal in the previous version of SSIS (as described in the equivalent article of SQL Server 2005 Books Online). Furthermore, it is now possible to take advantage of objects and methods implemented via Web Services (this functionality was not available before).
  • ADO.NET Source and ADO.NET Destination - a pair of new Data Flow components facilitate access to any data store supported by .NET Framework provider (including ODBC and OLE DB). They are most commonly used when working with Script Tasks or when an equivalent OLE DB provider-based access is not possible. ADO.NET Source constitutes a replacement of the DataReader Source in SQL Server 2005 Integration Services and, similarly, ADO.NET Destination takes on the role of DataReader Destination (even though the latter remains available in the current version).
  • Data Flow engine and parallelism in the processing of execution trees - execution trees are no longer are limited to individual threads, but instead, each (depending on the availability of processing resources) can be subdivided into smaller execution units (called subpaths), which effectively extends the scope of parallelism and contributes to shorter execution time (on multiprocessor systems).
  • DTExec.exe and package configurations - due to changes in the way package configuration is applied (compared with SQL Server 2005 Integration Services), it is now possible to set some of its settings (such as connection strings) at runtime from the command line (when invoking the execution via DTExec.exe utility). For more information about this functionality, refer to the Package Configuration article of SQL Server 2008 Books Online.
  • DT_DBTIME2, DT_DBTIMESTAMP2, and DT_DBTIMESTAMPOFFSET data types - facilitate data type mapping to equivalent T-SQL date/time data types introduced in SQL Server 2008. Their primary purpose is to provide support for more accurate time measurements (such as fractional seconds and the ability to account for time zone offset).

I hope that this short overview will raise your interest in the SQL Server 2008 Integration Services sufficiently to explore each of the functions in more detail.

» See All Articles by Columnist Marcin Policht

The Network for Technology Professionals



Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers