New and Improved Functionality in SSIS 2008


While it does not bring revolutionary changes, the release of SQL Server 2008 delivers functionality, scalability, and performance improvements to SQL Server Integration Services (SSIS). Here is a comprehensive listing of new and enhanced features of SSIS with a short description of each.

SQL Server Integration Services (SSIS)
is the primary Extraction, Transformation, and Loading mechanism incorporated
into the Microsoft database management platform. While its origins go back as
far as SQL Server 7, its first implementation named Data Transformation
Services (DTS), offered only rudimentary ETL capabilities. With the release of
SQL Server 2005, Microsoft delivered its considerably more powerful successor,
with a range of new and enhanced features broad enough to justify rebranding
the product to its current name. The version included in SQL Server 2008 does
not bring equally revolutionary changes, but its functional, scalability, and
performance improvements warrant at the very least this short overview. Here is
their comprehensive listing with a short description of each. (You can find
their more detailed coverage by referring to articles dedicated to SQL Server
2008 Integration Services-related topics published on this Web site):

  • Connections Project Wizard – a new
    feature, assisting with an initial setup of a package (as well as Data Flow components)
    when creating new Integration Services projects in the Business Intelligence
    Development Studio.
    It is triggered automatically by selecting the File->New->Project… menu
    option with Integration
    Services Connections Project as its template. It guides you through
    selecting and configuring data providers and their corresponding
    connection managers, as well as designating each as either a source or a
    destination (which are subsequently incorporated into an autocreated Data Flow Task).
  • Import and Export Wizard – an
    enhanced feature, whose superiority comes from an innovative way data type
    mappings and conversions are handled. Unlike in its previous version
    (included in SQL Server 2005 Integration Services), data type conversions
    are no longer implicit, but instead all changes are clearly identified
    (via the Review
    Data Type Mapping
    page and its Column
    Conversion Details
    popup window), giving you an option to reject them (preventing the
    possibility of unintentional data loss). In addition, for each pair of corresponding
    source and destination columns, you have the ability to specify the action
    to be carried out (Fail or Ignore) in
    response to data truncation and error condition. Data type mappings and
    conversion behavior are determined based on sets of rules stored in
    XML-formatted configuration files (located respectively in Program FilesMicrosoft
    SQL Server100DTSMappingFiles and Program FilesMicrosoft SQL
    Server100DTSbinn
    folders). The process is further simplified by visual clues in the form of
    icons associated with each pair of source and destination columns:
    • green checkmark conveys successful
      outcome (if the conversion is guaranteed not to cause a data loss or is
      not needed altogether, which you can determine by viewing the state of
      the Convert checkbox)
    • yellow exclamation mark means that
      a conversion does take place and its processing should be reviewed (in
      order to establish whether realistic possibility of data loss exists).
    • red circle with a cross is a sign
      that existing data cannot be converted (which effectively will prevent
      you from executing the resulting package until this issue is resolved).
  • Package Upgrade Wizard – a new
    feature, accessible from the Project->Upgrade All Packages menu item
    in the Business

    Intelligence Development Studio, guides you through bulk upgrades
    to packages created using the SQL Server 2005 version of Integration
    Services (regardless of their storage type). While running the wizard is
    relatively straightforward (it involves primarily specifying the source
    location along with a few optional configuration steps, such as updating
    connection strings to account for new provider names, validating upgraded
    packages, or creating new package IDs), you might want to consider using Upgrade Advisor from the
    SQL Server 2008 distribution media in order to identify any potential
    issues that might surface during such upgrades. (Note that the wizard
    offers you an option to back up original packages as part of the upgrade
    process).

  • Control Flow Data Profiling Task and Data Profile Viewer – a new
    task provides the ability to capture data processed by the SSIS engine
    based on predefined sets of criteria geared towards identifying patterns
    in distribution of its values as well as any inconsistencies that deviate
    from such patterns (which frequently are the primary reasons for package
    runtime errors). More specifically, they allow you, for arbitrarily
    selected columns, to identify such characteristics as their length or
    value distribution, maximums and minimums, number of NULLs, or level
    of conformance to regular expression-based patterns), as well as
    inter-column dependencies (including their suitability to function as
    primary or foreign keys). The criteria sets are exposed in the Data Profiling Task
    Editor
    interface (as Profile
    Type
    in the Profile
    Requests
    section). Once a capture is completed (resulting in an .XML-formatted
    file), its content can be analyzed with the Data Profile Viewer applet
    (implemented as the DataProfiler.exe executable
    residing in the Program
    FilesMicrosoftSQL Server100DTSBinn folder and accessible via Data Profile Viewer shortcut
    in the Start->All
    Programs->Microsoft SQL Server 2008->Integration Services menu).
  • Data Flow Lookup Transformation, Cache Transfromation, and Cache Connection
    Manager

    – a mix of new and improved features that considerably enhance the
    peformance benefits of caching, by extending its scope and functionality.
    While caching of lookup operations was possible in the previous version of
    SQL Server Integration Services as part of the Lookup Transformation feature
    set, it was limited to its internal operations. Now, lookups can take
    place within a separate data flow (or even package), typically by
    utilizing a combination of a Cache Transformation and Cache Connection
    Manager,
    persist in memory (by default) or a designated file system-based cache
    (which takes the form of a .CAW file), and be referenced by a Lookup Transformation operating
    in the full cache mode. Additional improvement comes from negative
    caching, which includes rows for which there are no matches in the
    reference dataset. It is also worth mentioning that the Lookup Transformation contains
    an extra output (Lookup
    No Match Output),
    which isolates failed lookups from errors (they were combined with errors
    in its previous version of SSIS).
  • Script Task and Script Component
    scripting environment is no longer based on the Visual Studio for
    Applications (VSA), but instead it utilizes the Visual Studio Tools for
    Applications (VSTA) engine. Effectively, both Data Control’s Script Task and Data Flow’s Script Component (through
    which access to the VSTA is provided) support both C# and VB.NET
    programming languages (in the SQL Server 2005 implementation of
    Integration Services, only the latter was available). In addition, as the
    result of this change, the process of referencing custom .NET assemblies
    has been simplified (as described in the MSDN Referencing
    Other Assemblies in Scripting Solutions SQL Server 2008 Books Online article)
    compared with a fairly cumbersome workaround required to accomplish the
    same goal in the previous version of SSIS (as described in the
    equivalent article
    of SQL Server 2005 Books Online). Furthermore, it
    is now possible to take advantage of objects and methods implemented via
    Web Services (this functionality was not available before).
  • ADO.NET Source and ADO.NET Destination – a pair
    of new Data
    Flow
    components facilitate access to any data store supported by .NET Framework
    provider (including ODBC and OLE DB). They are most commonly used when
    working with Script
    Tasks
    or when an equivalent OLE DB provider-based access is not possible. ADO.NET Source
    constitutes a replacement of the DataReader Source in SQL Server 2005
    Integration Services and, similarly, ADO.NET Destination takes on
    the role of DataReader
    Destination
    (even though the latter remains available in the current version).
  • Data Flow engine and parallelism in the processing
    of execution trees
    – execution trees are no longer are limited to
    individual threads, but instead, each (depending on the availability of
    processing resources) can be subdivided into smaller execution units
    (called subpaths), which
    effectively extends the scope of parallelism and contributes to shorter
    execution time (on multiprocessor systems).
  • DTExec.exe and package configurations – due
    to changes in the way package configuration is applied (compared with SQL
    Server 2005 Integration Services), it is now possible to set some of its
    settings (such as connection strings) at runtime from the command line
    (when invoking the execution via DTExec.exe utility). For more information
    about this functionality, refer to the Package
    Configuration
    article of SQL Server 2008 Books Online.
  • DT_DBTIME2, DT_DBTIMESTAMP2, and DT_DBTIMESTAMPOFFSET data types
    – facilitate data type mapping to equivalent T-SQL date/time data types
    introduced in SQL Server 2008. Their primary purpose is to provide support
    for more accurate time measurements (such as fractional seconds and the ability
    to account for time zone offset).

I hope that this short overview will
raise your interest in the SQL Server 2008 Integration Services sufficiently to
explore each of the functions in more detail.

»


See All Articles by Columnist

Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles