Extraction, transformation, and loading's primary objective is to facilitate moving data between its source and destination while subjecting it to arbitrary modifications, and has been an integral part of the Microsoft SQL Server product line for a number of years.
In SQL Server 7.0 and 2000, such capabilities have been provided by Data Transformation Services (DTS). With the advent of SQL Server 2005, Microsoft decided that a wide range of ETL related innovations implemented in the new version justified its re-branding, resulting in the introduction of SQL Server Integration Services (SSIS). Its two successors incorporated, respectively, into SQL Server 2008 and SQL Server 2008 R2 did not bring revolutionary changes, although they included a number of interesting functional, scalability, and performance improvements. With the upcoming release of SQL Server 2012, Microsoft is planning to offer some more significant advancements in the areas of usability, productivity, and manageability. The purpose of this article is to present their high-level overview.
Note that our presentation is based on the current (as of December 2011) SQL Server 2012 Release Candidate 0, downloadable from the Microsoft Web site for both x86 and x64 platforms (as well as in the form of an .iso file containing both versions).
The installation process in either case is fairly straightforward. Assuming that all prerequisites are satisfied, on the Setup Role page of the SQL Server 2012 RC0 Setup wizard, you have the option to select either All Features with Defaults (which includes SQL Server Database Engine Services, Analysis Services, and Reporting Services with the most commonly used settings) or SQL Server Feature Installation (which allows you to designate the individual features you want to have available on the target computer).
Interestingly, you will not find Business Intelligence Development Studio in the Start->All Programs menu after the setup wizard successfully completes. Instead, Microsoft Visual Studio 2010 takes over as the primary Business Intelligence development tool.
After its initial launch, you will be prompted to choose default environment settings, with four available options: Business Intelligence Settings, General Development Settings, Visual Basic Development Settings, and Visual C# Development Settings. The first of them customizes the Integrated Development Environment interface in the manner geared towards working with SQL Server Business Intelligence products.
In addition to this fairly obvious change, as well as a number of purely cosmetic modifications, such as rounded corners of visual components (Control Flow tasks, containers, or Data Flow sources, destinations, and transforms) as well as new icons representing their intended purpose, you will also discover a wide range of new and improved features covering the following functional areas:
- Usability and productivity
- revamped (using Windows Presentation Foundation technology) Designer interface, including such features as Zoom Control (with auto-sizing capabilities) or auto-formatted annotations
- new Toolbox window, with items that can be arranged in a custom manner (and added to the Favorites node)
- extra buttons providing instant access to Toolbox and Variables windows,
- visual indicators associated with connection managers and variables (in the form of fx symbol in the lower right corner of their icons) of an expression being employed in their properties
- visual indicators of the offline status of connection managers (in the form of x subscript in the lower left corner of their icons). Such status can be manually set through the context-sensitive menu of individual connection managers.
- Undo and Redo editing actions
- enhanced Copy and Paste capabilities, which include dependencies (such as, connection managers or variables) when carried out across package boundaries
- simplified method for modifying scopes of variables in the form of Select New Scope window
- project auto-save and recovery
- ability to group Data Flow components for the purpose of hiding them from the display (to improve readability), without affecting their processing characteristics
- simplified way of creating grid Data Viewers directly from the context sensitive menu of the data flow
- easily accessible, context sensitive options providing assistance with SSIS development (such as links to code samples posted on CodePlex web site)
- introduction of Source and Destination Assistants, which provide guidance with creating new data sources and destinations along with the corresponding connection managers
- support for Visual Studio Tools for Applications 3.0 (it is a version upgrade comparing with SQL Server 2008 R2) and .NET 4.0 in the Script Task and Script Component
- shared Connection Managers created on the project level and accessible by all packages within that project (this provides additional benefits when utilizing cached lookups, since cached content can be leveraged across multiple packages)
- improved design-time metadata mapping algorithm, which no longer relies on numeric lineage IDs (although they are still used during runtime). Their responsibilities have been relegated to textual, relatively easy to interpret refId strings.
- increased flexibility in the manner Data Flow can be designed (referred to as Flexible Order of Authoring) since it becomes possible to edit components in an offline or error state (which, in turn, takes advantage of their ability to retain metadata cached from their most recent working configuration). This is one of the consequences of eliminating lineage IDs dependency during design time.
- new Control Flow tasks (such as Expression Task, which provides the ability to evaluate an expression at runtime)
- new Data Flow components (such as Data Quality Services (DQS) Cleansing transform)
- Deployment, configuration, and management
- introduction of parameters, which constitute replacement for package configurations used in earlier versions of SSIS. Implemented as read-only variables (and available via the Parameters tab in the Designer window) and scoped on the package or project (i.e. accessible by all packages within that project) level, they operate in a manner equivalent to input parameters of programming language functions and, just like them, can be defined as optional or required. (In the case of the former, the value assigned during design time can be used at runtime, while the latter has to be specified explicitly when the package is invoked). They can be supplied when launching packages via an Execute Package task or a SQL Server Agent job (as the SQL Server Integration Services Package step). A new context sensitive menu option (labeled Parameterize...) of tasks, variables, and connection managers considerably simplifies their configuration.
- ability to combine multi-package projects into a single unit (taking the form of a file with extension .ispac). This eliminates the possibility of breaking dependencies between parent and child packages when deploying them into different environments.
- possibility to designate an entry-point package (within a multi-package project) intended to be directly invoked (which can be subsequently easily identified prior to project execution),
- new recommended project deployment model - similar to earlier versions of SSIS, packages can be deployed either to the file system (matching the traditional approach) or to the SSIS Server. The latter, implemented as a SQL Server based construct (a user database with associated CLR stored procedures, referred to within the SQL Server Management Studio as a catalog and named by default SSISDB) now offers a number of advantages over its file-based counterpart. Some of the most relevant ones include the ability to centralize management of parameters or connection managers (through Configure window of the project entry displayed in the SQL Server Management Studio) or the option to assign arbitrary property values at execution, overriding their values provided during design time (in situations where they are not exposed via package or project level parameters). In addition, projects deployed to the SSIS Server (along with their parameters) can be automatically encrypted (the encryption key is protected with a password you specify when creating the catalog, instead of relying on error-prone, package-based protection levels) and access to them can be individually controlled.
- use of environments - applicable in scenarios where a package deployed to SSIS Server needs to be executed with different groups of parameter values. In such cases, you can define collections of variables (effectively forming such environments), which can be used to set package parameters. Afterwards, whenever the package is executed, you have an option of choosing one of these environments from the Run Package window within SQL Server Management Studio.
- versioning support (including the possibility of a rollback), based on SSIS Server keeping track of versions of projects deployed to it and allowing you to choose the one you want to invoke.
- support for PowerShell (in addition to SQL Server Management Object APIs) when storing, configuring, and running packages in SSIS Server,
- considerably improved reporting and troubleshooting - following package execution, you have an option of opening a dashboard window, which contains drilldown reports providing execution status, as well as performance and logging data (the logging level can be set to None, Basic, Performance, or Verbose depending on your requirements directly from the Run Package window). You also have options to create data taps (collecting all records flowing through a particular path of a package that can be used for analyzing data quality-related issues) and memory dumps (capturing content of memory at the point of a runtime error).
As you can see, the latest version of the SQL Server product line offers a wide range of new and improved features in the areas of usability, productivity, and manageability. In upcoming articles, we will be exploring them in more detail.
See all articles by Marcin Policht