The primary purpose of SQL Server 2008 Integration Services is to deliver extraction, transformation, and loading (ETL) functionality that supports core features of the SQL Server database management system. However, there are numerous scenarios where its role involves interaction with components that provide business intelligence-related features incorporated into the product. In this article, we will discuss one of these scenarios, which leverages a built-in SSIS Control Flow task to facilitate Analysis Services processing.
Let’s start by briefly reviewing the most fundamental concepts of the Analysis Services, since understanding them is essential when exploring characteristics of corresponding SSIS tasks. In the most basic terms, this technology constitutes the cornerstone of business intelligence by focusing on deriving meaningful information from relational data hosted by a SQL Server database engine. This is accomplished by employing online analytical processing (OLAP) and data mining methods, which rely on creating and manipulating multidimensional (rather than traditional, two-dimensional) structures formed by combining data from a number of related sources.
These structures are referred to as cubes and consist of so-called measures (the term that refers to, most commonly, numeric values, which quantify events that need to be analyzed) and dimensions (providing context for these events and corresponding to categories according to which data will be aggregated). Dimensions are described by their attributes (mirroring the relationship between normalized tables and their columns). To follow the geometrical analogy, if we were to transpose database tables into a cube (and assume, for the sake of simplicity, that we are dealing with three dimensions only), measurements would be equivalent to values within its individual cells and dimensions would be represented by its length, width and height (if you have problems visualizing this, refer to MSDN Library).
An example of such an arrangement, which we will be using in our demonstration, is provided by the AdventureWorksDW2008R2 sample database, where sales amounts or order quantities represent measures, while customers, geographical locations or dates constitute dimensions. In order to optimize analysis (by the client) and data processing (on the SQL Server side), multiple, related measures (as well as their aggregates, arranged based on arbitrarily chosen dimensions) are combined into measure groups, while attributes of dimensions form hierarchies.
Partitions function as storage units for measure groups, with at least one partition allocated to each group. By creating multiple partitions, with all of them referencing the same measure group, you can not only place them on separate disks to facilitate concurrent physical access, but also assign a distinct aggregation to each to maximize query performance, since relevant data can be calculated in advance rather than at runtime.
The final two terms that require a brief introduction are mining structure and mining model. The first of them defines data domain (in essence, identifying its schema), while the latter focuses on its processing, in particular, on algorithms that can be applied to data domain in order to properly analyze it.
This fairly complex nature of Analysis Services is reflected by their composition and management methodology. More specifically, the functional objectives are captured in the form of a project, created and modified using Business Intelligence Development Studio, while the underlying data is hosted by the database engine, instantiated from the project and administered with SQL Server Management Studio. This dual nature has a direct impact on the operational aspects of business intelligence lifecycle. In the context of our presentation, one such aspect we are interested in is Analysis Services processing.
The term processing in reference to Analysis Services technology denotes activities that involve synchronization of its constructs, such as cubes (with its measure groups and partitions), dimensions, or mining models (including their mining structures) with underlying data stores. This activity – similar to standard database maintenance procedures – is critical to optimal data mining and OLAP operations. While the strategy regarding choices of specific objects to be processed or the manner in which such processing should be conducted is beyond the scope of this article, what’s important to us is the knowledge that these actions, as well as a number of others, can be controlled via SSIS Analysis Services Processing task.
In order to further explore characteristics of the SSIS task, let’s step through an example of configuring it against one of data warehousing samples available from the CodePlex web site. Assuming that Analysis Services is already installed on the computer hosting your SQL Server 2008 R2 database engine, download and launch AdventureWorks2008R2_SR1.exe self-extracting executable. This will trigger SQL Server 2008 R2 SR1 Database Installer, presenting you with the list of sample databases to install, inlcuding AdventureWorks Data Warehouse 2008R2 and AdventureWorks OLTP. After the selected ones are copied to a folder you designate (defaulting to Program FilesMicrosoft SQL Server100ToolsSamples), attach AdventureWorksDW2008R2 to the target instance, which can be easily accomplished by invoking Attach… option from the context sensitive menu of Database node in Object Explorer in SQL Server Management Studio. At this point, you are ready to deploy the corresponding Analysis Services solution.
To accomplish this, launch SQL Server Business Intelligence Development Studio and select Open->Project/Solution… from the File menu. In the resulting dialog box, point to the Adventure Works.sln file located under Program FilesMicrosoft SQL Server100ToolsSamplesAdventureWorks 2008R2 Analysis Services Project directory in either Standard or Enterprise subdirectory, depending on the edition of SQL Server instance you are running. After the solution is loaded, navigate to Data Sources folder in the Solution Explorer window and double click on the Adventure Works.ds node. This will display Data Source Designer interface, which gives you ability to adjust connection string by clicking on Edit… command button and assigning the target server name (use Test Connection command button to verify your settings). Confirm your choices and once you are back to the Designer main window, set the target server property on the Property Page in the solution’s Properties window. Finally, right-click on the top level node in the Solution Explorer window and select Deploy Solution from the context sensitive menu.
With Analysis Services project in place, now it’s time to create and configure SSIS task. In the Business Intelligence Development Studio, add a new project based on the Integration Services template to the existing solution (containing the Adventure Works DW 2008 project). Drag Analysis Services Processing task from Toolbox and drop it on the Control Flow tab of the Designer interface. Invoke its Editor by selecting Edit… option from its context-sensitive menu. Switch to the Processing Settings section. Define Analysis Services connection manager by clicking on the New… command button. Note that you have two choices. The first one targets the computer running Analysis Services, while the second relies on a connection to an Analysis Services project in the same solution. In our example, you can pick either one, assuming that you are using the same SQL Server instance to which you already deployed this solution.
Connection manager gives you access to Analysis Services objects defined in the project. By clicking on the Add… command button, you can designate those you intend to process along with the processing options for each. The former includes all entities defined above (database, cubes, partitions, measure groups, dimensions, mining models, and mining structures). The choice of the latter depends on the object type and includes a combination of Process Full, Process Default, Unprocess, Process Update, Process Data, Process Index, Process Structure, Process Clear Structure, Process Incremental, and Process Script Cache. For more information on this subject, refer to Processing Options and Settings article in SQL Server 2008 R2 Books Online.
For each selected object, you can determine extent of processing activity, which ultimately depends on the object’s current state, by clicking on Impact Analysis… command button. You also have ability to alter the way entire task will be handled by clicking on Change Settings… command button and modifying entries on the Processing options and Dimension key errors tabs. The first of them gives you a choice between processing all objects in parallel (as a single transaction, with arbitrarily assigned degree of parallelism) or sequentially (either in one or separate transactions), as well as allows you to decide how to manage writeback table (facilitating changes to a measure group in a cube) and to force processing of all dependent objects. The second contains an option for switching between default and custom error configuration, where the latter offers such customizations as one of two Key error actions (Convert to unknown or Discard record), an arbitrary Processing error limit, actions to be triggered by specific error conditions (such as Key not found, Duplicate key, Null key converted to unknown, and Null key not allowed), as well as designating an Error log path.
While configuration of the task requires dedicating some time to understanding of Analysis Services Processing-related settings, it yields significant benefits in regard to simplifying maintenance procedures and ensuring that appropriate objects are properly updated. This is particularly useful if you transform and load warehouse database by leveraging Integration Services technology, where data import needs to be followed by a corresponding processing action.