SQL Server 2008 and 2008 R2 Integration Services – Partition Processing Data Flow Destination

The primary purpose of Integration Services technology is to deliver extraction, transformation, and loading (ETL) functionality that supports core features of the SQL Server database management system. However, there are numerous scenarios where its role involves interaction with components that provide Business Intelligence-related features. In this article, we will discuss one such scenario, which leverages the built-in SSIS Partition Processing Data Flow Destination to facilitate Analysis Services processing.

Let’s start by briefly reviewing the Analysis Services concepts that play an essential role in our presentation. Arguably the most fundamental component of the multidimensional data model is a cube, structured in a manner optimized for its Business Intelligence objectives. Cubes consist of measures (most commonly numeric values, which quantify the events that need to be analyzed) and dimensions (providing context for these events and corresponding to the categories according to which data is aggregated). Dimensions are described by their attributes.

An example illustrating this paradigm (which we will employ in our demonstration) is provided by the AdventureWorksDW2008R2 sample data warehouse database, where sales amounts or order quantities represent measures, while customers, geographical locations, or dates constitute dimensions.

In order to optimize analysis (on the client side) and data processing (on the SQL Server side), multiple related measures (including their aggregates, arranged based on appropriately chosen dimensions) are combined into measure groups, while attributes of dimensions form hierarchies. Partitions function as storage units for measure groups, with at least one partition allocated to each group. By creating multiple partitions (all of them referencing the same measure group), you can not only place them on separate disks (facilitating concurrent physical access) but also assign a distinct aggregation design to each to maximize query performance (since relevant data can be calculated in advance rather than at runtime).

The term processing in the context of Analysis Services technology denotes activities that synchronize its constructs, such as cubes (with their measure groups and partitions), dimensions, or mining models, with the underlying data stores. This activity is critical to ensuring the accuracy and efficiency of data mining and OLAP operations. With the multidimensional OLAP (referred to simply as MOLAP) storage mode, the relevant relational data is loaded into the corresponding partition within the target cube. The SSIS Partition Processing Data Flow Destination helps automate this functionality.

In order to demonstrate its implementation, we will leverage predefined Analysis Services structures referencing data representing business operations of AdventureWorks – a fictitious company manufacturing and selling bicycles (with sample databases downloadable from the CodePlex web site, where you will also find a detailed procedure describing their deployment). The resulting Analysis Services installation includes the Internet Customers measure group with four predefined partitions (Customers_2005 through Customers_2008). In our example, we will emulate processing of the Customers_2008 partition.

To examine its properties, select the View Designer entry from the context-sensitive menu of the Adventure Works.cube node in the Solution Explorer window of Business Intelligence Development Studio. Once the Designer window is displayed, switch to its Partitions tab, expand the Internet Customers section, and click on the ellipsis button in the Source column of the Customers_2008 row. There you will find the T-SQL query that extracts the relevant data from the underlying table in the AdventureWorksDW2008R2 database (via the Adventure Works DW data source), which takes the following format (and will serve as the basis for the OLE DB Source component in the SSIS package we are about to create):

SELECT [dbo].[FactInternetSales].[ProductKey], [dbo].[FactInternetSales].[OrderDateKey],
       [dbo].[FactInternetSales].[DueDateKey], [dbo].[FactInternetSales].[ShipDateKey],
       [dbo].[FactInternetSales].[CustomerKey], [dbo].[FactInternetSales].[PromotionKey],
       [dbo].[FactInternetSales].[CurrencyKey], [dbo].[FactInternetSales].[SalesTerritoryKey],
       [dbo].[FactInternetSales].[SalesOrderNumber], [dbo].[FactInternetSales].[SalesOrderLineNumber],
       [dbo].[FactInternetSales].[RevisionNumber], [dbo].[FactInternetSales].[OrderQuantity],
       [dbo].[FactInternetSales].[UnitPrice], [dbo].[FactInternetSales].[ExtendedAmount],
       [dbo].[FactInternetSales].[UnitPriceDiscountPct], [dbo].[FactInternetSales].[DiscountAmount],
       [dbo].[FactInternetSales].[ProductStandardCost], [dbo].[FactInternetSales].[TotalProductCost],
       [dbo].[FactInternetSales].[SalesAmount], [dbo].[FactInternetSales].[TaxAmt],
       [dbo].[FactInternetSales].[Freight], [dbo].[FactInternetSales].[CarrierTrackingNumber],
       [dbo].[FactInternetSales].[CustomerPONumber]
FROM [dbo].[FactInternetSales]
WHERE OrderDateKey >= '20080101' AND OrderDateKey <= '20081231'
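
Before building the package, you might want to confirm how the source rows are distributed across the date ranges covered by the four partitions. The following quick check (a sketch only, assuming the AdventureWorksDW2008R2 database is present on your instance) groups fact rows by the year portion of OrderDateKey:

-- Row counts per calendar year (OrderDateKey is an integer in YYYYMMDD format)
SELECT LEFT(CONVERT(varchar(8), OrderDateKey), 4) AS OrderYear,
       COUNT(*) AS SalesRows
FROM [dbo].[FactInternetSales]
GROUP BY LEFT(CONVERT(varchar(8), OrderDateKey), 4)
ORDER BY OrderYear;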

With all of these prerequisites in place, we are ready to start our sample implementation (note that even though, for the sake of simplicity, we employ a conceptual shortcut by leveraging an existing – and already populated – data warehouse database, the procedure described below would remain practically unchanged if we used another relational data source).

In Business Intelligence Development Studio, create a new project based on the Integration Services template. Switch to the Data Flow tab and activate the hyperlink displayed in the center of the main window, stating ‘No Data Flow tasks have been added to this package. Click here to add a new Data Flow task’.

With the Data Flow task in place and active, drag an OLE DB Source component from the Toolbox onto the Designer interface. Display its Editor window by selecting the Edit… item from its context-sensitive menu. In the Connection Manager section, click on the New… command button and point to the instance hosting the target AdventureWorksDW2008R2 database. After you have defined the new OLE DB connection manager, ensure that the SQL command entry appears in the Data access mode list box, and copy the T-SQL SELECT statement listed above into the SQL command text window. Preview the input data to verify that your settings are correct and close the Editor window.
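
For reference, the resulting connection manager will store a connection string along the following lines (a sketch only – the Data Source value, provider, and authentication settings are assumptions that depend entirely on your environment):

Data Source=localhost;Initial Catalog=AdventureWorksDW2008R2;
Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;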

Next, drag the Partition Processing Destination from the Toolbox onto the Designer window, position it directly underneath the OLE DB Source component, and extend the Data Flow Path so the two of them become connected. Launch the Partition Processing Destination Editor using the Edit… context-sensitive menu entry. Click on the New… command button to create an Analysis Services Connection Manager for the target cube (containing our designated partition), which resides either on an Analysis Services instance or in a solution that includes the Analysis Services project (a sample connection string appears after the mapping list below). Select the Customers_2008 entry along with the appropriate Processing method (depending on your requirements, you can choose Add (incremental), Full, or Data only), and switch to the Mappings section, where you need to match input columns (derived from the OLE DB Source, pointing in our case to the dbo.FactInternetSales table in the AdventureWorksDW2008R2 data warehouse database) with their equivalent partition columns:

  • CurrencyKey maps to Source Currency.Source Currency Code
  • CustomerKey maps to Customer.Customer
  • DueDateKey maps to Delivery Date.Date
  • OrderDateKey maps to Date.Date
  • ProductKey maps to Product.Product
  • PromotionKey maps to Promotion.Promotion
  • SalesOrderLineNumber maps to Internet Sales Order Details.Internet Sales Order.Key(1)
  • SalesOrderNumber maps to Internet Sales Order Details.Internet Sales Order.Key(0)
  • SalesTerritoryKey maps to Sales Territory.Sales Territory Region
  • ShipDateKey maps to Ship Date.Date
  • <ignore> maps to Customer Count
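
The Analysis Services connection manager created a moment ago stores a connection string similar to the sketch below (again, the server name, database name, and provider version are assumptions based on a default sample deployment):

Data Source=localhost;Initial Catalog=Adventure Works DW 2008R2;
Provider=MSOLAP.4;Integrated Security=SSPI;Impersonation Level=Impersonate;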

As the mapping list shows, partition columns take the format dimension.attribute, which you can verify by referring to the appropriate entries on the Dimension Usage tab of the Adventure Works.cube Designer interface (accessible via the View Designer entry in the context-sensitive menu of the Adventure Works.cube node in the Solution Explorer window of Business Intelligence Development Studio).

The notation is slightly different in the case of the Internet Sales Order Details.Internet Sales Order dimension attribute, since this is a composite key (here again, you have the option of examining its details by referring to the Dimension Structure tab of the Internet Sales Order Details.dim Designer interface). In addition, we are missing an input column corresponding to the distinct count measure based on CustomerKey (since this column is already mapped to the Customer.Customer dimension attribute). That column represents a distinct count (a non-additive measure) of individual customers in a particular partition, calculated by taking into account the number of unique CustomerKey values.

To remedy this, we will modify our original T-SQL statement by adding a [dbo].[FactInternetSales].[CustomerKey] AS CustomerCount entry, which will allow us to reference its alias during column mapping. In addition, we will add the ORDER BY [dbo].[FactInternetSales].[CustomerKey] clause to ensure that distinct count values are properly ordered when processing the target partition. This behavior (enabled by default) is controlled by the OLAP\Process\CheckDistinctRecordSortOrder advanced property of the Analysis Services server (accessible via its Properties dialog box, which can be displayed using the context-sensitive menu of the server node in SQL Server Management Studio). Finally, we will also remove references to all non-mapped input columns. Effectively, our SQL command in the OLE DB Source Editor becomes:

SELECT [dbo].[FactInternetSales].[CurrencyKey],
       [dbo].[FactInternetSales].[CustomerKey] AS CustomerCount,
       [dbo].[FactInternetSales].[CustomerKey],
       [dbo].[FactInternetSales].[DueDateKey],
       [dbo].[FactInternetSales].[OrderDateKey],
       [dbo].[FactInternetSales].[ProductKey],
       [dbo].[FactInternetSales].[PromotionKey],
       [dbo].[FactInternetSales].[SalesOrderLineNumber],
       [dbo].[FactInternetSales].[SalesOrderNumber],
       [dbo].[FactInternetSales].[SalesTerritoryKey],
       [dbo].[FactInternetSales].[ShipDateKey]
FROM [dbo].[FactInternetSales]
WHERE OrderDateKey >= '20080101' AND OrderDateKey <= '20081231'
ORDER BY [dbo].[FactInternetSales].[CustomerKey]

We now have the ability to add the missing mapping (the CustomerCount input column to the Customer Count partition column) in the Partition Processing Destination Editor.
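
To illustrate what the Customer Count measure will contain, the following stand-alone query (a sanity check only, not part of the package) returns the number of unique customers within the partition’s date range:

-- Distinct count of customers for the Customers_2008 partition range
SELECT COUNT(DISTINCT [CustomerKey]) AS CustomerCount
FROM [dbo].[FactInternetSales]
WHERE OrderDateKey >= '20080101' AND OrderDateKey <= '20081231';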

The final (but optional) configuration step involves customizing the Advanced section, where you can set such parameters as the Key error action (ConvertToUnknown or DiscardRecord), the Processing error limit (Ignore errors or Stop on error after a specific Number of errors), the desired behavior following specific error conditions (in the case of such issues as Key not found, Duplicate key, Null key converted to unknown, or Null key not allowed), as well as assign the Error log path. This concludes our sample setup, yielding a fully functional SSIS package.
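
If you want to confirm the results of running the package without leaving SQL Server Management Studio, one option is to query the cube through a linked server. The sketch below assumes a hypothetical linked server named SSAS_AW pointing to a local Analysis Services instance hosting the sample database; adjust the server, database, cube, and measure names to match your deployment:

-- One-time setup: create a linked server targeting the Analysis Services instance
EXEC master.dbo.sp_addlinkedserver
     @server = N'SSAS_AW',
     @srvproduct = N'',
     @provider = N'MSOLAP',
     @datasrc = N'localhost',
     @catalog = N'Adventure Works DW 2008R2';

-- Retrieve the Customer Count measure for calendar year 2008 via pass-through MDX
SELECT * FROM OPENQUERY(SSAS_AW,
    'SELECT [Measures].[Customer Count] ON COLUMNS
     FROM [Adventure Works]
     WHERE ([Date].[Calendar Year].&[2008])');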
