One of the primary objectives of SQL Server 2008 R2 Analysis Services is deriving meaningful information from relational data. This is accomplished by employing online analytical processing (OLAP) and data mining methods, which rely on manipulating multidimensional structures known as cubes. Cubes consist of so-called measures (the term that refers to most commonly numeric values, which quantify events that need to be analyzed) and dimensions (providing context for these events and corresponding to categories according to which data will be aggregated). Dimensions are described by their attributes (serving the role equivalent to that of a column in a normalized table). However, such attributes rarely remain static, which introduces the need for a structured approach that facilitates their updates. This is the purpose of Slowly Changing Dimension algorithms that, in turn, are incorporated into the identically named Data Flow transformation in SQL Server 2008 R2 Integration Services, featured in this article.
Before we focus on its characteristics, we need to first review the algorithms that govern principles of its operations. To begin with, it is important to point out that most of them rely on the presence of two keys in the target dimension table. The first one, referred to as natural, designates identity of individual dimension members (entities represented by rows in the table) and remains constant regardless of changes to non-key columns, while the other, known as surrogate, ensures uniqueness of each table row (satisfying requirements imposed by rules of relational databases). Effectively, it is possible to have multiple rows with matching natural keys but different surrogate keys (although ultimately specifics of this arrangement depend on the type of update mechanism, which we will review shortly). For example, if you consider Product.dim dimension based on the AdventureWorksDW2008R2 Data Warehouse database and examine one of its underlying tables dbo.DimProduct, you will notice that it contains ProductKey and ProductAlternateKey columns, where the first one serves as the IDENTITY and the other takes the form of a seven-character-string matching ProductNumber from Production.Product table in the AdventureWorks OLTP database. While the former changes with each new table entry, the latter remains the same throughout the product's lifetime, even in case one of its properties, such as, for example, Color, Dealer Price, SafetyStockLevel or DaysToManufacture needs to be modified.
Note that the manner in which such modifications are applied impacts the outcome of subsequent analysis. For example, if our intention is to evaluate whether a change in product's safety stock level has any business significance, we would want to have a way to identify when the new one has been introduced, while still retaining convenient access to historical records. On the other hand, in the absence of such requirement, the most efficient approach would involve simply replacing the original value. Both scenarios (along with their variances with differing mechanisms of updates and data retrieval) are accommodated by several distinct Slowly Changing Dimensions types:
- Type 1 - an update is applied directly to the original row in the dimension table, replacing an existing entry in a non-key column with its new value. While this means that historical data is lost (so you have no ability to reference it going forward), the implementation is very straightforward. If we were to follow our sample scenario describing changes to [dbo].[DimProduct] table in the AdventureWorks2008R2 database, this could correspond to replacing the value of 1000 in the SafetyStockLevel field of the Adjustable Race model with another number.
- Type 2 - an update results in the creation of a new row in the dimension table, with its natural key matching the one present in the original row, a new, unique surrogate key, and a new value assigned to the column representing the changed dimension attribute. This can also be fairly easily accomplished in the case of our sample table. For example, if we wanted to apply the same change but without losing any historical records, we would create a new row, with the same ProductAlternateKey (set to AR-5381), a new ProductKey (with its value assigned based on its IDENTITY property), and the new number appearing in the SafetyStockLevel column.
- Type 3 - an update is recorded by utilizing an additional (designated specifically for this purpose) column in the same row. This means, in essence, that the ability to preserve historical data relies on an appropriate table structure (unlike Type 2, where such dependency does not exist) and is restricted by the number of predefined columns. Since the Slowly Changing Dimension transformation does not support Type 3 changes, we will not be covering them here in more detail.
- Type 4 - a row being updated is deleted from its current location and created in a separate table dedicated to hosting historical data only. Such a table typically includes an additional field containing each record's creation date. We will also forgo more in-depth coverage of this type, due to lack of its support in the SSIS-based Slowly Changing Dimension transformation.
- Type 6 - constitutes a hybrid approach, resulting from the combination of types 1, 2, and 3. This is accomplished by providing supplementary data identifying status and historical context of each record by including additional columns (such as StartDate, EndDate, and Status, which are included in our sample table).
Considering that this is our introductory article into the subject of Slowly Changing Dimension from the perspective of SQL Server 2008 R2 Integration Services, we will start by presenting the first of these types, since its SSIS based implementation is most straightforward. To illustrate its characteristics, we will take advantage of the aforementioned sample table, which contains the listing of products offered by AdventureWorks - a company manufacturing and selling bicycles (with relevant databases downloadable from the CodePlex web site, where you will also find detailed procedure describing their deployment). The resulting Analysis Services installation includes Product.dim dimension, corresponding to [dbo].[DimProduct] table in the target AdventureWorksDW2008R2 Data Warehouse database, populated by entries from [Production].[Product] table in the AdventureWorks OLTP database.
To accomplish our objective, launch Business Intelligence Development Studio and create a new project based on the Integration Services template. Switch to the Data Flow tab and activate the hyperlink displayed in the center of the main window stating No Data Flow task have been added to this package. Click here to add a new Data Flow task. With the new Data Flow task in place and active, drag an OLE DB Source component from the Toolbox onto the Designer interface. Display its Editor window by selecting Edit... item from its context-sensitive menu. In the Connection Manager section, click on the New... command button and point to the target instance of the Adventure Works OLTP database. Once you have defined a new OLE DB connection manager, ensure that Table or view entry appears in the Data access mode listbox, and select [Production].[Product] table. Preview the input data to verify that your settings are correct and close the Editor window.
Next, add the Slowly Changing Dimensions Transformation to the Designer interface, position it directly under the OLE DB Source window, extending its Data Flow Path output until both components are connected. Double-click on the newly added transformation to trigger a wizard that guides you through all steps necessary to configure its settings (alternatively, you can use Edit... entry in its context sensitive menu). On the Select a Dimension Table and Keys page, use Connection manager and Table or view listboxes to point to the target dimension table ([dbo].[DimProduct] table in the target AdventureWorksDW2008R2 Data Warehouse database). The wizard will automatically map columns in the input Data Flow Path (originating from the previously created OLE DB Source) and the dimension table as long as their names and data types are the same. Since, in our case, this does not cover the natural key, you will need to manually match the ProductNumber (as the Input Column) and ProductAlternateKey (as the Dimension Column) and specify Business key as the corresponding Key Type. Similarly, you can adjust other, not automatically correlated Input Columns entries with their Dimension Columns counterparts (such as Name and EnglishProductName), but make sure that you leave the corresponding Not a key column field for each of them unchanged.
Once you complete the process of column matching and confirm your choices, you will be prompted to decide how changes to column data will be handled (on the Slowly Changing Dimensions Columns page). In general, this involves assigning one of three possible Change Types to at least one of non-key columns in the fact table or view:
- Fixed Attribute - prevents any modifications to a given attribute by treating them as errors.
- Changing Attribute - corresponds to Type 1 updates, resulting in replacing existing attributes with a new value included in the incoming data row.
- Historical Attribute - equivalent to Type 2 updates, where changes to designated non-key columns are saved in a new record, resulting in the existing one being marked as out of date.
Since our intention is to demonstrate Type 1 changes, we will use the second of these options (designated by the Change Type entry) and assign it to the SafetyStockLevel (identified via the Dimension Columns drop-down list). For the sake of simplicity, we will limit our selection to a single column, although obviously it is possible to add others.
Next, you are presented with the Fixed and Changing Attribute Options page, where you need to decide whether to Fail the transformation if changes are detected in a fixed attribute (the corresponding checkbox is grayed out since we did not specify this Change Type to be applied to a dimension column) as well as whether to Change all matching records, including outdated records, when changes are detected in a changing attribute (assuming that you picked at least one Changing or Historical attribute). For the time being, leave these options with their default settings, as well as disregard entries appearing on the Inferred Dimension Members page (we will explore them in more detail in one of our upcoming articles). The last page of the wizard displays the listing of the outputs that will be automatically configured by the transformation. If you selected only the Changing attribute option (as intended), the only one listed there will be New Record Output.
This process yields a fully configured Slowly Changing Dimension transformation with two outputs. The first of them, labeled New Output, handles inserts of any new records (those for which a match on the business key has not been found) into the dimension table accessible via an auto generated OLE DB Destinations. The second one, named Changing Attribute Updates Output, directs records with a matching business key but different changing attribute to the OLE DB Command, which is responsible for executing the T-SQL UPDATE statement, handling changes to the column you identified on the Slowly Changing Dimensions Columns page. You can easily test the functionality of the transformation by modifying a SafetyStockLevel in an arbitrary row in the Production.Product table and executing the package. With the default sample table content, this should result in a single update carried out via OLE DB Command, with the total of 504 rows processed.
See all articles by Marcin Policht