If the name of the supplier changes over time, as you can see in the change in SupplierName below, the record will be updated. This looks pretty simple to implement, though it does not have history to keep track on.
To maintain SCD type 2, different people take different approaches. For example, one approach is to add effective and expiration dates to indicate a time period during which the record was active. If the expiration date is NULL it indicates the current active record. Another approach is to add one column to indicate the current active record. Normally people use the first approach or a combination of both.
There are multiple ways to implement Slowly Changing Dimension in SQL Server and the easiest of those is using Slowly Changing Dimension Transformation in data flow task of SSIS packages, although it has its own limitation as discussed at the end of this article.
Before I can start explaining about Slowly Changing Dimension Transformation let me first explain Surrogate Key and why it’s important in data warehouse.
We often add a meaningless key known as Surrogate Key in the dimension. The surrogate is usually implemented as an integer, acts as a unique key or primary key for the dimension and gets associated to the fact table using foreign key constraint in it.
Surrogate key becomes extremely important to manage slowly changing dimension.
Using Slowly Changing Dimension Transformation
Let’s first create a supplier table, which will have source data and add some data to it; as you can notice below, I have added the SupplierCode column, which is a primary key at source and will work as a business key.
Now let’s create a dimension for storing Supplier data from the source; as you can notice below I have added a SupplierId column as a surrogate key, effective date and expiration date for keeping track of historical changes. I have also added a CurrentFlag column to indicate the current active record:
So far so good, let’s now create an SSIS package; add a data flow task to it and a source component to retrieve data from the source table. Now add a Slowly Changing Dimension Transformation component after the source component and connect the source component to the Slowly Changing Dimension Transformation component. Now double click on the Slowly Changing Dimension Transformation component to edit it and the Slowly Changing Dimension Wizard will be launched as shown below:
Click on the Next button to move on to the next screen of the wizard; in the next screen you need to first select the destination dimension table and map the columns. Next you need to specify the source column from the source table, which makes the business key. In my case SupplierCode is the primary key at the source table and hence I have marked it here as the business key as you can see below:
Click on the Next button to move on to the next screen of the wizard; in the next screen you need to specify each column of the dimension and whether it’s going to be treated as SCD Type 0, Type 1 or Type 2.
In my case I have chosen the Address to be treated as SCD Type 1 and Name to be treated as SCD Type 2 as shown below:
Click on Next button to move on to the next screen of the wizard; in the next screen you need to specify (as we have at least one of the column of SCD Type 2) a start date column (effective date), end date column (expiration date) and a variable to set the date value as you can see below:
Click on the Next button to move on to the next screen of the wizard; in the next screen you can specify the inferred dimension members setting as shown below:
Click on the Next button to move on the next screen of the wizard and click on the Finish button to complete the wizard; here is what you will see in the data flow task:
The Slowly Changing Dimension wizard will add several tasks to manage slowly changing dimension based on your selection and configuration. In the above screen the “Changing Attribute Updates Output” path will update records for SCD Type 1 (basically overwrites). The “New Output” path will add a new entry to the dimension for maintaining historical records and the “Historical Attribute Inserts Output” path will update previous records to be outdated by updating the Expiration date column.
Now when you execute the package for the first time you will notice two records from the above created source table will be loaded in the dimension table as you can see below:
Now execute this query to verify data in the Supplier dimension table:
And this is what you will see after execution of the above query; it’s pretty much what we expected after first execution:
Now let’s go back to source and update some records. With the below given script, I am updating Supplier name for SupplierCode = ‘S0000001’.
Now execute the package again and this time you will notice one record (New) has been inserted and one record (old) has been updated or marked as outdated. This happens because the column for which the value has changed has been configured as SCD Type 2:
Now execute the query again and verify the data. As expected there are two records for SupplierCode = ‘S0000001’; the earlier one has been updated with an expiration date to indicate outdated record and another one as latest record with latest supplier name:
The Slowly Changing Dimension transformation has been designed for ease of use and for smaller dimensions only. As we saw above, the Slowly Changing Dimension Transformation component is available out of the box in SSIS and can be quickly configured for smaller dimensions. Below are some of the reasons why Slowly Changing Dimension transformation is not fit for all scenarios especially if your dimension is large:
In this article I talked about Slowly Changing Dimension Transformation, which is available out of the box in SSIS toolkit and can be used easily and quickly configured for managing smaller slowly changing dimension.
In the next article I am going to talk about some of the alternatives that you can use for managing larger slowly changing dimensions.