Introduction
In my last article I talked about Slowly Changing Dimension (SCD) Transformation, which is available out of box in the SSIS toolkit and can be used easily and configured quickly for managing smaller slowly changing dimension. The Slowly Changing Dimension Transformation is good if you want to get started easily and quickly but it has several limitations (I talked about these limitations in my last article, Managing Slowly Changing Dimension with Slow Changing Transformation in SSIS) and does not perform well when the number of rows or columns gets larger and larger.
In this article I am going to talk about some of the alternatives that you can take for managing larger slowly changing dimensions.
Getting Started with Managing Slowly Changing Dimension (SCD) with MERGE Statement
Let’s first create a supplier table, which will have source data and add some data to it; as you can notice below, I have added a SupplierCode column, which is a primary key at source and will work as a business key.
USE [AdventureWorks2012] GO CREATE TABLE [dbo].[Supplier]( [SupplierCode] CHAR(8) PRIMARY KEY, [SupplierName] [varchar](50) NULL, [Address] [varchar](50) NULL, ) ON [PRIMARY] GO INSERT INTO [dbo].[Supplier] ([SupplierCode], [SupplierName], [Address]) VALUES ('S0000001', 'ABC Company', 'USA'), ('S0000002', 'XYZ Corporation', 'USA') GO SELECT * FROM [dbo].[Supplier]
Now let’s create a dimension for storing Supplier data from the source. As you can see below I have added a SupplierId column as surrogate key (you can find more about Surrogate Key and its importance in my last article), effective date and expiration date for keeping track of historical changes. I have also added CurrentFlag column to indicate the current active record:
USE [AdventureWorks2012] GO CREATE TABLE [dbo].[DimSupplier]( [SupplierId] [int] IDENTITY(1,1) NOT NULL, [SupplierCode] CHAR(8), [SupplierName] [varchar](50) NULL, [Address] [varchar](50) NULL, [EffectiveDate] [date] NULL, [ExpirationDate] [date] NULL, [CurrentFlag] [char](1) NULL, CONSTRAINT [PK_DimSupplier] PRIMARY KEY CLUSTERED ([SupplierId] ASC) ) ON [PRIMARY] GO
Now to manage slowly changing dimension we can use the MERGE statement, which was introduced in SQL Server 2008. We need to write two MERGE statements to manage SCD type 1 and SCD type 2 separately.
Here is the MERGE statement to manage SCD Type 1 for the table we have created above and with an assumption that Address will be treated as SCD Type 1 changes:
USE [AdventureWorks2012] GO MERGE INTO [dbo].[DimSupplier] Dest USING [dbo].[Supplier] Src ON (Dest.[SupplierCode] = Src.[SupplierCode]) WHEN MATCHED AND (Dest.[Address] != Src.[Address]) THEN UPDATE SET Dest.[Address] = Src.[Address]; GO
Here is the MERGE statement to manage SCD Type 2 for the table we have created above and with an assumption that SupplierName will be treated as SCD Type 2 changes:
USE [AdventureWorks2012] -- This inserts another record to the dimension for SCD Type changes INSERT INTO [dbo].[DimSupplier] ([SupplierCode], [SupplierName], [Address], [EffectiveDate], [ExpirationDate], [CurrentFlag]) SELECT [SupplierCode], [SupplierName], [Address], [EffectiveDate], [ExpirationDate], [CurrentFlag] FROM ( MERGE [dbo].[DimSupplier] TT USING [dbo].[Supplier] ST ON (TT.[SupplierCode] = ST.[SupplierCode]) -- This inserts new records in the dimension table WHEN NOT MATCHED THEN INSERT ([SupplierCode], [SupplierName], [Address], [EffectiveDate], [ExpirationDate], [CurrentFlag]) VALUES ([SupplierCode], [SupplierName], [Address], '01/01/1900', NULL, 'Y') -- This marks the older record to be outdated for SCD Type 2 WHEN MATCHED AND TT.[CurrentFlag] = 'Y' AND (ISNULL(TT.[SupplierName], '') != ISNULL(ST.[SupplierName], '')) THEN UPDATE SET TT.[CurrentFlag] = 'N', TT.[ExpirationDate] = GETDATE() - 1 OUTPUT $Action Action_Taken, ST.[SupplierCode], ST.[SupplierName], ST.[Address], GETDATE() AS [EffectiveDate], NULL AS [ExpirationDate], 'Y' AS [CurrentFlag] ) AS MERGE_OUT WHERE MERGE_OUT.Action_Taken = 'UPDATE'; GO SELECT * FROM [dbo].[DimSupplier]
Now let’s go back to source and update some records. With the script below, you can see that I am updating SupplierName for SupplierCode = ‘S0000001’ and after that executing the above MERGE statement for tracking SCD Type 2 changes.
USE [AdventureWorks2012] GO UPDATE [dbo].[Supplier] SET [SupplierName] = 'ABC Company Ltd.' WHERE [SupplierCode] = 'S0000001' GO SELECT * FROM [dbo].[Supplier] GO
Now we execute the statement below; we can see a history of the record changes as shown below:
USE [AdventureWorks2012] GO SELECT * FROM [dbo].[DimSupplier]
Now if you are wondering why we need to write two MERGE statements to manage SCD Type 1 and SCD Type 2 changes separately and why we cannot have a single MERGE statement to manage them both together–the MERGE statement allows only one WHEN MATCHED clause in single MERGE statement; if you try to add more than one you will receive this exception:
Msg 10714, Level 15, State 1, Line 17
An action of type ‘WHEN MATCHED’ cannot appear more than once in a ‘UPDATE’ clause of a MERGE statement.
When using MERGE statement for managing slowly changing dimension you might encounter an exception as shown below, in scenarios where the primary key of your dimension is being referenced by foreign key constraints from other tables:
Msg 356, Level 16, State 1, Line 3
The target table ‘dbo.DimSupplier’ of the INSERT statement cannot be on either side of a (primary key, foreign key) relationship when the FROM clause contains a nested INSERT, UPDATE, DELETE, or MERGE statement. Found reference constraint ‘FK_FactTransaction_DimSupplier’.
There is already a connect item logged on Microsoft Connect site and as per the workaround provided, either we can drop foreign key constraint before data load and recreate it again after data load or use a temporary table solution as discussed here: http://connect.microsoft.com/SQLServer/feedback/details/435031/unable-to-perform-slowly-changing-dimension-updates-on-a-dimension-table-with-a-foreign-key
If you still want to use SSIS for managing slowly changing dimensions, but if your dimension is quite large in number of rows or columns you cannot use the inbuilt Slowly Changing Dimension Transformation component for performance reasons, then you can evaluate a custom component available on the Codeplex site: http://dimensionmergescd.codeplex.com/
Conclusion
In my last article I talked about Slowly Changing Dimension Transformation, which is available out of the box in SSIS toolkit and can be used easily and configured quickly for managing a smaller slowly changing dimension.
In this article I talked about using the MERGE statement to manage larger slowly changing dimensions both of SCD Type 1 and Type 2 changes.
Resources
Slowly Changing Dimension Transformation
Using MERGE in SQL Server to insert, update and delete at the same time