Managing Slowly Changing Dimension with MERGE Statement in SQL Server

Introduction

In my last article I talked about Slowly Changing Dimension (SCD) Transformation, which is available out of box in the SSIS toolkit and can be used easily and configured quickly for managing smaller slowly changing dimension. The Slowly Changing Dimension Transformation is good if you want to get started easily and quickly but it has several limitations (I talked about these limitations in my last article, Managing Slowly Changing Dimension with Slow Changing Transformation in SSIS) and does not perform well when the number of rows or columns gets larger and larger.

In this article I am going to talk about some of the alternatives that you can take for managing larger slowly changing dimensions.

Getting Started with Managing Slowly Changing Dimension (SCD) with MERGE Statement

Let’s first create a supplier table, which will have source data and add some data to it; as you can notice below, I have added a SupplierCode column, which is a primary key at source and will work as a business key.

USE [AdventureWorks2012]
GO
CREATE TABLE [dbo].[Supplier](
       [SupplierCode] CHAR(8) PRIMARY KEY,
       [SupplierName] [varchar](50) NULL,
       [Address] [varchar](50) NULL,
) ON [PRIMARY]
GO
INSERT INTO [dbo].[Supplier] ([SupplierCode], [SupplierName], [Address])
VALUES
('S0000001', 'ABC Company', 'USA'),
('S0000002', 'XYZ Corporation', 'USA')
GO
SELECT * FROM [dbo].[Supplier]
 

Now let’s create a dimension for storing Supplier data from the source. As you can see below I have added a SupplierId column as surrogate key (you can find more about Surrogate Key and its importance in my last article), effective date and expiration date for keeping track of historical changes. I have also added CurrentFlag column to indicate the current active record:

USE [AdventureWorks2012]
GO
CREATE TABLE [dbo].[DimSupplier](
       [SupplierId] [int] IDENTITY(1,1) NOT NULL,
       [SupplierCode] CHAR(8),
       [SupplierName] [varchar](50) NULL,
       [Address] [varchar](50) NULL,
       [EffectiveDate] [date] NULL,
       [ExpirationDate] [date] NULL,
       [CurrentFlag] [char](1) NULL,
       CONSTRAINT [PK_DimSupplier] PRIMARY KEY CLUSTERED ([SupplierId] ASC)
) ON [PRIMARY]
GO

Now to manage slowly changing dimension we can use the MERGE statement, which was introduced in SQL Server 2008. We need to write two MERGE statements to manage SCD type 1 and SCD type 2 separately.

Here is the MERGE statement to manage SCD Type 1 for the table we have created above and with an assumption that Address will be treated as SCD Type 1 changes:

USE [AdventureWorks2012]
GO
MERGE INTO [dbo].[DimSupplier] Dest
USING [dbo].[Supplier] Src
       ON (Dest.[SupplierCode] = Src.[SupplierCode])
WHEN MATCHED AND (Dest.[Address] != Src.[Address])
THEN UPDATE
    SET Dest.[Address] = Src.[Address];
GO

Here is the MERGE statement to manage SCD Type 2 for the table we have created above and with an assumption that SupplierName will be treated as SCD Type 2 changes:

USE [AdventureWorks2012]
-- This inserts another record to the dimension for SCD Type changes
INSERT INTO [dbo].[DimSupplier] ([SupplierCode], [SupplierName], [Address], [EffectiveDate], [ExpirationDate], [CurrentFlag])
SELECT [SupplierCode], [SupplierName], [Address], [EffectiveDate], [ExpirationDate], [CurrentFlag]
FROM
(
       MERGE [dbo].[DimSupplier] TT
       USING [dbo].[Supplier] ST
              ON (TT.[SupplierCode] = ST.[SupplierCode])
       -- This inserts new records in the dimension table
       WHEN NOT MATCHED THEN
              INSERT ([SupplierCode], [SupplierName], [Address], [EffectiveDate],   [ExpirationDate], [CurrentFlag])
              VALUES ([SupplierCode], [SupplierName], [Address], '01/01/1900', NULL, 'Y')
       -- This marks the older record to be outdated for SCD Type 2
       WHEN MATCHED AND TT.[CurrentFlag] = 'Y' AND (ISNULL(TT.[SupplierName], '') != ISNULL(ST.[SupplierName], '')) THEN
              UPDATE SET TT.[CurrentFlag] = 'N', TT.[ExpirationDate] = GETDATE() - 1
              OUTPUT $Action Action_Taken, ST.[SupplierCode], ST.[SupplierName], 
              ST.[Address], GETDATE() AS [EffectiveDate], NULL AS [ExpirationDate], 'Y' AS [CurrentFlag]
)
AS MERGE_OUT
WHERE MERGE_OUT.Action_Taken = 'UPDATE';
GO
SELECT * FROM [dbo].[DimSupplier]

Results

Now let’s go back to source and update some records. With the script below, you can see that I am updating SupplierName for SupplierCode = ‘S0000001’ and after that executing the above MERGE statement for tracking SCD Type 2 changes.

USE [AdventureWorks2012]
GO
UPDATE [dbo].[Supplier]
SET [SupplierName] = 'ABC Company Ltd.'
WHERE [SupplierCode] = 'S0000001'
GO
SELECT * FROM [dbo].[Supplier]
GO

Now we execute the statement below; we can see a history of the record changes as shown below:

USE [AdventureWorks2012]
GO
SELECT * FROM [dbo].[DimSupplier]

Results

Now if you are wondering why we need to write two MERGE statements to manage SCD Type 1 and SCD Type 2 changes separately and why we cannot have a single MERGE statement to manage them both together–the MERGE statement allows only one WHEN MATCHED clause in single MERGE statement; if you try to add more than one you will receive this exception:

Msg 10714, Level 15, State 1, Line 17

An action of type ‘WHEN MATCHED’ cannot appear more than once in a ‘UPDATE’ clause of a MERGE statement.

When using MERGE statement for managing slowly changing dimension you might encounter an exception as shown below, in scenarios where the primary key of your dimension is being referenced by foreign key constraints from other tables:

Msg 356, Level 16, State 1, Line 3

The target table ‘dbo.DimSupplier’ of the INSERT statement cannot be on either side of a (primary key, foreign key) relationship when the FROM clause contains a nested INSERT, UPDATE, DELETE, or MERGE statement. Found reference constraint ‘FK_FactTransaction_DimSupplier’.

There is already a connect item logged on Microsoft Connect site and as per the workaround provided, either we can drop foreign key constraint before data load and recreate it again after data load or use a temporary table solution as discussed here: http://connect.microsoft.com/SQLServer/feedback/details/435031/unable-to-perform-slowly-changing-dimension-updates-on-a-dimension-table-with-a-foreign-key

If you still want to use SSIS for managing slowly changing dimensions, but if your dimension is quite large in number of rows or columns you cannot use the inbuilt Slowly Changing Dimension Transformation component for performance reasons, then you can evaluate a custom component available on the Codeplex site: http://dimensionmergescd.codeplex.com/

Conclusion

In my last article I talked about Slowly Changing Dimension Transformation, which is available out of the box in SSIS toolkit and can be used easily and configured quickly for managing a smaller slowly changing dimension.

In this article I talked about using the MERGE statement to manage larger slowly changing dimensions both of SCD Type 1 and Type 2 changes.

Resources

Slowly Changing Dimension Transformation

Slowly Changing Dimension

Using MERGE in SQL Server to insert, update and delete at the same time

See all articles by Arshad Ali

Arshad Ali
Arshad Ali
Arshad Ali works with Microsoft India R&D Pvt Ltd. He has 8+ years of experience, mostly on Microsoft Technologies. Most recently, as a SQL Developer and BI Developer he has been working on a Data Warehousing project. Arshad is an MCSD, MCITP: Business Intelligence, MCITP: Database Developer 2008 and MCITP: Database Administrator 2008 certified and has presented at several technical events including SQL-School. On an educational front, he has an MCA (Master in Computer Applications) and MBA-IT. Disclaimer : I work for Microsoft and help people and businesses make better use of technology to realize their full potential. The opinions mentioned herein are solely mine and do not reflect those of my current employer or previous employers.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles