Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted May 28, 2013

Managing Slowly Changing Dimension with MERGE Statement in SQL Server

By Arshad Ali

Introduction

In my last article I talked about Slowly Changing Dimension (SCD) Transformation, which is available out of box in the SSIS toolkit and can be used easily and configured quickly for managing smaller slowly changing dimension. The Slowly Changing Dimension Transformation is good if you want to get started easily and quickly but it has several limitations (I talked about these limitations in my last article, Managing Slowly Changing Dimension with Slow Changing Transformation in SSIS) and does not perform well when the number of rows or columns gets larger and larger.

In this article I am going to talk about some of the alternatives that you can take for managing larger slowly changing dimensions.

Getting Started with Managing Slowly Changing Dimension (SCD) with MERGE Statement

Let’s first create a supplier table, which will have source data and add some data to it; as you can notice below, I have added a SupplierCode column, which is a primary key at source and will work as a business key.

USE [AdventureWorks2012]
GO
CREATE TABLE [dbo].[Supplier](
       [SupplierCode] CHAR(8) PRIMARY KEY,
       [SupplierName] [varchar](50) NULL,
       [Address] [varchar](50) NULL,
) ON [PRIMARY]
GO
INSERT INTO [dbo].[Supplier] ([SupplierCode], [SupplierName], [Address])
VALUES
('S0000001', 'ABC Company', 'USA'),
('S0000002', 'XYZ Corporation', 'USA')
GO
SELECT * FROM [dbo].[Supplier]
 

Now let’s create a dimension for storing Supplier data from the source. As you can see below I have added a SupplierId column as surrogate key (you can find more about Surrogate Key and its importance in my last article), effective date and expiration date for keeping track of historical changes. I have also added CurrentFlag column to indicate the current active record:

USE [AdventureWorks2012]
GO
CREATE TABLE [dbo].[DimSupplier](
       [SupplierId] [int] IDENTITY(1,1) NOT NULL,
       [SupplierCode] CHAR(8),
       [SupplierName] [varchar](50) NULL,
       [Address] [varchar](50) NULL,
       [EffectiveDate] [date] NULL,
       [ExpirationDate] [date] NULL,
       [CurrentFlag] [char](1) NULL,
       CONSTRAINT [PK_DimSupplier] PRIMARY KEY CLUSTERED ([SupplierId] ASC)
) ON [PRIMARY]
GO

Now to manage slowly changing dimension we can use the MERGE statement, which was introduced in SQL Server 2008. We need to write two MERGE statements to manage SCD type 1 and SCD type 2 separately.

Here is the MERGE statement to manage SCD Type 1 for the table we have created above and with an assumption that Address will be treated as SCD Type 1 changes:

USE [AdventureWorks2012]
GO
MERGE INTO [dbo].[DimSupplier] Dest
USING [dbo].[Supplier] Src
       ON (Dest.[SupplierCode] = Src.[SupplierCode])
WHEN MATCHED AND (Dest.[Address] != Src.[Address])
THEN UPDATE
    SET Dest.[Address] = Src.[Address];
GO

Here is the MERGE statement to manage SCD Type 2 for the table we have created above and with an assumption that SupplierName will be treated as SCD Type 2 changes:

USE [AdventureWorks2012]
-- This inserts another record to the dimension for SCD Type changes
INSERT INTO [dbo].[DimSupplier] ([SupplierCode], [SupplierName], [Address], [EffectiveDate], [ExpirationDate], [CurrentFlag])
SELECT [SupplierCode], [SupplierName], [Address], [EffectiveDate], [ExpirationDate], [CurrentFlag]
FROM
(
       MERGE [dbo].[DimSupplier] TT
       USING [dbo].[Supplier] ST
              ON (TT.[SupplierCode] = ST.[SupplierCode])
       -- This inserts new records in the dimension table
       WHEN NOT MATCHED THEN
              INSERT ([SupplierCode], [SupplierName], [Address], [EffectiveDate],   [ExpirationDate], [CurrentFlag])
              VALUES ([SupplierCode], [SupplierName], [Address], '01/01/1900', NULL, 'Y')
       -- This marks the older record to be outdated for SCD Type 2
       WHEN MATCHED AND TT.[CurrentFlag] = 'Y' AND (ISNULL(TT.[SupplierName], '') != ISNULL(ST.[SupplierName], '')) THEN
              UPDATE SET TT.[CurrentFlag] = 'N', TT.[ExpirationDate] = GETDATE() - 1
              OUTPUT $Action Action_Taken, ST.[SupplierCode], ST.[SupplierName], 
              ST.[Address], GETDATE() AS [EffectiveDate], NULL AS [ExpirationDate], 'Y' AS [CurrentFlag]
)
AS MERGE_OUT
WHERE MERGE_OUT.Action_Taken = 'UPDATE';
GO
SELECT * FROM [dbo].[DimSupplier]

Results

Now let’s go back to source and update some records. With the script below, you can see that I am updating SupplierName for SupplierCode = ‘S0000001’ and after that executing the above MERGE statement for tracking SCD Type 2 changes.

USE [AdventureWorks2012]
GO
UPDATE [dbo].[Supplier]
SET [SupplierName] = 'ABC Company Ltd.'
WHERE [SupplierCode] = 'S0000001'
GO
SELECT * FROM [dbo].[Supplier]
GO

Now we execute the statement below; we can see a history of the record changes as shown below:

USE [AdventureWorks2012]
GO
SELECT * FROM [dbo].[DimSupplier]

Results

Now if you are wondering why we need to write two MERGE statements to manage SCD Type 1 and SCD Type 2 changes separately and why we cannot have a single MERGE statement to manage them both together--the MERGE statement allows only one WHEN MATCHED clause in single MERGE statement; if you try to add more than one you will receive this exception:

Msg 10714, Level 15, State 1, Line 17

An action of type 'WHEN MATCHED' cannot appear more than once in a 'UPDATE' clause of a MERGE statement.

When using MERGE statement for managing slowly changing dimension you might encounter an exception as shown below, in scenarios where the primary key of your dimension is being referenced by foreign key constraints from other tables:

Msg 356, Level 16, State 1, Line 3

The target table 'dbo.DimSupplier' of the INSERT statement cannot be on either side of a (primary key, foreign key) relationship when the FROM clause contains a nested INSERT, UPDATE, DELETE, or MERGE statement. Found reference constraint 'FK_FactTransaction_DimSupplier'.

There is already a connect item logged on Microsoft Connect site and as per the workaround provided, either we can drop foreign key constraint before data load and recreate it again after data load or use a temporary table solution as discussed here: http://connect.microsoft.com/SQLServer/feedback/details/435031/unable-to-perform-slowly-changing-dimension-updates-on-a-dimension-table-with-a-foreign-key

If you still want to use SSIS for managing slowly changing dimensions, but if your dimension is quite large in number of rows or columns you cannot use the inbuilt Slowly Changing Dimension Transformation component for performance reasons, then you can evaluate a custom component available on the Codeplex site: http://dimensionmergescd.codeplex.com/

Conclusion

In my last article I talked about Slowly Changing Dimension Transformation, which is available out of the box in SSIS toolkit and can be used easily and configured quickly for managing a smaller slowly changing dimension.

In this article I talked about using the MERGE statement to manage larger slowly changing dimensions both of SCD Type 1 and Type 2 changes.

Resources

Slowly Changing Dimension Transformation

Slowly Changing Dimension

Using MERGE in SQL Server to insert, update and delete at the same time

See all articles by Arshad Ali



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM


















Thanks for your registration, follow us on our social networks to keep up-to-date