A Look at SQL Server 2008 Change Tracking


Before SQL Server 2008, you had to build a custom solution if you wanted to keep track of the changes to the data in your tables. SQL Server 2008 has a new offering called Change Tracking that keeps track of each DML event type and the keys of the row that was affected.

Introduction

Before SQL Server 2008, if you wanted to keep track of the
changes to the data in your tables, you had to build a custom solution. This
usually involved triggers on the tables being tracked, a table or tables to
write the keys and a timestamp to custom code to query the changes, and jobs to
cleanup all that extra data. Change Tracking in SQL Server 2008 makes all this
customization a thing of the past.

An Overview

Would you like to see what rows have changed in your tables
without examining every insert, update, and delete? SQL Server 2008 has a new
offering called Change Tracking that, when turned on for a table, will keep
track of each DML event type and the keys of the row that was affected. This
means that at anytime, you can query to find out which rows have had an insert,
update, and delete against them. You can retrieve the keys from Change
Tracking and get the latest data from that table. You can even have it tell
you which columns where involved in the change.

Note that if you need to know all the changes that have
happened to a row and not just the latest, SQL server 2008 has another new
feature called Change Data Tracking that can give you this information. You
can find more about Change Data Tracking at the MSDN article,
"Basics of Change Data Capture". If you aren’t
sure which one you need, there is a
comparison
of the two technologies available at MSDN.

Enabling Change Tracking

To enable Change Tracking on a table, you must first enable
it for the database. ALTER DATABASE is used to accomplish this as in the
following example. Note that AUTO_CLEANUP and CHANGE_RETENTION are optional
and only applicable when enabling change tracking. The former specifies
whether the change tracking data should be cleaned up and the latter specifies
the length of time the data is kept prior to cleanup as expressed in minutes,
hours, or days. The default is 2 days.

ALTER DATABASE ExampleDB
SET CHANGE_TRACKING = ON
(AUTO_CLEANUP = ON , CHANGE_RETENTION = 5 DAYS)

To disable change tracking, set it to OFF. All tables in
the database must have change tracking disabled prior to disabling it for the
database.

ALTER DATABASE ExampleDB
SET CHANGE_TRACKING = OFF

Similarly, to enable change tracking for a given table, you would
execute something similar to this.

ALTER TABLE Orders
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)

The table must have a primary key defined to enable change
tracking. The primary key is stored with the change tracking information to
allow you to join back to the table to retrieve the current column values.
More on this in a bit.

The WITH (TRACK_COLUMNS_UPDATED = ON) is an option that
tells SQL Server to record information about which columns were altered during
the change. This could allow you to pull the modified columns only instead of
the entire row when working with the changed rows. Again, you won’t be able to
see the previous values of those columns, just which ones changed.

To disable change tracking on the table…you guessed it:

ALTER TABLE Orders
DISABLE CHANGE_TRACKING

Querying Change Tracking

Now that we have a table being tracked, let’s do something
useful with that information. Based on what we set up above, the database will
keep a log of all the modifications on our Orders table for 5 days. For
review, the following statements set up change tracking on the database and on
the Orders table we create here.

ALTER DATABASE ExampleDB
SET CHANGE_TRACKING = ON
(AUTO_CLEANUP = ON , CHANGE_RETENTION = 5 DAYS)
 
 
CREATE TABLE Orders (OrderID     BIGINT not null Primary key,
                     CustID      BIGINT not null,
                     OrderStatus VARCHAR(20),
                     OrderDTM    DATETIME,
                     UpdateDTM   DATETIME)
 
ALTER TABLE Orders
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)

Let’s insert an order and see what we get from change
tracking.

INSERT INTO Orders
VALUES (1, 5, 'Received', GETDATE(), GETDATE())

The table function that we can use to see what changes were
tracked is CHANGETABLE. Here we pass it the table we want changes for and
NULL for the last version since we do not have one.

SELECT * 
FROM CHANGETABLE(CHANGES Orders, NULL) CT
 
 
SYS_CHANGE_VERSION   SYS_CHANGE_CREATION_VERSION SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS         SYS_CHANGE_CONTEXT OrderID
-------------------- --------------------------- -------------------- -------------------------- ------------------ -----------
1                    1                           I                    NULL                       NULL               1
 
(1 row(s) affected)

The results show us that the insert (SYS_CHANGE_OPERATION =
‘I’) was captured and the primary key associated with the change version is
OrderID 1.

Now let’s try an update to that row. We’ll change the order
status and the update date and time. This time we should be able to see a
value in SYS_CHANGE_COLUMNS.

UPDATE Orders
SET OrderStatus = 'Packed',
    UpdateDTM   = GETDATE()
WHERE ORDERID   = 1

Instead of NULL , this time we know that we can use 1 for
the last version to pass into the CHANGETABLE function.

SELECT * 
FROM CHANGETABLE(CHANGES Orders, 1) CT
 
SYS_CHANGE_VERSION   SYS_CHANGE_CREATION_VERSION SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS         SYS_CHANGE_CONTEXT OrderID
-------------------- --------------------------- -------------------- -------------------------- ------------------ -----------
2                    NULL                        U                    0x000000000300000005000000 NULL               1
 
(1 row(s) affected)

This shows change version 2 was an update operation. We can
see that something was indeed recorded in SYS_CHANGE_COLUMNS , but it doesn’t
look usable. There is another change tracking function that can help us make
sense of this value. CHANGE_TRACKING_IS_COLUMN_IN_MASK takes the column id of
the change tracked column you are interested in and the CHANGETABLE SYS_CHANGE_COLUMNS
column.

In the example below, CHANGE_TRACKING_IS_COLUMN_IN_MASK is
used to find all the Orders rows that have had an update since change version 1
where the columns involved in the change include the OrderStatus column.

SELECT O.OrderID, OrderStatus, UpdateDTM 
  FROM Orders O
  JOIN CHANGETABLE(CHANGES Orders, 1) CT
    ON O.OrderID = CT.OrderID
   AND CT.SYS_CHANGE_OPERATION = 'U'
 WHERE CHANGE_TRACKING_IS_COLUMN_IN_MASK( COLUMNPROPERTY( OBJECT_ID( 'Orders' ),'OrderStatus', 'COLUMNID' ), CT.SYS_CHANGE_COLUMNS ) = 1
 
OrderID              OrderStatus          UpdateDTM
-------------------- -------------------- -----------------------
1                    Packed               2010-04-05 18:47:31.387
 
(1 row(s) affected)

These are just some examples of how you can use change
tracking. There are more change tracking functions available and more
information that can be gleened from the CHANGETABLE.

Change Tracking and Surrogate Keys

One observation I feel compelled to share with you regarding
change tracking is using change tracking with surrogate keys. Let’s say you
have a centralized order database that is synchronized with your branch
stores. The customer table contains a surrogate key of CustomerID. In addition,
each branch assigns its branch number to the customer row when it’s responsible
for a new customer entry. Now let’s say the company manages customer data
across business units. Therefore, changes to customer data can occur from
other systems including a centralized customer service department. When
changes to a branch’s customer data occur, you want to synchronize those
changes down to the branch. The problem here is that change tracking records
only the primary key with the change information. So, if you want to limit the
rows you synchronize, you have to retrieve all the changes from change tracking
for the customer table (since your last synch version). You can then examine
each of those rows for the correct branch number by joining back to the Customer
table and limiting the results to those for the specific branch. The surrogate
key, in this case, caused us to have a much larger working table than would
otherwise be necessary if branch number had been a part of the primary key.
This is not meant to steer your table design in any particular direction, just
to provide food for thought when you may have a large set of changes that could
be segregated down to a subset needed for synchronization.

Conclusion

If you need a lightweight solution for discovering what
operations have occurred on your data rows, change tracking can save you
significant effort. This only scratches the surface of what you can do with
change tracking. SQL Server Books Online provides additional guidance on
getting the most out of this new addition to SQL Server.

For More Information

MSDN:
Change Tracking

MSDN: Change
Tracking Functions

»


See All Articles by Columnist

Deanna Dicken

Deanna Dicken
Deanna Dicken
Deanna Dicken is an architect with approximately 20 years of IT experience. In that time she has worked on many large-scale, mission-critical applications and has been involved at all phases of the lifecycle. She has also contributed to three SQL Server MCSE books, co-authored Learn SQL in a Weekend, and tech edited many other titles. She lives on the outskirts of Indianapolis, Indiana with her husband, Curtis, and children, Kylee and Zach.

Latest Articles