Implementing SQL Server’s Change Data Capture

Have you ever wondered what records are being inserted, updated and deleted in one or more SQL Server tables? When SQL Server 2008 was released the Microsoft product group came out with a new feature called “Change Data Capture”. You can use change data capture to track the changes that occur to a table. In this article I will give you a primer of how to implement change data capture and how to review the captured information to produce an audit trail of the changes to a database table.

What is Change Data Capture?

Change data capture is a process that is implemented within a database and allows you to identify the SQL Server tables in which all changes will be tracked. Change Data Capture is only available in Enterprise, Developer, and Data Center additions of SQL Server 2008 or higher. By using change data capture you can track changes to a single table, or multiple tables, no more need to create a series of different triggers to support auditing changes.

When you implement change data capture each change to a table is tracked in separate tables, one for every table being tracked. These tables are stored directly in the database where change data capture is enabled. When you implement change data capture a process is automatically generated and scheduled to collect and manage the change data capture information. By default change data capture information is only kept for 3 days. If you want to keep data longer you will need to modify the change data capture configuration, more on this later on in this article.

You can read the change data tracking tables directly to see what has changed, or use built-in functions to read the data. Microsoft recommends that you use the built-in functions to read the change data capture tables. I will be covering reading the change data capture information in the “Reading the Change Data Capture Information” section.

Building a Sample Database to Demo Change Data Capture

In order to show you how to use change data capture I’m first going to build a database and a couple of tables. This database and the tables will be used in all my example code in this article. To create the database and tables run the code in Listing 1.

CREATE DATABASE CDC_DEMO;

GO

USE CDC_DEMO;

GO

CREATE TABLE dbo.Boat (

Id int,

Name varchar(50),

BoatType varchar(20),

BoatLength tinyint,

BoatBeam varchar(20));

CREATE TABLE dbo.BoatListing (

Id int,

BoatId int,

AskingPrice money,

LastUpdateUser varchar(25));

Listing 1: Create CDC_DEMO database

By reviewing my code you can see I created a database named CDC_DEMO and two tables named Boat and BoatListing.

Enabling Change Data Capture

In order to use change data capture you first need to enable this feature. Change data capture is enabled at the database level. The script in Listing 2 can be used to enable change data tracking on my CDC_DEMO database.

use CDC_DEMO;

GO

— enable CDC on the database

EXEC sys.sp_cdc_enable_db;

GO

Listing 2: Enable change data capture on a database

As you can see it doesn’t take much to enable change data capture on a database. All you need to do is execute the sp_cdc_enable_db system stored procedure. If you want to enable change data capture on more than one database you will need to execute the sp_cdc_enable_db system stored procedures in every database you want to capture data.

Turning on Change Data Capture for a Table

Just because I enabled change data capture doesn’t mean I am capturing changes to all tables in my CDC_DEMO database. Before I can start tracking changes, I need to enable change data capture on the tables I want to track. To enable change data tracking for my Boat table I will run the code in Listing 3.

USE CDC_DEMO;

GO

EXEC sys.sp_cdc_enable_table

@source_schema = N’dbo’,

@source_name = N’Boat’,

@role_name = NULL, — Role gating not enabled

@filegroup_name = N’PRIMARY’; — consider writing to a separate filegroup

GO

Listing 3: Enable change data capture on a table

If you review the code in Listing 3 you can see I executed the sp_cdc_enable_table system stored procedure to enable change data capture on my Boat table. In my script in Listing 1 I told SQL Server to capture all columns in the change data capture table, and to create that table on the PRIMARY file group.

This stored procedure accepts other parameters. One of those parameters worth mentioning is @captured_column_list. If you use this parameter as well when you call the sp_cdc_enable_table sp, you can identify the specific columns you want to capture. Please refer to Books online for a complete list of parameters that can be passed to the sp_cdc_enable_table sp. The code in Listing 4 uses the @captured_column_list parameter.

USE CDC_DEMO;

GO

EXEC sys.sp_cdc_enable_table

@source_schema = N’dbo’,

@source_name = N’BoatListing’,

@role_name = NULL, — Role gating not enabled

@filegroup_name = N’PRIMARY’, — should consider writing audit date to separate filegroup

@captured_column_list = N’Id,AskingPrice,LastUpdateUser’;

GO

Listing 4: Enabled change data capture to only capture specific set of columns

The only columns I am requesting to capture in Listing 4 are the Id, AskingPrice and LastUpdateUser columns of the BoatListing table.

After running the code in Listing 3 or 4 if I was to review my CDC_Demo database and the SQL Agent jobs, I would find some objects that were created as part of enabling change data capture. Those new objects are those used to manage, track and maintain the change data capture information. In my CDC_Demo database if I expand my System Tables view, the System Stored Procedures items, or the Table-valued Functions, I would find a number of objects that have a schema of “cdc”. The “cdc” schema table objects contain the actual change data capture information. The “cdc” stored procedures and functions are the code behind capturing and managing change data capture information. When I look at SQL Agent jobs I find a number of jobs that start with “CDC.CDC_DEMO…”. These SQL Agent jobs are those that run on my system to capture and maintain change data capture information.

Reading the Change Data Capture Information

Capturing data that has changed in my database is great. But once I capture it I might want to read it. In order to demo how to read change data capture information I first need to make some changes to my Boat and BoatListing tables so information will be captured in the change data capture table. I will make some changes to these tables by using the code in Listing 5.

USE CDC_DEMO;

GO

SET NOCOUNT ON;

INSERT INTO dbo.Boat VALUES(1

,’Wind Witch’

,’Sailboat’,23,’8 feet and 2 inches’

);

INSERT INTO dbo.Boat VALUES(1

,’Wind Witch II’

,’Sailboat’,25,’8 feet and 6 inches’

);

Listing 5: Updating my demo table

To read the change data capture information I could read the tables in the “cdc” schema directly, although Microsoft recommends against this. Instead I will use a system function that was created when I enable change data capture. To demo how this works I will display all my changes to my Boat table by running the code in Listing 6.

USE CDC_DEMO;

GO

DECLARE @from_lsn binary(10), @to_lsn binary(10)

SET @from_lsn = sys.fn_cdc_get_min_lsn(‘dbo_Boat’)

SET @to_lsn = sys.fn_cdc_get_max_lsn()

SELECT

CT.__$start_lsn,

CT.__$operation,

CASE CT.__$operation

WHEN 1 THEN ‘Delete’

WHEN 2 THEN ‘Insert’

WHEN 3 THEN ‘Update – Pre’

WHEN 4 THEN ‘Update – Post’

END AS Operation,

CT.*,

LSN.tran_begin_time,

LSN.tran_end_time,

LSN.tran_id

FROM

cdc.fn_cdc_get_all_changes_dbo_Boat

(@from_lsn, @to_lsn, N’all update old’) AS CT INNER JOIN

cdc.lsn_time_mapping AS LSN ON CT.__$start_lsn = LSN.start_lsn

GO

Listing 6: Display Change Data Capture Information For my Boat table

When I run this code two rows will be returned. One row for each record inserted. If you review this code you see I called a function named “cdc.fn_cdc_get_all_changes_dbo_Boat” . This function returns the change data capture information for my Boat table. In order to identify the change data capture records I want this function to return, I pass it a from and to log sequence number (LSN). These log sequence numbers identify the time frame for which I want to return change data capture information. In order to identify the appropriate LSNs I use the “sys.fn_cdc_get_min_lsn” and “sys.fn_cdc_get_max_lsn” functions. The “sys.fn_cdc_get_min_lsn” is passed the schema and table name of the table I’m tracking changes in the format “<schema name>_<table name>”, and returns the lowest LSN value contained in the change data tracking table. The “sys.fn_cdc_get_max_lsn” function returns the largest LSN values in the cdc.lsn_time_mapping table. These from and to LSN values are then used by “cdc.fn_cdc_get_all_changes_dbo_Boat” function to return INSERT, UPDATE and DELETE statements processed against my boat table during between those two LSN numbers.

My example code I’ve shown you does not create change data capture records. If your change data capture tables contains tons of changed records you might not want to display all changed records for a table. If you want to select your audit records based on a specific date, and time you could modify the code above to use the “cdc.fn_cdc_map_time_to_lsn” . This function is passed a date and time, plus a relative operator to identify a specific LSN value. For more information regarding how to use this function refer to Books Online.

SQL Server Agent Jobs that support Change Data Capture

As I mentioned earlier there are two different SQL Agent jobs that are set up when you enable change data capture. One job is responsible for capturing the changed data, while the other job cleans up or should I say purges old change data capture information.

The SQL Agent job that is used to capture the changed data is named “cdc.<database>_capture”, where “<database> is the name of the database that has change data capture enabled. If SQL Server Agent is not up, or this job is not enabled, changed data capture data will not find its way to the change data capture tables. By default this job is set up to start when SQL Server Agent starts and once started it runs continuously.

When you enable change data capture on a database you also get a cleanup SQL Server Agent job automatically set up. This job has the following naming convention: “cdc.<database>_cleanup”, where <database> is the name of the database that has change data capture enabled. This job by default runs at 2 AM and deletes captured data that is older than 3 days.

Viewing and Changing the Default Data Capture Configuration

You have no control over the default configuration settings used by the SQL Agent jobs that are created when you enable change data capture. But you can view and change the default settings for these jobs once they are created. To display the setting values for each change data capture job you use the sys.sp_cdc_help_jobs system stored procedure. If you want to change the settings of a change data capture job you use the sys.sp_cdc_change_job system stored procedure.

When I run the sys.sp_cdc_help_jobs on my CDC_DEMO database I get the output in Table 1.

job_id job_type job_name maxtrans maxscans continuous pollinginterval retention threshold

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — –

649DE88D-6289-49BF-9C11-79B9C5A67AE0 capture cdc.CDC_DEMO_capture 500 10 1 5 0 0

F16DA688-84EF-4BD5-AE55-A69E080F87E9 cleanup cdc.CDC_DEMO_cleanup 0 0 0 0 4320 5000

Table 1: My default settings for change data capture jobs.

If you review the output you can see the different default configuration settings for each of my different change data capture jobs. Table 2, which is from Books Online shows the definitions for each of these columns. You can change these column values using the sys.sp_cdc_change_job system stored procedure.

Column name

Data type

Description

job_id

uniqueidentifier

The ID of the job.

job_type

nvarchar(20)

The type of job.

maxtrans

int

The maximum number of transactions to process in each scan cycle.

maxtrans is valid only for capture jobs.

maxscans

int

The maximum number of scan cycles to execute in order to extract all rows from the log.

maxscans is valid only for capture jobs.

continuous

bit

A flag indicating whether the capture job is to run continuously (1), or run in one-time mode (0). For more information, see sys.sp_cdc_add_job (Transact-SQL).

continuous is valid only for capture jobs.

pollinginterval

bigint

The number of seconds between log scan cycles.

pollinginterval is valid only for capture jobs.

retention

bigint

The number of minutes that change rows are to be retained in change tables.

retention is valid only for cleanup jobs.

threshold

bigint

The maximum number of delete entries that can be deleted using a single statement on cleanup.

Table 2: List of Change Data Capture configuration values

To see how to change the default values for my change data capture cleanup job review the code in Listing 7.

USE CDC_DEMO;

GO

EXECUTE sys.sp_cdc_change_job @job_type = ‘cleanup’,

@retention = 86400;

Listing 7: Display Change Data Capture Information For my Boat table

In Listing 7 I changed the retention period of my change data capture information. I changed that retention period to 86400, which means the cleanup job will only delete change data capture information after 60 days. To find out more about how to use the sys.sp_cdc_change_job system stored procedure, please refer to Books Online.

Building an Audit Mechanism Using Change Data Capture

Change data capture is a feature that you can use to capture changes to any table in a database. It just requires that you have a version of SQL Server that supports change data capture, and that you enable the database and tables for which you want to capture information. If you have some need to capture information about UPDATE, INSERT, and DELETE statements you might want to consider whether or not change date capture will meet your needs.

One additional note that is worth adding. If you would like to capture the person that is submitting the UPDATE, INSERT, or DELETE statement then you will need to make sure your application that submits the UPDATE, INSERT, or DELETE statement populates a column in the table being tracked with the person’s credentials doing the updating. Change data capture only captures changed to the columns in a table. Therefore without the table containing a column that contains the person’s credentials there is no way to tell who updated the table.

See all articles by Greg Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles