SQL Server 2008’s Change Data Capture – Tracking the Moving Parts

In this first CDC article, we will track the changes that
take place as SQL Server 2008’s Change Data Capture feature is started on a
user table. Next month, we’ll look at the performance of CDC.

My focus here is not just showing how CDC works on the
surface (there are many articles that cover that), but rather to see what we
can find out about the various moving parts within SQL Server that actually
make CDC function. Understanding the internals will help in an overall
understanding of CDC.

Introduction: SQL Server 2008 Change Data Capture

SQL Server 2008’s CDC functionality reads the transaction
log to record changes in system tables associated with each table for which CDC
is enabled. It writes those files to system tables in the same database, and
those system tables are accessible through direct queries or system functions.

Let’s walk through some example code and see what changes in each step.

Tracking the Parts Using Example Code

Initially, we start with a brand new database. To follow the
changes, open Object Explorer and expand these folders:

  • Databases
  • SQLServer Agent | Jobs
  • (Once the database is created) Security | Roles | Database Roles
  • (Once the database is created) Programmability | Stored
    Procedures | System Stored Procedures
  • (Once the database is created) Programmability | Functions |
    Table-valued Functions

Create Database


CREATE DATABASE TestCdc

This gives us the new database, but there are no new jobs.
Expand the Tables folder and the System Tables folder. Empty so far.

Enable Change Data Capture on Database TestCdc


USE TestCdc
— Turn on Change Data Capture at database level
EXEC sys.sp_cdc_enable_db

This step takes a few seconds. Now refresh the System Tables
folder. There are now six system tables:

Notice that five of these are part of the “cdc” schema.

There are also four new rather odd looking “SQL inline
table-valued function” entries in the sys.all_objects table:

  • fn_cdc_get_all_changes_◦…◦
  • fn_cdc_get_all_changes_
  • fn_cdc_get_net_changes_◦…◦
  • fn_cdc_get_net_changes_

(I used a non-proportional font with spaces marked with ‘’ to make it clear that
there are embedded and trailing spaces in these names.)

Here is the query that will return those functions:


SELECT [name] FROM sys.all_objects WHERE [name] LIKE ‘%…%’

There are multiple new stored procedures:

  • sp_batchinsert_lsn_time_mapping
  • sp_ins_dummy_lsn_time_mapping
  • sp_ins_instance_enabling_lsn_time_mapping
  • sp_ins_lsn_time_mapping

There are still no new jobs or roles.

Create a New Schema and User Table


CREATE SCHEMA MySchema
CREATE TABLE MySchema.Person (
PersonID int IDENTITY PRIMARY KEY,
FirstName varchar(32) NOT NULL,
LastName varchar(32) NOT NULL,
UpdateCt int NOT NULL DEFAULT 0
)

This creates a new user table, but the table is not yet
enabled for CDC.

This step does not affect the list of
functions/procedures/roles we’re tracking.

Enable New User Table for CDC


EXEC sys.sp_cdc_enable_table
@source_schema = N’MySchema’,
@source_name = N’Person’,
@role_name = N’ChangeDataAccessRole’,
@supports_net_changes = 1

This enables the new user table for CDC. Refresh the folders
listed above. We now have

  • one new system table (cdc.MySchema_Person_CT)
  • two new jobs (cdc.TestCdc_capture and cdc.TestCdc_cleanup)
  • one new database role (ChangeDataAccessRole)
  • two new table-valued functions
  • three new stored procedures:
    • sp_batchinsert_389576426
    • sp_insdel_389576426
    • sp_upd_389576426

You can actually get the text for the stored procedures (sp_helptext ‘cdc.sp_batchinsert_389576426’
[your number will likely be different]). Interesting reading. It is clear that
this stored procedure is generated and specific to this user table.

If you enable a second table for CDC, you will get a second set of sp_batchinsert_n,
sp_insdel_n, and sp_upd_n procedures.

System Table cdc.MySchema_Person_CT

This table has the following columns:

  • __$start_lsn
  • __$end_lsn
  • __$seqval
  • __$operation
  • __$update_mask
  • PersonID
  • FirstName
  • LastName
  • UpdateCt

Details regarding this table are covered well in other
articles including Books Online.

Job cdc.TestCdc_capture

The capture job is very interesting as you dig into it. From
sysjobs, we see that this job is owned by sa, is in the category “REPL-LogReader”
and has a description of “CDC Log Scan Job”.

From sysjobsteps, we see that it has two steps:

Step:

1

Name:

“Starting Change Data Capture Collection Agent”

Command:

RAISERROR(22801, 10, -1)

Raiserror Output:

“Starting the Change Data Capture Collection Agent job. To
report on the progress of the operation, query the sys.dm_cdc_log_scan_sessions
dynamic management view.”

Step:

2

Name:

“Change Data Capture Collection Agent”

Command:

sys.sp_MScdc_capture_job

There are two logical next steps after reading those
details, but we’ll leave the second until we’ve inserted some data.


sp_helptext N’sys.sp_MScdc_capture_job’
SELECT * FROM sys.dm_cdc_log_scan_sessions

Procedures sys.sp_MScdc_capture_job and sys.sp_cdc_scan

sys.sp_MScdc_capture_job is the procedure that kicks off the
whole CDC process. It first does some security checks and then calls sys.sp_cdc_scan,
which does the real work. sys.sp_cdc_scan “executes the
change data capture log scan operation
.”

By default, sys.sp_cdc_scan
loops continuously with a five-second delay between loops. Inside each normal
loop, it calls sp_replcmds.
(It is interesting to note that this is a replication procedure. Books Online
says that calling this procedure with the default (1) will return “the next
transaction waiting for distribution.” The default @maxtrans in this procedure
is 500.)

Last, it reads from sys.dm_cdc_log_scan_sessions
to prepare for a call to sp_sqlagent_log_jobhistory to record job history
information.

Database Role ChangeDataAccessRole

If you open the Properties window for the new ChangeDataAccessRole
role, you can see that the role is owned by “cdc” and has SELECT permissions on
the two table-valued functions listed above and nothing else.

Conclusion

Digging into these details gives us insight into how SQL
Server 2008’s CDC is implemented. So far, we haven’t even inserted a single
record into the new table. We’ll pick that up next time as well as show performance
comparison results.

Thoughts or comments? Drop a note in the forum.

Reference

Download the .sql files for this article.

»


See All Articles by Columnist
Rob Garrison

Rob Garrison
Rob Garrison
Rob Garrison is a database architect at Fiserv in Portland, Oregon specializing in OLTP design and development. He has over twenty years of IT experience, with over ten years in database development and architecture.

Latest Articles