In this first CDC article, we will track the changes that
take place as SQL Server 2008’s Change Data Capture feature is started on a
user table. Next month, we’ll look at the performance of CDC.
My focus here is not just showing how CDC works on the
surface (there are many articles that cover that), but rather to see what we
can find out about the various moving parts within SQL Server that actually
make CDC function. Understanding the internals will help in an overall
understanding of CDC.
Introduction: SQL Server 2008 Change Data Capture
SQL Server 2008’s CDC functionality reads the transaction
log to record changes in system tables associated with each table for which CDC
is enabled. It writes those files to system tables in the same database, and
those system tables are accessible through direct queries or system functions.
Let’s walk through some example code and see what changes in each step.
Tracking the Parts Using Example Code
Initially, we start with a brand new database. To follow the
changes, open Object Explorer and expand these folders:
- Databases
- SQLServer Agent | Jobs
- (Once the database is created) Security | Roles | Database Roles
-
(Once the database is created) Programmability | Stored
Procedures | System Stored Procedures -
(Once the database is created) Programmability | Functions |
Table-valued Functions
Create Database
CREATE DATABASE TestCdc
This gives us the new database, but there are no new jobs.
Expand the Tables folder and the System Tables folder. Empty so far.
Enable Change Data Capture on Database TestCdc
USE TestCdc
— Turn on Change Data Capture at database level
EXEC sys.sp_cdc_enable_db
This step takes a few seconds. Now refresh the System Tables
folder. There are now six system tables:
- cdc.captured_columns
- cdc.change_tables
- cdc.ddl_history
- cdc.index_columns
- cdc.lsn_time_mapping
- dbo.systranschemas
Notice that five of these are part of the “cdc” schema.
There are also four new rather odd looking “SQL inline
table-valued function” entries in the sys.all_objects table:
- fn_cdc_get_all_changes_◦…◦
- fn_cdc_get_all_changes_…
- fn_cdc_get_net_changes_◦…◦
- fn_cdc_get_net_changes_…
(I used a non-proportional font with spaces marked with ‘◦’ to make it clear that
there are embedded and trailing spaces in these names.)
Here is the query that will return those functions:
SELECT [name] FROM sys.all_objects WHERE [name] LIKE ‘%…%’
There are multiple new stored procedures:
- sp_batchinsert_lsn_time_mapping
- sp_ins_dummy_lsn_time_mapping
- sp_ins_instance_enabling_lsn_time_mapping
- sp_ins_lsn_time_mapping
There are still no new jobs or roles.
Create a New Schema and User Table
CREATE SCHEMA MySchema
CREATE TABLE MySchema.Person (
PersonID int IDENTITY PRIMARY KEY,
FirstName varchar(32) NOT NULL,
LastName varchar(32) NOT NULL,
UpdateCt int NOT NULL DEFAULT 0
)
This creates a new user table, but the table is not yet
enabled for CDC.
This step does not affect the list of
functions/procedures/roles we’re tracking.
Enable New User Table for CDC
EXEC sys.sp_cdc_enable_table
@source_schema = N’MySchema’,
@source_name = N’Person’,
@role_name = N’ChangeDataAccessRole’,
@supports_net_changes = 1
This enables the new user table for CDC. Refresh the folders
listed above. We now have
- one new system table (cdc.MySchema_Person_CT)
- two new jobs (cdc.TestCdc_capture and cdc.TestCdc_cleanup)
- one new database role (ChangeDataAccessRole)
- two new table-valued functions
-
fn_cdc_get_all_changes_MySchema_Person (Books Online
reference) -
fn_cdc_get_net_changes_MySchema_Person (Books Online
reference) - three new stored procedures:
- sp_batchinsert_389576426
- sp_insdel_389576426
- sp_upd_389576426
You can actually get the text for the stored procedures (sp_helptext ‘cdc.sp_batchinsert_389576426’
[your number will likely be different]). Interesting reading. It is clear that
this stored procedure is generated and specific to this user table.
If you enable a second table for CDC, you will get a second set of sp_batchinsert_n,
sp_insdel_n, and sp_upd_n procedures.
System Table cdc.MySchema_Person_CT
This table has the following columns:
- __$start_lsn
- __$end_lsn
- __$seqval
- __$operation
- __$update_mask
- PersonID
- FirstName
- LastName
- UpdateCt
Details regarding this table are covered well in other
articles including Books Online.
Job cdc.TestCdc_capture
The capture job is very interesting as you dig into it. From
sysjobs, we see that this job is owned by sa, is in the category “REPL-LogReader”
and has a description of “CDC Log Scan Job”.
From sysjobsteps, we see that it has two steps:
Step: |
1 |
Name: |
“Starting Change Data Capture Collection Agent” |
Command: |
RAISERROR(22801, 10, -1) |
Raiserror Output: |
“Starting the Change Data Capture Collection Agent job. To |
Step: |
2 |
Name: |
“Change Data Capture Collection Agent” |
Command: |
sys.sp_MScdc_capture_job |
There are two logical next steps after reading those
details, but we’ll leave the second until we’ve inserted some data.
sp_helptext N’sys.sp_MScdc_capture_job’
SELECT * FROM sys.dm_cdc_log_scan_sessions
Procedures sys.sp_MScdc_capture_job and sys.sp_cdc_scan
sys.sp_MScdc_capture_job is the procedure that kicks off the
whole CDC process. It first does some security checks and then calls sys.sp_cdc_scan,
which does the real work. sys.sp_cdc_scan “executes the
change data capture log scan operation.”
By default, sys.sp_cdc_scan
loops continuously with a five-second delay between loops. Inside each normal
loop, it calls sp_replcmds.
(It is interesting to note that this is a replication procedure. Books Online
says that calling this procedure with the default (1) will return “the next
transaction waiting for distribution.” The default @maxtrans in this procedure
is 500.)
Last, it reads from sys.dm_cdc_log_scan_sessions
to prepare for a call to sp_sqlagent_log_jobhistory to record job history
information.
Database Role ChangeDataAccessRole
If you open the Properties window for the new ChangeDataAccessRole
role, you can see that the role is owned by “cdc” and has SELECT permissions on
the two table-valued functions listed above and nothing else.
Conclusion
Digging into these details gives us insight into how SQL
Server 2008’s CDC is implemented. So far, we haven’t even inserted a single
record into the new table. We’ll pick that up next time as well as show performance
comparison results.
Thoughts or comments? Drop a note in the forum.
Reference
-
Microsoft
SQL Server 2008 – Change Data Capture – Part I from Database Journal -
Microsoft
SQL Server 2008 – Change Data Capture – Part 2 from Database Journal -
Microsoft
SQL Server 2008 – Change Data Capture – Part 3 from Database Journal -
Microsoft
SQL Server 2008 – Change Data Capture – Part 4 from Database Journal -
Overview
of Change Data Capture from SQL Server Books Online -
Change
Data Capture from SQL Server Books Online -
Tuning
the Performance of Change Data Capture in SQL Server 2008 from Microsoft
TechNet
Download the .sql files for this article.