SqlCredit - Part 11: Change Tracking Using History RecordsNovember 26, 2007 This is part 11 of a series. If you have not read part 10, you can find it here. Download files for part 11 of this series. Change tracking is an important requirement of most OLTP systems. If the PrimaryCardHolder on an account was updated, you should be able to tell who made the change, when they made the change, and what the old and new values are. We will discuss tracking these changes in three different types of history records. These are all what I would call on-line history records. The history records are stored in the same database as the base records. The application can make use of this historical information in whatever way is necessary without having to go to a different database for the historical data. The examples all use the Account table and a corresponding history table. The Standard Account TableHere is the structure of the standard Account table for reference:
CREATE TABLE dbo.Account (
AccountID int NOT NULL IDENTITY,
PrimaryCardHolderID int NOT NULL,
Address1 varchar(40) NOT NULL,
Address2 varchar(40) NOT NULL,
City varchar(40) NOT NULL,
[State] char(2) NOT NULL,
ZipCode varchar(10) NOT NULL,
DateCreated datetime NOT NULL
CONSTRAINT Account_DateCreated_DF
DEFAULT GETUTCDATE(),
DateLastUpdated datetime NOT NULL
CONSTRAINT Account_DateLastUpdated_DF
DEFAULT GETUTCDATE(),
StatusCd tinyint NOT NULL
CONSTRAINT Account_StatusCd_DF
DEFAULT 1,
CONSTRAINT Account_PK PRIMARY KEY CLUSTERED
(AccountID)
);
Diff History RecordsFor this type of history record, any column in the base table that could be changed exists in the history table, and both DateCreated and DateLastUpdated are pulled out of the base table since this information will exist in the history table. The nullability of columns matches that of the corresponding columns in the base table. In this case, all columns are NOT NULL. Schema
CREATE TABLE dbo.AccountDiffHistory (
AccountDiffHistoryID int NOT NULL IDENTITY,
AccountDiffID int NOT NULL,
UpdateUserID int NOT NULL,
HistoryDate datetime NOT NULL
CONSTRAINT AccountDiffHistory_HistoryDate_DF
DEFAULT GETUTCDATE(),
PrimaryCardHolderID int NOT NULL,
Address1 varchar(40) NOT NULL,
Address2 varchar(40) NOT NULL,
City varchar(40) NOT NULL,
[State] char(2) NOT NULL,
ZipCode varchar(10) NOT NULL,
StatusCd tinyint NOT NULL
CONSTRAINT AccountDiffHistory_StatusCd_DF
DEFAULT 1,
CONSTRAINT AccountDiffHistory_PK PRIMARY KEY CLUSTERED
(AccountDiffHistoryID),
CONSTRAINT AccountDiffHistory_AccountDiffID_FK FOREIGN KEY
(AccountDiffID) REFERENCES dbo.AccountDiff (AccountDiffID)
);
Added ColumnsThis type (and all history table examples here) includes an UpdateUserID column. This allows the application to store the ID of the user making the INSERT, UPDATE, or DELETE. This user may be an end user, an administrator, a customer service representative (CSR), or some other type of user that has appropriate permissions. All these examples also include HistoryDate. This column replaces the DateCreated and DateLastUpdated columns in the base table. I named it HistoryDate to make it clear that this is the creation date of the history record and not the base table. In a single-table SELECT, this doesnt matter much. In a JOIN between AccountDiff and AccountDiffHistory, it helps to clarify the role of the column. Note that both of these columns are NOT NULL, and HistoryDate has a default to simplify INSERTs. Merits and IssuesThis is a very simple type of history record. This type makes it easy to browse (ad-hoc or programmatically) the values of all columns at any given point in history. In the attached AccountDiff.sql script, an update scenario is played out. These are the steps:
All of this information is captured in the history record. We know exactly who made what changes when. The main drawback comes from the fact that it stores every column on every update. First, this requires more space. Second, it makes it harder to search the history records for changes. The other side of that second issue is that, for any point in the records history, it is very simple to see the values of all columns. Depending on your applications requirements, this may be more or less valuable. Here is a limited set of columns from the history records for the scenario above, ordered by HistoryID: UpdateUserID PrimaryCardholderID StatusCd ------------ ------------------- -------- 1000 1000 1 99 1000 3 99 1000 1 1001 1000 1 1000 1001 1 99 1001 2 1 1001 4 Compare this to the results from AccountMin to see how they differ. When I have created history records in databases I designed, this is the scheme I have used. Min History RecordsThis type of record is different from the Diff records in that the updateable columns are all nullable. Only the columns that are updated get written to the history record. SchemaCREATE TABLE dbo.AccountMinHistory ( AccountMinHistoryID int NOT NULL IDENTITY, AccountMinID int NOT NULL, UpdateUserID int NOT NULL, HistoryDate datetime NOT NULL CONSTRAINT AccountMinHistory_HistoryDate_DF DEFAULT GETUTCDATE(), PrimaryCardHolderID int NULL, Address1 varchar(40) NULL, Address2 varchar(40) NULL, City varchar(40) NULL, [State] char(2) NULL, ZipCode varchar(10) NULL, StatusCd tinyint NULL CONSTRAINT AccountMinHistory_StatusCd_DF DEFAULT 1, CONSTRAINT AccountMinHistory_PK PRIMARY KEY CLUSTERED (AccountMinHistoryID), CONSTRAINT AccountMinHistory_AccountMinID_FK FOREIGN KEY (AccountMinID) REFERENCES dbo.AccountMin (AccountMinID) );
Merits and IssuesThis is also a very simple type of history record. This type makes it easy to see the changes and ignore the columns that have not changed. The attached AccountMin.sql script runs the same scenario as above. Even though this scheme stores much less data, all of the changes are still captured. Again, we know exactly who made what changes when. The benefit here, over the Diff scheme, is that it requires less space. The drawback here is basically the opposite of the first scheme. It is easy to see what changes happened during a particular update, but it is more difficult to recreate a complete record corresponding to each change. As before, depending on your applications requirements, this may be more or less valuable. Here is a limited set of columns from the history records for the scenario above, ordered by HistoryID: UpdateUserID PrimaryCardholderID StatusCd ------------ ------------------- -------- 1000 1000 1 99 NULL 3 99 NULL 1 1001 NULL NULL 1000 1001 NULL 99 NULL 2 1 NULL 4 The NULLs make it easy to see that, for instance, the second record has an update to StatusCd but not PrimaryCardholderID. Link History RecordsThis type of record is one that I have seen multiple times and, frankly, I think its pretty awful. The basic idea here is that you store a set of updateable columns, not in the base table at all, but in the history table only. This makes it more pure from a data redundancy standpoint, but the coding is much more complex. I have included my attempt at making it work in the attached AccountLink.sql script. If you are a big fan of this history scheme, please use the forum to explain the beauty that Im not seeing. SchemaI have to include the AccountLink schema here because it is so different from the others. CREATE TABLE dbo.AccountLink ( AccountLinkID int NOT NULL IDENTITY, CurrentHistoryID int NOT NULL, CONSTRAINT AccountLink_PK PRIMARY KEY CLUSTERED (AccountLinkID) ); CREATE TABLE dbo.AccountLinkHistory ( AccountLinkHistoryID int NOT NULL IDENTITY, AccountLinkID int NOT NULL, UpdateUserID int NOT NULL, HistoryDate datetime NOT NULL CONSTRAINT AccountLinkHistory_HistoryDate_DF DEFAULT GETUTCDATE(), PrimaryCardHolderID int NOT NULL, Address1 varchar(40) NOT NULL, Address2 varchar(40) NOT NULL, City varchar(40) NOT NULL, [State] char(2) NOT NULL, ZipCode varchar(10) NOT NULL, StatusCd tinyint NOT NULL CONSTRAINT AccountLinkHistory_StatusCd_DF DEFAULT 1, CONSTRAINT AccountLinkHistory_PK PRIMARY KEY CLUSTERED (AccountLinkHistoryID) );
Merits and IssuesThe main benefit here is that there is no duplicated data. Well, sort of. Since you dont store any data in the base table, every history record must store every value. You might be able to implement a combination of Link and Min (where you store only the changed values), but the coding would be nasty, and the performance would be abysmal. The issues with Link come when you start coding the schema and the procedures. Creating the double links (foreign keys) requires a lot of special coding. Any time you select an account and need to return anything besides the ID (which you are probably using to select it), you have to join to the history record. With either of the other two schemes, most selects can be accomplished with just the base table. With the Link scheme, every select requires a join. RecommendationsIf youve gotten this far, you know that I dont like the Link scheme. Between the other two, I think the choice comes down to application requirements. If you are tracking these changes solely for auditing purposes or ad-hoc research, use the Min scheme. If your application will actually need to show the full record as of a certain point in time or a certain change, then use the Diff scheme. What do you do? Please provide feedback through the forum. Do you use a different scheme than one of these? If so, please explain how it works and what business problems it addresses. Regardless of which scheme you use, be sure it can answer the three critical questions:
Download files for part 11 of this series. » See All Articles by Columnist Rob Garrison SqlCredit - Developing a Complete SQL Server OLTP Database Project
Performance Testing SQL 2008's Transparent Data Encryption
SQL Server 2008's Change Data Capture - Tracking the Moving Parts Performance Testing - SQL Server 2008 versus SQL Server 2005 Exploring SQL Server's Index INCLUDEs Exploring SQL 2005's Ranking Functions - NTILE() and ROW_NUMBER() SqlCredit - Part 19: Exploring SQL 2005's Ranking Functions - RANK() and DENSE_RANK() SqlCredit, Part 18: Exploring the Performance of SQL 2005's OUTPUT Clause SqlCredit - Part 17: Exploring SQL 2005's OUTPUT Clause SqlCredit - Part 16: The Cost of Bloat SqlCredit - Part 15: The Cost of Distribution SqlCredit - Part 14: The Cost of Translation SqlCredit - Part 13: More on Indexed Persisted Computed Columns SqlCredit - Part 12: Exploring Indexed Persisted Computed Columns SqlCredit - Part 11: Change Tracking Using History Records SqlCredit - Part 10: MAC Performance and Updating SqlCredit SqlCredit - Part 9: Message Authentication Codes SqlCredit Part 8: Comparing Encrypt/DecryptByCert and Encrypt/DecryptByKey SqlCredit Part 7: Performance Impact of EncryptByCert and DecryptByCert SqlCredit Part 6: Exploring EncryptByCert and DecryptByCert SqlCredit - Part 5: Adding Card, Vendor, and Purchase, Plus Much Refactoring SqlCredit - Part 4: Schema and Procedure Security SqlCredit - Part 3: Schema/Proc Updates and Automated Unit Testing SqlCredit - Part 2: Creating the Database, Tables, CRUD Procedures SqlCredit - Developing a Complete SQL Server OLTP Database Project |