SqlCredit - Part 11: Change Tracking Using History Records

November 26, 2007

This is part 11 of a series. If you have not read part 10, you can find it here.

Download files for part 11 of this series.

Change tracking is an important requirement of most OLTP systems. If the PrimaryCardHolder on an account was updated, you should be able to tell who made the change, when they made the change, and what the old and new values are.

We will discuss tracking these changes in three different types of history records. These are all what I would call “on-line” history records. The history records are stored in the same database as the base records. The application can make use of this historical information in whatever way is necessary without having to go to a different database for the historical data.

The examples all use the Account table and a corresponding history table.

The Standard Account Table

Here is the structure of the standard Account table for reference:

CREATE TABLE dbo.Account (
    AccountID            int          NOT NULL    IDENTITY,
    PrimaryCardHolderID  int          NOT NULL,
    Address1             varchar(40)  NOT NULL,
    Address2             varchar(40)  NOT NULL,
    City                 varchar(40)  NOT NULL,
    [State]              char(2)      NOT NULL,
    ZipCode              varchar(10)  NOT NULL,
    DateCreated          datetime     NOT NULL
        CONSTRAINT Account_DateCreated_DF
        DEFAULT GETUTCDATE(),
    DateLastUpdated      datetime     NOT NULL
        CONSTRAINT Account_DateLastUpdated_DF
        DEFAULT GETUTCDATE(),
    StatusCd             tinyint      NOT NULL
        CONSTRAINT Account_StatusCd_DF
        DEFAULT 1,

    CONSTRAINT Account_PK PRIMARY KEY CLUSTERED
        (AccountID)
);

“Diff” History Records

For this type of history record, any column in the base table that could be changed exists in the history table, and both DateCreated and DateLastUpdated are pulled out of the base table since this information will exist in the history table.

The nullability of columns matches that of the corresponding columns in the base table. In this case, all columns are NOT NULL.

Schema

CREATE TABLE dbo.AccountDiffHistory (
    AccountDiffHistoryID  int          NOT NULL    IDENTITY,
    AccountDiffID         int          NOT NULL,
    UpdateUserID          int          NOT NULL,
    HistoryDate           datetime     NOT NULL
        CONSTRAINT AccountDiffHistory_HistoryDate_DF
        DEFAULT GETUTCDATE(),
    PrimaryCardHolderID   int          NOT NULL,
    Address1              varchar(40)  NOT NULL,
    Address2              varchar(40)  NOT NULL,
    City                  varchar(40)  NOT NULL,
    [State]               char(2)      NOT NULL,
    ZipCode               varchar(10)  NOT NULL,
    StatusCd              tinyint      NOT NULL
        CONSTRAINT AccountDiffHistory_StatusCd_DF
        DEFAULT 1,

    CONSTRAINT AccountDiffHistory_PK PRIMARY KEY CLUSTERED
        (AccountDiffHistoryID),
    CONSTRAINT AccountDiffHistory_AccountDiffID_FK FOREIGN KEY
        (AccountDiffID) REFERENCES dbo.AccountDiff (AccountDiffID)
);

Added Columns

This type (and all history table examples here) includes an UpdateUserID column. This allows the application to store the ID of the user making the INSERT, UPDATE, or DELETE. This user may be an end user, an administrator, a customer service representative (CSR), or some other type of user that has appropriate permissions.

All these examples also include HistoryDate. This column replaces the DateCreated and DateLastUpdated columns in the base table. I named it HistoryDate to make it clear that this is the creation date of the history record and not the base table. In a single-table SELECT, this doesn’t matter much. In a JOIN between AccountDiff and AccountDiffHistory, it helps to clarify the role of the column.

Note that both of these columns are NOT NULL, and HistoryDate has a default to simplify INSERTs.

Merits and Issues

This is a very simple type of history record. This type makes it easy to browse (ad-hoc or programmatically) the values of all columns at any given point in history.

In the attached AccountDiff.sql script, an update scenario is played out. These are the steps:

  • The end user (ID 1000) creates the Account record through the web application.
  • A CSR (ID 99) locks the account using the back-office application.
  • A CSR (ID 99) re-enables the account using the back-office application.
  • A different user on the account (ID 1001) updates the address information.
  • The first user (ID 1000) makes the other user (ID 1001) the PrimaryCardholder for the account.
  • A CSR (ID 99) disables the account.
  • An administrative program (ID 1) marks the account deleted.

All of this information is captured in the history record. We know exactly who made what changes when.

The main drawback comes from the fact that it stores every column on every update. First, this requires more space. Second, it makes it harder to search the history records for changes.

The other side of that second “issue” is that, for any point in the record’s history, it is very simple to see the values of all columns. Depending on your application’s requirements, this may be more or less valuable.

Here is a limited set of columns from the history records for the scenario above, ordered by HistoryID:

	UpdateUserID PrimaryCardholderID StatusCd
	------------ ------------------- --------
	1000         1000                1
	99           1000                3
	99           1000                1
	1001         1000                1
	1000         1001                1
	99           1001                2
	1            1001                4

Compare this to the results from AccountMin to see how they differ.

When I have created history records in databases I designed, this is the scheme I have used.

“Min” History Records

This type of record is different from the “Diff” records in that the updateable columns are all nullable. Only the columns that are updated get written to the history record.

Schema

	CREATE TABLE dbo.AccountMinHistory (
	    AccountMinHistoryID  int          NOT NULL   IDENTITY,
	    AccountMinID         int          NOT NULL,
	    UpdateUserID         int          NOT NULL,
	    HistoryDate          datetime     NOT NULL
	        CONSTRAINT AccountMinHistory_HistoryDate_DF
	        DEFAULT GETUTCDATE(),
	    PrimaryCardHolderID  int          NULL,
	    Address1             varchar(40)  NULL,
	    Address2             varchar(40)  NULL,
	    City                 varchar(40)  NULL,
	    [State]              char(2)      NULL,
	    ZipCode              varchar(10)  NULL,
	    StatusCd             tinyint      NULL
	        CONSTRAINT AccountMinHistory_StatusCd_DF
	        DEFAULT 1,
	
	    CONSTRAINT AccountMinHistory_PK PRIMARY KEY CLUSTERED
	        (AccountMinHistoryID),
	    CONSTRAINT AccountMinHistory_AccountMinID_FK FOREIGN KEY
	        (AccountMinID) REFERENCES dbo.AccountMin (AccountMinID)
	);

Merits and Issues

This is also a very simple type of history record. This type makes it easy to see the changes and ignore the columns that have not changed.

The attached AccountMin.sql script runs the same scenario as above. Even though this scheme stores much less data, all of the changes are still captured. Again, we know exactly who made what changes when.

The benefit here, over the Diff scheme, is that it requires less space. The drawback here is basically the opposite of the first scheme. It is easy to see what changes happened during a particular update, but it is more difficult to recreate a complete record corresponding to each change. As before, depending on your application’s requirements, this may be more or less valuable.

Here is a limited set of columns from the history records for the scenario above, ordered by HistoryID:

	UpdateUserID PrimaryCardholderID StatusCd
	------------ ------------------- --------
	1000         1000                1
	99           NULL                3
	99           NULL                1
	1001         NULL                NULL
	1000         1001                NULL
	99           NULL                2
	1            NULL                4

The NULLs make it easy to see that, for instance, the second record has an update to StatusCd but not PrimaryCardholderID.

“Link” History Records

This type of record is one that I have seen multiple times and, frankly, I think it’s pretty awful.

The basic idea here is that you store a set of updateable columns, not in the base table at all, but in the history table only. This makes it more “pure” from a data redundancy standpoint, but the coding is much more complex. I have included my attempt at making it work in the attached AccountLink.sql script. If you are a big fan of this history scheme, please use the forum to explain the beauty that I’m not seeing.

Schema

I have to include the AccountLink schema here because it is so different from the others.

	CREATE TABLE dbo.AccountLink (
	    AccountLinkID     int  NOT NULL  IDENTITY,
	    CurrentHistoryID  int  NOT NULL,
	
	    CONSTRAINT AccountLink_PK PRIMARY KEY CLUSTERED
	        (AccountLinkID)
	);
	
	CREATE TABLE dbo.AccountLinkHistory (
	    AccountLinkHistoryID  int          NOT NULL    IDENTITY,
	    AccountLinkID         int          NOT NULL,
	    UpdateUserID          int          NOT NULL,
	    HistoryDate           datetime     NOT NULL
	        CONSTRAINT AccountLinkHistory_HistoryDate_DF
	        DEFAULT GETUTCDATE(),
	    PrimaryCardHolderID   int          NOT NULL,
	    Address1              varchar(40)  NOT NULL,
	    Address2              varchar(40)  NOT NULL,
	    City                  varchar(40)  NOT NULL,
	    [State]               char(2)      NOT NULL,
	    ZipCode               varchar(10)  NOT NULL,
	    StatusCd              tinyint      NOT NULL
	        CONSTRAINT AccountLinkHistory_StatusCd_DF
	        DEFAULT 1,
	
	    CONSTRAINT AccountLinkHistory_PK PRIMARY KEY CLUSTERED
	        (AccountLinkHistoryID)
	);

Merits and Issues

The main benefit here is that there is no duplicated data. Well, sort of. Since you don’t store any data in the base table, every history record must store every value. You might be able to implement a combination of “Link” and “Min” (where you store only the changed values), but the coding would be nasty, and the performance would be abysmal.

The issues with Link come when you start coding the schema and the procedures. Creating the double links (foreign keys) requires a lot of special coding. Any time you select an account and need to return anything besides the ID (which you are probably using to select it), you have to join to the history record. With either of the other two schemes, most selects can be accomplished with just the base table. With the Link scheme, every select requires a join.

Recommendations

If you’ve gotten this far, you know that I don’t like the Link scheme. Between the other two, I think the choice comes down to application requirements. If you are tracking these changes solely for auditing purposes or ad-hoc research, use the Min scheme. If your application will actually need to show the full record as of a certain point in time or a certain change, then use the Diff scheme.

What do you do? Please provide feedback through the forum. Do you use a different scheme than one of these? If so, please explain how it works and what business problems it addresses.

Regardless of which scheme you use, be sure it can answer the three critical questions:

  • Who made the change?
  • When did they make the change?
  • And what are the old and new values?

Download files for part 11 of this series.

» See All Articles by Columnist Rob Garrison

SqlCredit - Developing a Complete SQL Server OLTP Database Project








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers