This is part 2 of a series. If you have not read part 1, you can find it here.
Our product owner has chosen to implement the accounts and users end-to-end for sprint 1. The following database-related tasks have been identified and estimated (and I have volunteered to take them):
Create Account table and CRUD procedures
Build automated unit tests for Account procedures
Create Cardholder table and CRUD procedures
Build automated unit tests for Cardholder procedures
In the CREATE DATABASE script, we create four filegroups plus the primary filegroup. Each filegroup, including primary and the log, is split physically between two files. The database is built this way from the very beginning rather than being restructured later.
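As a sketch, such a CREATE DATABASE statement might look like the following. The database name, filegroup names, file paths, and the omission of explicit sizes are all illustrative assumptions, not the article's actual script:

```sql
-- Illustrative sketch only: names and paths are assumptions.
-- Four user filegroups plus PRIMARY; each filegroup (and the log) is split across two files.
CREATE DATABASE CreditCards
ON PRIMARY
    (NAME = CC_Primary1, FILENAME = 'C:\Data\CC_Primary1.mdf'),
    (NAME = CC_Primary2, FILENAME = 'C:\Data\CC_Primary2.ndf'),
FILEGROUP FG1
    (NAME = CC_FG1a, FILENAME = 'C:\Data\CC_FG1a.ndf'),
    (NAME = CC_FG1b, FILENAME = 'C:\Data\CC_FG1b.ndf'),
FILEGROUP FG2
    (NAME = CC_FG2a, FILENAME = 'C:\Data\CC_FG2a.ndf'),
    (NAME = CC_FG2b, FILENAME = 'C:\Data\CC_FG2b.ndf'),
FILEGROUP FG3
    (NAME = CC_FG3a, FILENAME = 'C:\Data\CC_FG3a.ndf'),
    (NAME = CC_FG3b, FILENAME = 'C:\Data\CC_FG3b.ndf'),
FILEGROUP FG4
    (NAME = CC_FG4a, FILENAME = 'C:\Data\CC_FG4a.ndf'),
    (NAME = CC_FG4b, FILENAME = 'C:\Data\CC_FG4b.ndf')
LOG ON
    (NAME = CC_Log1, FILENAME = 'C:\Log\CC_Log1.ldf'),
    (NAME = CC_Log2, FILENAME = 'C:\Log\CC_Log2.ldf')
```

In production, a database administrator could move any of these files onto separate physical drives simply by changing the FILENAME paths.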
Spreading the tables and indexes among multiple filegroups and physical files is a well-known performance enhancement. If this database were deployed to a real production environment, the database administrator would have the ability to place the filegroups and files on separate physical drives or separate LUNs.
Choosing the right number of filegroups and files is a balancing act. Choosing too many filegroups or too many files per filegroup makes administration and installation needlessly complex. Choosing too few will not give the database administrator options if they find performance issues in production.
In this case, I have chosen four filegroups outside of primary. No tables or indexes will be created on the primary filegroup.
Some people create specific filegroups for data versus indexes. I like to mix data and indexes on all [non-primary] filegroups. This is much more an art than a science, so it is unlikely you will find hard-and-fast rules.
For this sprint, there are only two tables, so these will be on two different filegroups, and the indexes will be spread across the remaining two filegroups.
Initially, there are indexes on specific columns that we know will be queried. More indexes may be added later.
Each table has a clustered primary key. Both of these tables have surrogate keys, and both use integer identity columns. When you use identity columns, consider carefully what size the datatype should be. I have used everything from tinyint to bigint. In certain applications, int is just not big enough. You do not want your application to fail years later because the identity has hit its limit. Moreover, the development, testing, and migration required to change the datatype of a primary key is far too great to risk.
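To make this concrete, here is one possible shape for the Account table. The column list, filegroup names, and choice of indexed column are illustrative assumptions; only the int identity clustered primary key and the data/index filegroup split reflect the design described above:

```sql
-- Column list and filegroup names are assumptions for illustration.
CREATE TABLE dbo.Account
(
    AccountID     int IDENTITY(1, 1) NOT NULL,  -- int chosen deliberately; resizing a PK datatype later is costly
    AccountNumber varchar(20)        NOT NULL,
    Status        tinyint            NOT NULL,
    CreatedDate   datetime           NOT NULL,
    CONSTRAINT PK_Account PRIMARY KEY CLUSTERED (AccountID)
) ON FG1  -- table data (and therefore the clustered index) live on one filegroup

-- A nonclustered index on a column we know will be queried, placed on a different filegroup
CREATE NONCLUSTERED INDEX IX_Account_AccountNumber
    ON dbo.Account (AccountNumber) ON FG3
```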
There are a number of database options that are OFF by default but that the SQL Server 2000 Best Practices Analyzer recommends setting to ON. The CreateDatabase script sets all of these options to ON.
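The exact option list is not reproduced here, but a typical set matching the Best Practices Analyzer's recommendations looks something like this. Treat both the database name and the specific list as assumptions and verify against the analyzer's output for your own build:

```sql
-- Assumed option list; verify against the Best Practices Analyzer recommendations.
ALTER DATABASE CreditCards SET ANSI_NULLS ON
ALTER DATABASE CreditCards SET ANSI_PADDING ON
ALTER DATABASE CreditCards SET ANSI_WARNINGS ON
ALTER DATABASE CreditCards SET ARITHABORT ON
ALTER DATABASE CreditCards SET CONCAT_NULL_YIELDS_NULL ON
ALTER DATABASE CreditCards SET QUOTED_IDENTIFIER ON
```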
I am very interested to learn what people are using in production environments (either 2000 or 2005). Please provide feedback through the forum (or write me directly): Are you setting these options to ON? If not, why not? I have created a forum thread to discuss database options, so watch for updates there. What other options do you change, and why?
The Account and Cardholder tables both include a Status column. The simplest solution here would be to store a string (e.g. ‘Active’, ‘Inactive’, ‘Deleted’). To normalize these values, you might create an AccountStatus table and store a small value in Account that identifies the row in the AccountStatus table (e.g. 1, 2, or ‘A’, ‘I’, ‘D’).
However, for a column like Status where the set of values is well-defined, consider using an enumeration. This requires defining status strings and associated status numbers before the application is deployed. The application code will have to include an enumeration that matches that in the database. To make this Status column human-readable without requiring people to memorize the mapping, use a view that translates the status identifiers into their associated strings.
Application programmers who work all day in C#, C++, or Java are very comfortable with enumerations. The one requirement is to keep the application code enumeration exactly matching the database enumeration. If the set of possible values is not stable, then don’t use this method.
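As a sketch of the view-based translation (the status values 1 = Active, 2 = Inactive, 3 = Deleted and the column names here are assumptions), the mapping can live in a CASE expression:

```sql
-- Assumed enumeration: 1 = Active, 2 = Inactive, 3 = Deleted.
-- The application-side enumeration must match these values exactly.
CREATE VIEW dbo.AccountList
AS
SELECT  AccountID,
        AccountNumber,
        CASE Status
            WHEN 1 THEN 'Active'
            WHEN 2 THEN 'Inactive'
            WHEN 3 THEN 'Deleted'
            ELSE 'Unknown'
        END AS StatusName
FROM    dbo.Account
```

Queries against dbo.AccountList stay human-readable, while the Account table itself stores only the small numeric code.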
DateTime Columns and UTC
Another design piece that I would like feedback on is datetime columns. Are you storing datetime in local time or UTC? I have created a forum thread to discuss local versus UTC. What do you see as benefits and drawbacks? If you use UTC, do you “tag” the columns somehow so that it is obvious that the time is UTC and not local?
At least initially, I will be using UTC dates. I may reconsider this choice based on the feedback in the forum.
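If you do store UTC, one simple approach that works on both SQL Server 2000 and 2005 is to default the column with GETUTCDATE() and tag the column name so the convention is visible. The "UTC" suffix shown here is just one assumed tagging convention, not an established standard:

```sql
-- The 'UTC' suffix in the column name is one possible tagging convention (an assumption).
ALTER TABLE dbo.Account
    ADD CreatedDateUTC datetime NOT NULL
        CONSTRAINT DF_Account_CreatedDateUTC DEFAULT (GETUTCDATE())
```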
An Open Question: Storing the Primary Cardholder
One requirement of the system is the ability to identify the primary cardholder. This is the person whose name, address, and phone number will be associated with the account. There may be multiple Cardholder records associated with a single account, and we need to be able to choose one record as the primary. The question is: how should this be modeled?
Option 1: Add a PrimaryCardholder bit column to Cardholder.
Nothing in the schema prevents more than one Cardholder record from being marked as primary for the same account.
Updating the primary cardholder for an account requires transactionally updating two Cardholder records (turning one off and one on).
Option 2: Add a PrimaryCardholderID column to Account.
Requires transactionally adding an Account record, adding a Cardholder record, and updating the Account record.
Option 3: Add an associative table (call it PrimaryCardholder) that stores AccountID and CardholderID.
A unique index on AccountID would guarantee that only one record exists per account.
How can you guarantee that a PrimaryCardholder record does exist for every Account record?
Slightly more complex schema.
Requires transactionally adding an Account record, a Cardholder record, and a PrimaryCardholder record.
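Option 3 might be sketched as follows (table, constraint, and column names are assumptions). The unique constraint on AccountID is what limits each account to at most one primary cardholder; note that nothing here guarantees at least one, which is the open question above:

```sql
-- Sketch of Option 3; names and datatypes are assumptions.
CREATE TABLE dbo.PrimaryCardholder
(
    AccountID    int NOT NULL
        CONSTRAINT FK_PrimaryCardholder_Account
        FOREIGN KEY REFERENCES dbo.Account (AccountID),
    CardholderID int NOT NULL
        CONSTRAINT FK_PrimaryCardholder_Cardholder
        FOREIGN KEY REFERENCES dbo.Cardholder (CardholderID),
    -- At most one primary cardholder per account:
    CONSTRAINT UQ_PrimaryCardholder_AccountID UNIQUE (AccountID)
)
```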
Please provide your feedback and recommendations through the forum. I will leave this requirement out of the code for now.
I have only included the “create” procedures for now. These procedures are pretty straightforward. Because we are supporting both SQL Server 2000 and 2005, there are no TRY/CATCH blocks. Later in the column, we will switch to 2005-only and will tackle the issue of TRY/CATCH then.
Hopefully, the @TestMode parameter will seem obvious. If not, we will go into details regarding automated unit testing next month.
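For readers who want a picture before next month, here is a sketch of the create-procedure pattern. The parameter list, names, and the exact @TestMode behavior are my assumptions about a common pattern (roll back in test mode so unit tests leave no data behind), and @@ERROR checks stand in for the TRY/CATCH that SQL Server 2000 lacks:

```sql
-- Sketch only: names, parameters, and @TestMode semantics are assumptions.
CREATE PROCEDURE dbo.AccountCreate
    @AccountNumber varchar(20),
    @Status        tinyint,
    @AccountID     int OUTPUT,
    @TestMode      bit = 0   -- assumed behavior: when 1, roll back so tests leave no rows behind
AS
SET NOCOUNT ON

BEGIN TRANSACTION

INSERT dbo.Account (AccountNumber, Status, CreatedDate)
VALUES (@AccountNumber, @Status, GETUTCDATE())

IF @@ERROR <> 0  -- no TRY/CATCH on SQL Server 2000; check @@ERROR after each statement
BEGIN
    ROLLBACK TRANSACTION
    RETURN 1
END

SET @AccountID = SCOPE_IDENTITY()

IF @TestMode = 1
    ROLLBACK TRANSACTION
ELSE
    COMMIT TRANSACTION

RETURN 0
```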
What do you like or dislike about the stored procedures? I would appreciate hearing your feedback either through the forum or directly. I have written literally thousands of stored procedures but have never had the privilege of such a widespread code review.
For Next Time
Next month, we will polish what is here and add the 'R', 'U', and 'D' procedures, along with automated stored procedure unit testing. As of this writing, I have not written the automated unit tests for these procedures, so it's quite possible there are defects in them. The code is available for download, but it's not ready to ship until the unit tests are written, passing, and integrated into the automation scripts. That's part of what we will cover next month.
Another critical part that is not built into these scripts is security. That will wait for later, but we can’t put it off for long. Leaving security until you have time for it is a recipe for disaster.
Download the .sql files for this article.