Figure 2: Integer Data types size and supported values
If you don’t want to waste disk space, and use additional I/O bandwidth then you should make sure you pick an integer data type that matches the values you want to store. For example, if you are going have a code column, that only has no more than 20 possible code values from 0-19, then you don’t want to define that code column as an integer, when the tinyint data type has the capability to store all 20 different values.
Just like the discussion in the Unicode section you should plan ahead. If think that the number of possible unique integer values will possibly run close to the upper, or lower limit of a particular integer data type then you should probably pick the next larger integer data type. Wasting a small amount of disk space for these types of columns is much more economical in the long run than having to retrofit your code when you do finally hit the hard upper or lower bounds of a particular data type.
Allowing NULLS, or Not
The questions will probably arise when you are designing your database whether or not you should allow columns to contain null values. This is a highly controversial and debated topic. I’m not going to rehash a lot of details here regarding the two different sides of this debate.
SQL Server has a number session and database options that control how NULLS should be handled by SQL Server, like ANSI_NULLS, CONCAT_NULL_YIELDS_NULL, and ANSI_NULL_DEFAULT. These options affect how SQL Server will deal with NULL values and how your code will be interpreted when processing NULLS.
Did you know that NULL comparisons can yield three different outcomes? Yes it can and those values are True, False, or Unknown. Because of all the different nuisances associated with columns that can contain NULL values, it can make your application logic more complicated if you allow NULL values in your database. If you want to simplify your code and eliminate possible errors associated with supporting NULL values in the database then I would suggest you require all columns to be defined as NOT NULL.
If you don’t allow null values in any of your database columns what are you to do when you have business rules that dictate that a column doesn’t have a value? The answer is simple, have a default value for those columns. A default value is a specific value that is used to populate a column when no value is provided when you create a new data row. When you define a default value on a column the default value can be used to identify the column value is null.
What Kind of Date Data Type Should I Use?
With the introduction of SQL Server 2008 there are six different date and time data types, five of which can support holding a date, and five that support holding the time, and four that hold both a date and time value. Just like the different integer data types, each date data type uses different amounts of space. Additionally for date data types that also support storing the time you need to consider how much precision you require for the time portion, as well as what are the specific data ranges do you need to support. All date data types that support stored date and time are not created equal.
Which date data type should you pick? The answer to this question depends on what you plan to store. Are you just going to store a date? Or do you need to also to store the time along with the date? If you are storing the time, how precise does the time need to be? Answers to these questions will help you determine which date/time data type will be best for your date column.
The table in Figure3 shows all the different date and time data types. This table can be used to determine the different date and time ranges you can store for each date and time column.
Figure 3: Date and Time data types
If you only need to store a date that requires the time to be accurate to the nearest minute then it might be most appropriate to use a smalldatetime data type provided it supports a date range that covers the dates you need to store. If you need to make sure your time on a given date has time accurate to within 100 nanoseconds then you would want to use the datetime2 column.
With SQL Server 2008 the new time and datetime2 columns allow you to define different amounts of precision for the time portion can store. The amount of accuracy you pick for the time portion will dictate the amount of storage space required to store a given date time value. You can use these two data types to fine tune the amount of disk space your date/time columns will use.
Data Integrity Rules
When you talk to data owners you will find they have lots of rules about their data. They will say things like:
When you meet with your data owners, you need to record all the different business rules they might have around their data. Some of these rules can be built into your database design and are normally known as data integrity rules. Building these rules into your database design will allow the SQL Server database engine to enforce these data integrity rules.
Constraints, triggers and default values are things you can use to implement business rules right into your database design. Using these objects will help maintain your database data integrity. If you have a column that you know can only contain a certain range of values then make sure you create a check constraint on the column. By doing this SQL Server will only allow you to enter the acceptable values into that column. For column values that must be one of the possible values contained in another table make sure you create a foreign key constraint on this column. When you have a foreign key constraint on a column it keeps you from storing values that are not contained within the other table. When a row is uniquely identified by one or more columns you can build this rule into the database by creating a unique constraint or primary key. Creating a unique constraint will help ensure each row is uniquely identified by the unique key columns. If it is not then the SQL Server will not allow you to INSERT or UPDATE a record that would violate the unique key constraint. Triggers are another option for implementing data integrity rules. With triggers you can provide complicated business logic that can be used to validate your data when an INSERT, UPDATE, or DELETE event occurs. Consider using these different ways to maintain the integrity of your data.
The SQL Server database engine is always looking for short cuts to finding the data it need to solve a query. Short cuts allow SQL Server the find the data it needs quickly. Indexes on database tables allow SQL Server to quickly find the data it needs to quickly resolve the query. Therefore if you want to optimize your queries you need to build indexes to provide short cuts to your data.
Photo Credit: Photo Credit: http://www.flickr.com/photos/biz/2933072825/
Figure 4: Indexes provide short cuts to finding data
The following simple guidelines can be used to help you decide which indexes to build, and the type of columns you should consider as index key columns:
You also need to keep in mind that indexes are updated when UPDATE, INSERT and DELETE statements are issues. Therefore the more indexes you have the longer it will take to perform these data manipulation statements. Although there is no specific recommendation on the number of indexes that makes sense for every situation, typically data warehouse type databases do have more indexes then online transaction processing type databases.
Finding the Balance
When designing your physical database you need to find the balance between conserving disk space and supporting future data requirements. No one wants to make major revisions to a physical database design because someone decided to increase the size of a field, or started storing people’s names with the special accent characters. Take care to make sure you select the best data types, build the right data integrity rules and selecting the appropriate indexes to support your data requirements for both the short and long term.