Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted December 21, 2011

Physical Database Design Consideration

By Greg Larsen

There are lots of things to think about when you design a physical database.   What data types should I use?  What column is appropriate for the primary key? Are there particular indexes that I should use to improve performance? How should I implement data integrity rules?  This list goes on and on.  In this article I will be exploring different physical database design elements.  These design elements should help provide you some guidance to ensure you make the right design choice for your database design.   

Picking the Right Data Type

When it comes to deciding which data type you should select for a column in your database it's easy right?  Well this is not always the case.  Sometimes you have to decide whether or not to support fix length or a variable length column value.  Other times you might need an integer column value, but what is an appropriate size for that integer column.  You may also find that you need to support a different character set because you want to support those special accents characters above some letters, like the “e” in the name José, or a more exotic character set that support symbols like Chinese.  There are a number of design decisions you need to make when it comes to selecting the right data type for a database column.   In the next few sections I will be reviewing different data types and giving you some ideas to help you make good physical database design choices when it comes to picking the right data type for a particular column.

Variable Length Verses Fix Length Character Data

SQL Server supports both fixed length character data as well as variable length character data types.  Which one is right for your database column?  Well like most technical discussions “it depends”.

CHAR and NCHAR data types are fixed length data type.  They are ideal for supporting columns that store data where the column values in each row have the same length.  Most codes that you store might fall into this category.  Because CHAR and NCHAR columns are fixed length columns they will take up space regardless of whether or not the column has a value.  Because of this, it is better if you use the CHAR and NCHAR data types for columns that have very few NULLS.  If you expect your column to have a lot of NULLS then you should consider using a VARCHAR or NVARCHAR data type to save on disk space.   If the data you plan to store in a column is not exactly the same length but relatively close in length then a CHAR or NCHAR would be a good data type to consider.

A VARCHAR and NVARCHAR data types are variable length data type.  The physical storage of a VARCHAR and NVARCHAR data type stores both the value of the column and the length of the data in the column.   This allows SQL Server to use a minimum amount of disk space to support storing the column value.  The amount of space used is the size of the value being stored plus 2-bytes for the length.  Because of this VARCHAR and NVARCHAR columns are great for support columns that have a large variance in size of the column value.  Additionally VARCHAR and NVARCHAR columns are great for storing columns that have a high degree of NULL values, because no space is taken to store a NULL value.

Unicode Verses Non-Unicode

Since the world consists of many different languages and character sets we need a way to handle storing all those different characters in SQL Server.   SQL Server supports storing all of these different international character sets by supporting Unicode character types: NVARCHAR, NCHAR, and NTEXT (note NTEXT is deprecated and should probably not be used for new development). 

In order to be able to store all the different possible Unicode characters, a Unicode data type takes up twice the disk space of most non-Unicode data types.   Most non-Unicode data is stored using just a single byte.  Therefore storing your data using a Unicode data type can take up to twice the space.  Keep in mind there are some non-Unicode languages that do take 2 bytes of storage, in this case there is no storage difference in the size of storing non-Unicode verses Unicode data.  Refer to books online to identify the few non-Unicode data types that take two bytes to store a single character.

In today’s global economy you need to build applications that are robust regardless of where you live in the world and what language you speak.   Computers are getting faster and disk space is getting cheaper.  Therefore the thought of storing Unicode data verses non-Unicode data is becoming more blurred.  Kind of reminds me of the Y2K issues regarding saving some space by not storing the year.   We all remember what happened when we had to start storing those first two characters of the four-digit year.

The cost of changing application code to support Unicode down the road should you need it might be very expensive, so you want to make the right decision when picking a data type while you are designing your database.  If you think there is the slightest possibility you will need to store Unicode data in a table column, than make it use a Unicode data type.  For columns that you are 100% sure that you will never need to store Unicode data then making them use a non-Unicode data type would be ok.  If you are just not sure, then it would probably be best to use a Unicode data type to minimize the probability of conversion issues down the road.

Different Integer Data Types

Who said size doesn’t matter? It does when storing integer data.  Integers come in all sizes, just like the different sizes of the bottles shown in figure 1.

Size does matter
Photo Credit: http://www.flickr.com/photos/pianom4n/3887012103/

Figure 1: Size does matter

 

The number of different unique integer values you expect to store in a column should drive your decision making process when selecting an integer data type for that column.  The table in Figure 2 describes the size and value ranges supported by each of the different integer data types.

 

Data Type

Values Supported

Storage Size

Tinyint

Value between 0 – 255

 

1 byte

Smallint

Value between -32,768 through 32,767

 

2 bytes

integer

Value between -2,147,483,648 through 2,147,483,647

 

4 bytes

Bigint

Value between -9,223,372,036,854,775,808 through 9,223,372,036,854,775,807

 

8 bytes

Figure 2: Integer Data types size and supported values

 

If you don’t want to waste disk space, and use additional I/O bandwidth then you should make sure you pick an integer data type that matches the values you want to store.  For example, if you are going have a code column, that only has no more than 20 possible code values from 0-19, then you don’t want to define that code column as an integer, when the tinyint data type has the capability to store all 20 different values.

Just like the discussion in the Unicode section you should plan ahead.  If think that the number of possible unique integer values will possibly run close to the upper, or lower limit of a particular integer data type then you should probably pick the next larger integer data type.  Wasting a small amount of disk space for these types of columns is much more economical in the long run than having to retrofit your code when you do finally hit the hard upper or lower bounds of a particular data type.

Allowing NULLS, or Not

The questions will probably arise when you are designing your database whether or not you should allow columns to contain null values.   This is a highly controversial and debated topic.  I’m not going to rehash a lot of details here regarding the two different sides of this debate.

SQL Server has a number session and database options that control how NULLS should be handled by SQL Server, like ANSI_NULLS, CONCAT_NULL_YIELDS_NULL,   and ANSI_NULL_DEFAULT.   These options affect how SQL Server will deal with NULL values and how your code will be interpreted when processing NULLS.

Did you know that NULL comparisons can yield three different outcomes?  Yes it can and those values are True, False, or Unknown.   Because of all the different nuisances associated with columns that can contain NULL values, it can make your application logic more complicated if you allow NULL values in your database.   If you want to simplify your code and eliminate possible errors associated with supporting NULL values in the database then I would suggest you require all columns to be defined as NOT NULL. 

If you don’t allow null values in any of your database columns what are you to do when you have business rules that dictate that a column doesn’t have a value?  The answer is simple, have a default value for those columns.  A default value is a specific value that is used to populate a column when no value is provided when you create a new data row.  When you define a default value on a column the default value can be used to identify the column value is null.  

What Kind of Date Data Type Should I Use?

With the introduction of SQL Server 2008 there are six different date and time data types, five of which can support holding a date, and five that support holding the time, and four that hold both a date and time value.  Just like the different integer data types, each date data type uses different amounts of space.  Additionally for date data types that also support storing the time you need to consider how much precision you require for the time portion, as well as what are the specific data ranges do you need to support.  All date data types that support stored date and time are not created equal.  

Which date data type should you pick?  The answer to this question depends on what you plan to store. Are you just going to store a date?  Or do you need to also to store the time along with the date?  If you are storing the time, how precise does the time need to be?  Answers to these questions will help you determine which date/time data type will be best for your date column.

The table in Figure3 shows all the different date and time data types.  This table can be used to determine the different date and time ranges you can store for each date and time column.     

Data type

Format

Range

Accuracy

Storage size (bytes)

User-defined fractional second precision

Time zone offset

time

hh:mm:ss[.nnnnnnn]

00:00:00.0000000 through 23:59:59.9999999

100 nanoseconds

3 to 5

Yes

No

date

YYYY-MM-DD

0001-01-01 through 9999-12-31

1 day

3

No

No

smalldatetime

YYYY-MM-DD hh:mm:ss

1900-01-01 through 2079-06-06

1 minute

4

No

No

datetime

YYYY-MM-DD hh:mm:ss[.nnn]

1753-01-01 through 9999-12-31

0.00333 second

8

No

No

datetime2

YYYY-MM-DD hh:mm:ss[.nnnnnnn]

0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999

100 nanoseconds

6 to 8

Yes

No

datetimeoffset

YYYY-MM-DD hh:mm:ss[.nnnnnnn] [+|-]hh:mm

0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999 (in UTC)

       

Figure 3: Date and Time data types

If you only need to store a date that requires the time to be accurate to the nearest minute then it might be most appropriate to use a smalldatetime data type provided it supports a date range that covers the dates you need to store.   If you need to make sure your time on a given date has time accurate to within 100 nanoseconds then you would want to use the datetime2 column. 

With SQL Server 2008 the new time and datetime2 columns allow you to define different amounts of precision for the time portion can store.   The amount of accuracy you pick for the time portion will dictate the amount of storage space required to store a given date time value.   You can use these two data types to fine tune the amount of disk space your date/time columns will use.

Data Integrity Rules

When you talk to data owners you will find they have lots of rules about their data.  They will say things like:

  • This code can only contain the values 1 through 10
  • This value needs to be in one of the approved types that can be found in some table

When you meet with your data owners, you need to record all the different business rules they might have around their data.  Some of these rules can be built into your database design and are normally known as data integrity rules.  Building these rules into your database design will allow the SQL Server database engine to enforce these data integrity rules.  

Constraints, triggers and default values are things you can use to implement business rules right into your database design.  Using these objects will help maintain your database data integrity.  If you have a column that you know can only contain a certain range of values then make sure you create a check constraint on the column.  By doing this SQL Server will only allow you to enter the acceptable values into that column.   For column values that must be one of the possible values contained in another table make sure you create a foreign key constraint on this column.  When you have a foreign key constraint on a column it keeps you from storing values that are not contained within the other table.  When a row is uniquely identified by one or more columns you can build this rule into the database by creating a unique constraint or primary key.  Creating a unique constraint will help ensure each row is uniquely identified by the unique key columns.   If it is not then the SQL Server will not allow you to INSERT or UPDATE a record that would violate the unique key constraint.   Triggers are another option for implementing data integrity rules.  With triggers you can provide complicated business logic that can be used to validate your data when an INSERT, UPDATE, or DELETE event occurs.  Consider using these different ways to maintain the integrity of your data.

Indexing Considerations

The SQL Server database engine is always looking for short cuts to finding the data it need to solve a query.  Short cuts allow SQL Server the find the data it needs quickly.  Indexes on database tables allow SQL Server to quickly find the data it needs to quickly resolve the query.  Therefore if you want to optimize your queries you need to build indexes to provide short cuts to your data.  

Indexes provide short cuts to finding data

Photo Credit: Photo Credit: http://www.flickr.com/photos/biz/2933072825/

Figure 4: Indexes provide short cuts to finding data

The following simple guidelines can be used to help you decide which indexes to build, and the type of columns you should consider as index key columns:

  • Columns that are used on WHERE and JOIN operations are good candidates for index key columns.
  • Keys should be kept narrow to minimize the amount of disk space required to store the index key values.
  • Index key column should be highly unique values (very few rows associated with a given column value)
  • Clustered index keys should be unique; otherwise SQL Server adds additional bytes in the index structure to make them unique.
  • Appropriately set the FILLFACTER based on how you think updating will affect page splits in the index tree.

You also need to keep in mind that indexes are updated when UPDATE, INSERT and DELETE statements are issues.  Therefore the more indexes you have the longer it will take to perform these data manipulation statements.   Although there is no specific recommendation on the number of indexes that makes sense for every situation, typically data warehouse type databases do have more indexes then online transaction processing type databases.

Finding the Balance

When designing your physical database you need to find the balance between conserving disk space and supporting future data requirements.  No one wants to make major revisions to a physical database design because someone decided to increase the size of a field, or started storing people’s names with the special accent characters.  Take care to make sure you select the best data types, build the right data integrity rules and selecting the appropriate indexes to support your data requirements for both the short and long term.  

See all articles by Greg Larsen



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM