SQL Server 2000 Administration in 15 Minutes a Week: Table Basics (Part 1) - Page 3
August 22, 2002
There are a few things I feel are important to point out about the data types in the preceding table. First, back in "Database Creation Basics" I said the maximum size a row could be was 8060 bytes. In order to calculate how big each row of your table will be and see if it is under 8060 bytes, typically you can just add up the size of each column to get the answer. Although it is true that a row can't span more than one page, there is a way to get around the 8060 byte limitation - the text, ntext, and image data types (aka "BLOB" or Binary Large OBject ). By default, the text, ntext, and image data types don't store their data in the row like the other fields of that row do. Instead, they store a 16 byte pointer that redirects SQL Server to another 8 K page(s) where the data can be found. By having only a pointer to the data stored in the row, you can get around the 8060 byte limitation. But be warned, the text, ntext, and image data types have some limitations that we will see over the next few articles. One downside is that since SQL Server must retrieve two or more pages (verses only one page for rows without any text/image data) using the text, ntext, and image data types slows SQL Server down. One good practice is only to use/retrieve from the text, ntext, and image data types when it is absolutely necessary.
Speaking of variables that have a non 'n' version and a 'n' version, let's talk about non-Unicode strings vs. Unicode strings. If you remember back to "Installation Basics" one of the options when installing SQL Server was to select a collation that included the selection of an ASCII code page. The non-Unicode data types -- char, varchar, and text -- store the ASCII values from the code page for each character. Because there are only 256 ASCII values per code page, the space required to store an ASCII value is only 1 byte (2^8). But what if you need to have multiple language support or there is not an ASCII code page that has all the characters you need? Well, by using the Unicode data types nchar, nvarchar, and ntext, you can store multiple languages and have access to over 256 characters. In order to achieve this, unlike the non-Unicode data types, the Unicode data types are made up of a 2 byte number for each character that gives a possible 65,536 characters (2^16) -- more than enough to include all the characters you would need for multilingual support. (Check out http://www.unicode.org/ for more info on Unicode)
The next point I would like to make about data types is fixed-length and the variable-length data types. If you use the char or nchar data types, no matter how many characters you enter, the field will always use n number (or 2*n for nchar) of bytes. For example, if I set the length (i.e. n) of a column to 10 and had three rows: 'Dan', 'Mike', and 'Jason' -- the space needed to store those three rows would be 30 bytes even though there is only 9 characters worth of data. On the other hand, if I used the variable-length version of char or varchar, the space needed to store the three example rows is only 9 bytes -- a 21 byte savings. Now that may not seem like much, but multiply that by several columns of data and 100,000 rows and it adds up quick. So it may seem like varchar and nvarchar are the way to go all the time, but they also have a downside - the time needed to deal with the variable length nature of varchar and nvarchar slows SQL Server down. As a rule of thumb, it is best to use char or nchar whenever the strings will be about the same size from row to row. Whereas if the length of the strings varies greatly between rows, it is typically best to use varchar or nvarchar. Now for the three row example I gave it may be best just to leave it as char -- looking at a larger sample from that table may show more names are greater than 5 characters or that the speed increase by using a fixed-length data type outweighs the extra disk space that gets used. Like I said earlier...when working with SQL Server most things are a balancing act; you just have to see what gives you the best performance for your situation.
Finally, there are some more generalized points about data types that are important. When you choose a data type the goal should be to choose one that is as small in size as possible, but one that will still accept all the values that may be possible over the lifetime of the database. A good example of this is selecting the size of integer to use. If you know you are only going to store a yes/no answer, bit is the perfect size and anything larger, such as smallint or int, will only waste storage space. Another example of wasting space would be using datetime when you only need the range and accuracy of smalldatetime. By selecting the smallest data type possible, you not only save disk space but you also reduce the number of pages that makeup the table -- which in turn lowers the time needed to run a query (i.e. SQL Server has less pages to retrieve from the hard disk and scan).
You should also try and let your data types do as much work for you as possible when dealing with data integrity. A good example of this is to use datetime or smalldatetime to store date information - those data types may take up more space and use more processing power then making your own date from a 6 character string, but in the long run using the proper data type for the job will save you time.