That about covers it for the various
data types. There are two data types I intentionally left out (cursor and
table) as they are really not applicable here because you
can't have a column that is made up of those data types.
There are a few
things I feel are important to point out about the data
types in the preceding table. First, back in "Database Creation
Basics" I said the maximum size a row could be was
8060
bytes. In order to calculate how big each row of your table
will be and see if it is under 8060 bytes, typically you can just add up the size of each column to get the answer.
Although it is true that a row can't span more than one
page, there is a way to get around the 8060 byte
limitation - the text,
ntext, and image data types (aka "BLOB" or Binary Large
OBject ). By default,
the
text, ntext, and image
data types don't store their data in the row like the other fields of that row
do. Instead, they store a 16 byte
pointer that redirects SQL Server to another 8 K page(s)
where the data can be found. By having only a pointer to
the data stored in the row, you can get around the 8060 byte limitation.
But be warned, the text, ntext, and image data types have
some limitations that we will see over the next few
articles. One downside is that since SQL Server must retrieve two
or more pages (verses only one page for rows without any
text/image data) using the text, ntext, and image data
types slows SQL Server down. One good practice is only to
use/retrieve from the text, ntext, and image data types
when it is absolutely necessary.
Speaking of variables that have
a non 'n' version and a 'n' version, let's talk about
non-Unicode strings vs. Unicode strings. If you remember
back to "Installation Basics" one of the options when
installing SQL Server was to select a collation that
included the selection of an ASCII code page. The
non-Unicode data types -- char, varchar, and text -- store the
ASCII values from the code page for each character. Because
there are only 256 ASCII values per code page, the space
required to store an ASCII value is only 1 byte (2^8).
But what if you need to have multiple language support or
there is not an ASCII code page that has all the characters
you need? Well, by using the Unicode data types nchar,
nvarchar, and ntext, you can store multiple languages and
have access to over 256 characters. In order to achieve
this, unlike the non-Unicode data types, the Unicode data
types are made up of a 2 byte number for each character
that gives a possible 65,536 characters (2^16) -- more than
enough to include all the characters you would need for
multilingual support. (Check out
http://www.unicode.org/
for more info on Unicode)
The next point I would like to
make about data types is fixed-length and the
variable-length data types. If you use the char or nchar
data types, no matter how many characters you enter, the
field will always use n number (or 2*n for nchar) of
bytes. For example, if I set the length (i.e. n) of a
column to 10 and had three rows: 'Dan', 'Mike', and 'Jason'
-- the space needed to store those three rows would be 30
bytes even though there is only 9 characters worth of data.
On the other hand, if I used the variable-length version of
char or varchar, the space needed to store the three
example rows is only 9 bytes -- a 21 byte savings. Now that
may not seem like much, but multiply that by several
columns of data and 100,000 rows and it adds up quick. So it
may seem like varchar and nvarchar are the way to go all
the time, but they also have a downside - the time needed
to deal with the variable length nature of varchar and
nvarchar slows SQL Server down. As a rule of thumb, it is
best to use char or nchar whenever the strings will be
about the same size from row to row. Whereas if the length of the strings varies
greatly between rows, it is typically best to use varchar
or nvarchar. Now for the three row example I gave
it may be best just to leave it as char -- looking at a
larger sample from that table may show more names are
greater than 5 characters or that the speed increase by
using a fixed-length data type outweighs the extra disk
space that gets used. Like I said earlier...when working with SQL
Server most things are a balancing act; you just have to
see what gives you the best performance for your situation.
Finally, there are some more
generalized points about data types that are important.
When you choose a data type the goal should be to choose
one that is as small in size as possible, but one that will
still accept all the values that may be possible over the
lifetime of the database. A good example of this is
selecting the size of integer to use. If you know you are
only going to store a yes/no answer, bit is the perfect
size and anything larger, such as smallint or int, will
only waste storage space. Another example of wasting space
would be using datetime when you only need the range and
accuracy of smalldatetime. By selecting the smallest data
type possible, you not only save disk space but you also
reduce the number of pages that makeup the table -- which in
turn lowers the time needed to run a query (i.e. SQL Server
has less pages to retrieve from the hard disk and scan).
You should also try and let your
data types do as much work for you as possible when
dealing with data integrity. A good example of this is to
use datetime or smalldatetime to store date information -
those data types may take up more space and use more
processing power then making your own date from a 6
character string, but in the long run using the proper data
type for the job will save you time.
Page 4: Creating Tables
»
See All Articles by Columnist Michael Aubert