Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Oct 31, 2006

Relational Databases 101 - Page 3

By DatabaseJournal.com Staff

Choosing the Right Data Type

When designing databases in the 1960s and 1970s, I was taught to be especially careful of how much space each data element consumed. Since hard disks were tiny by today's standards (the IBM 360 125 came with 7.25Mb to 100Mb drives),13 I was hard-pressed to minimize the amount of data stored in each "record." I economized by "coding" whenever and wherever I could. For example, a single column (byte) might contain several different types of data, depending on the value to be stored. When SQL Server and other relational databases were introduced, disk space was still expensive, but not nearly as much as in the mainframe days. However, more experienced database architects still choose column widths based on past experience and with the knowledge that more data means poorer performance.


13 See www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3125.html


Unicode vs. ANSI

In situations where you need to store data in an "international" character set, whose characters are not supported by the ANSI set, you'll have to define your columns (and string literals) as Unicode. If you take this option, SQL Server stores 16 bits for each character instead of 8. It means the same four-character entry requires 4 bytes in ANSI and 16 bytes in Unicode columns. Just remember to prefix your string literals with "N", as in N'Fred', when building Unicode expressions—Visual Studio tools and wizards do this for you if they generate the query. There is another aspect to Unicode that might surprise you. When you define a column as nvarchar, you specify a maximum length, as shown in Figure 3.8.

Figure 3.8
Creating a table with a Unicode column.

This DDL code allocates 50 bytes of space in the data row to the Author's name. However, this also means that the name must be no longer than 25 (16-bit) characters.

Char vs. VarChar

I've heard the debates over use of fixed-length datatypes (like char and nchar) over variable-length types (like varchar and nvarchar). In my practices, I rarely use the fixed-width types because they're problematic in a number of respects. These types are fine for columns whose data is always the same number of characters, but if you slip and provide a value that's shorter (or longer) than the defined size, SQL Server either pads the remaining space or truncates the data (often without notice). You'll also find that it's tough to create expressions against fixed-length columns unless you match the length of both operands. For example, if your fixed-length column can contain four characters, you'll have to write an expression that has exactly four characters, or an equality expression will always return False.

IF MyFixedCol = 'Fred'

This returns TRUE if MyFixedCol contains "Fred".

IF MyFixedCol = 'Fred '

But this returns FALSE.

For this reason (and others), I prefer to use variable-length types. They don't consume much extra space (if any) and when the data length varies, this approach can actually save space. When defining variable-length character columns, you specify the maximum amount of space to reserve for the column. This does not preallocate this space—it simply sets an upper limit. With SQL Server 2005, you can now define a varchar(max) or nvarchar(max) column that (like the TEXT datatype) can store up to 231 bytes and Unicode 230 bytes.

Decimal vs. Floating Point

When you record a money value in the database, it's best to understand the nature of the values you intend to store—especially the precision. For those of you that took computer science in school, you know that it's not possible to store some values in binary. For example, [1/3] is stored as .3333 (with a never-ending list of "3"s.) While the value might be close, if you add [1/3] + [1/3] + [1/3], you'll get .9999—you've lost some precision. Sure, with rounding, the result is returned as 1, but in some cases, you aren't permitted to round.


IMHO - In the early days of computing, clever programmers were able to strip off the extra precision (values less than a penny) and salt it away in another account. By the end of the week, they had accumulated a tidy sum—especially when millions of dollars were changing hands.


When you store a money value, be sure everyone knows the currency on which this value is based. This can help you from making a mistake when bidding on a project in the U.K., where the dollar is worth (at today's rate) about £0.529269. You might consider using a CLR-based user-defined type to keep the currency type stored with the value—especially if a single column can hold values from more than one currency.

The decimal datatype is listed (as shown in Table 1.1) under "Exact Numerics". That is, it's designed to hold an exact value. When you declare a decimal or numeric (they are equivalent), you also can declare the precision and scale (it defaults to 18). The precision is the maximum total number of decimal digits that can be stored—including the values on either side of the decimal point. To store a value of 1234.1234, you would need a precision of 8.

The scale indicates the maximum number of decimal digits that can be stored to the right of the decimal point—this must be a value from 0 to the defined precision. The default scale is 0, so unless you define a scale, your value will be stored as a whole number (without a decimal portion). You won't be able to define a scale unless you define a precision as well. For example (as shown in Figure 3.9), to define a column with a precision of 10 and four decimal places, you would code:

Figure 3.9
Declaring a decimal column with specific precision and scale.

Working with Imprecise Numbers

When working with scientific data where you need more precision but not 100% accuracy (which sounds a bit strange), you can choose the approximate number data types. Sure, some numbers can be expressed exactly, but others can't due to binary round-off. In the case of the float datatype, you can define the precision and storage size by providing a value that determines the number of bits used to store the mantissa14 of the floating point number (in scientific notation). If you supply a value between 1 and 24, the float's precision is set to 7, and it takes 4 bytes to store the value. If you provide a value between 25 and 53, the float's precision is set to 15, and it takes 8 bytes to store the value. The default is 53. Note that SQL Server 2005 resets the mantissa setting to either 1 or 53, based on the value you supply.


14 Mantissa: the fractional part of a floating-point number.


Table 3.1 SQL Server Datatypes and Their Precision

 

Datatype

Bytes

 

Exact Numerics

 

 

These values are stored so the value stored is expressed exactly—they are not subject to binary round-off.

Integers

bigint

8

Integer (whole number) data from –2^63 (–9223372036854775808) through 2^63–1 (9223372036854775807).

 

int

4

Integer r(whole number) data from –2^31 (–2,147,483,648) through 2^31 – 1 (2,147,483,647).

 

smallint

2

Integer data from 2^15 (–32,768) through 2^15 –1 (32,767).

 

tinyint

1

Integer data from 0 through 255.

Bit

bit

1

Integer data with either a 1 (True), 0 (False), or NULL value.

Decimal

decimal

5–17

Fixed precision and scale numeric data from –10^38 +1 through 10^38 –1.

 

numeric

 

Functionally equivalent to decimal.

Money

money

4

Monetary data values from –2^63 (–922,337,203,685,477.5808) through 2^63 – 1 (+922,337,203,685,477.5807), with accuracy to a ten-thousandth of a monetary unit.

 

smallmoney

8

Monetary data values from –214,748.3648 through +214,748.3647, with accuracy to a ten-thousandth of a monetary unit.

Approximate Numerics

 

 

These values are stored in binary and are used when a precise but not 100% accurate value must be stored.

 

float

4–8

Floating precision number data from –1.79E + 308 through 1.79E + 308.

 

doubleprecision

8

Equivalent to float(53) (8 bytes).

 

real

4

Floating precision number data from –3.40E + 38 through 3.40E + 38.

Dates

datetime

8

Date and time data from January 1, 1753, through December 31, 9999, with an accuracy of three-hundredths of a second, or 3.33 milliseconds.

 

smalldatetime

4

Date and time data from January 1, 1900, through June 6, 2079, with an accuracy of 1 minute.

ANSI Character Strings

 

 

These values are stored as strings of characters in non-Unicode (ANSI) encoding (8-bits/character).

 

char

N

Fixed-length non-Unicode character data with a maximum length of 8,000 characters.

 

varchar

N

Variable-length non-Unicode data with a maximum of 8,000 characters.

 

varchar(max)

N

Variable-length non-Unicode data with a maximum length of 2^31 – 1 (2,147,483,647) characters.

 

text

N

Variable-length non-Unicode data with a maximum length of 2^31 – 1 (2,147,483,647) characters.

Unicode Character Strings

 

 

These values are stored in Unicode (16-bits/character).

 

nchar

N

Fixed-length Unicode data with a maximum length of 4,000 characters; 16 bits stored for each character.

 

nvarchar

N

Variable-length Unicode data with a maximum length of 4,000 characters.

 

sysname

128

System-supplied user-defined data type that is functionally equivalent to nvarchar (128) and is used to reference database object names.

 

nvarchar(max)

 

Variable-length Unicode data with a maximum length of 2^30 – 1 (1,073,741,823) characters.

 

ntext

N

Variable-length Unicode data with a maximum length of 2^30 – 1 (1,073,741,823) characters.

Binary Strings

 

 

These values are stored in binary with any attempt to encode them.

 

binary

N

Fixed-length binary data with a maximum length of 8,000 bytes.

 

varbinary

N

Variable-length binary data with a maximum length of 8,000 bytes.

 

varbinary(max)

 

Variable-length binary data with a maximum length of 2^31 – 1 (2,147,483,647) bytes.

 

image

N

Variable-length binary data with a maximum length of 2^31 – 1 (2,147,483,647) bytes.

Other Types

 

 

 

 

cursor

A reference to a server-side CURSOR.

 

sql_variant

N

A data type that stores values of various SQL Server-supported data types, except text, ntext, timestamp, and sql_variant.

 

table

A special data type used to store a rowset for later processing.

 

timestamp

8

A database-wide unique number that gets updated every time a row gets updated.

 

uniqueidentifier

16

A globally unique identifier (GUID).

 

xml

N

Names an XML schema collection. Can store up to 2GB of data.

Using the xml Datatype

For the first time, SQL Server 2005 introduces the new xml datatype. This means you're going to be able to store XML data in your table's column(s). Because xml is a "real" built-in type, you'll be able to use it when creating a table as a variable type, a parameter type, or a function return type. You'll also be able to use it in CAST or CONVERT. That said, I need to discuss where it makes sense to use xml typed data columns or xml typed arguments. One interesting use would permit you to pass lists of values to be used in an IN expression. Yes, you would need to write a function to convert this to a table-type variable.

Using the sql_variant Datatype

One of SQL Server 2000's innovations was "lifted" from Visual Basic—the "variant." The sql_variant datatype is unusual, in that it's designed to "morph" itself to most (non-BLOB) types. This means when you define a column as sql_variant, it can contain an integer (of any size), a string, a float, money, or even an xml structure. The sql_variant column value does not take on a type until you assign a value to it. I suggest you check out BOL for the rules and regulations involving this unique type.

Summary

This chapter gives you a kick-start on designing relational databases that can perform better, be easier to maintain, and be more successful. That's usually an unspoken goal of any database application project. Unless it's successful, you'll either be back working on it when you should be home relaxing or be out looking for another job—without a good recommendation from your last employer. I talked about both formal rules and informal suggestions to normalize your database. Understand that few of these rules are cast in stone, but until you fully understand them, the databases you (and your team) design and implement, populate and test, and polish and deploy won't keep bread on the table.


© Copyright Pearson Education. All rights reserved.

Buy this book
Hitchhiker's Guide to Visual Studio and SQL Server : Best Practice Architectures and Examples
By William R. Vaughn with Peter Blackburn
Published by Addison-Wesley Professional
Series: Microsoft Windows Server System Series
ISBN: 0321243625; Published: 11/8/2006; Copyright 2007; Pages: T; Edition: 7
www.awprofessional.com/title/0321243625



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM