Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS Access
SQL Scripts & Samples
» Database Forum
» Slideshows
Free Newsletters:

News Via RSS Feed

Database Journal |DBA Support |SQLCourse |SQLCourse2

Featured Database Articles

SQL etc

Posted May 24, 2001

Database Normalization for the Average Jane/Joe

By Jeff Putnam

If you're a DBA who has never managed the data itself, or a new database architect, you'll need to know the ins and outs of normalization. For those who aren't comfortable with the concept, NORMALIZATION is the process of structuring tables for maximum efficiency of joins, while assuring minimum duplication of data and maximum data integrity.

Why Normalize? Yes, building a database that isn't normalized is quicker. If you're the one in charge of maintaining it, though, you'll pay later: explosion in the size of your database, convoluted coding, duplication of data, and even contradictory data isn't uncommon. There are often good reasons NOT to normalize, too: If your database is largely a reporting platform, you may want to persist duplicated data to avoid unnecessary joins. Quite frankly, OLAP (on-line analytical processing) is just a big excuse not to normalize. For the purposes of this article, let's assume you're designing a production OLTP (on-line transactional processing) system, with users inserting, updating, and deleting records willy-nilly.

Okay, you may have heard of the "normal forms". The forms are a set of standards that state how well a database is normalized. They aren't rules, nobody will come to your office and slap you if you violate them. Different developers go to different lengths to implement them--most folks I know don't go above 3rd normal form on a regular basis. Here are the quick and dirty definitions in English.

1st Normal Form (1NF): All 1NF says is that you don't repeat groups of data as separate fields. An example of this is having a customer table with three phone number fields (home, work, and fax). To avoid violating 1NF, you'd need a separate phone numbers table. Once again, sometimes it makes sense to ignore this rule, even in an OLTP scenario.

2nd Normal Form (2NF): Okay, now each column in the table must be dependant on the ENTIRE key. Using the example above, you wouldn't want to keep the customer name in the phone numbers table; since there would be 0-3 records for each customers, you could say that the number itself is dependant on the key (i.e., there is just one record for each key/phone combination) but the customer name is NOT dependant (the key/name combination has the possibility of duplication). Basically, the customer name applies to lots of different kinds of records (invoices, addresses, etc.) and therefore should be stored separately and related to each different record using a key. Now, if you add "Phone Type" to the key, you're not in violation.

3rd Normal Form (3NF): This one can be confusing, but its result makes sense. Each column must depend DIRECTLY on the key. You could think of this as a logical extension of 2NF; while the example above fits 2NF, it doesn't fit 3NF because the key in the phone numbers table doesn't identify the customer in every instance (such as if you were looking for customer orders). To avoid violating 3NF, you'd need to put customer name in the Customers table and then key to the phone numbers table based on CustomerID.

4th and 5th Normal Form: Pretend as if they don't exist. They're really only useful for late night discussions when you've had too many tequila shooters and cheese fries. I have personally had nightmares about them. Repeat this mantra: "High order Normalization is a tool of the Devil".

As a general rule, you should use non-meaningful key values to avoid having to change them when the data in the row changes.

Comments and suggestions for future articles are gratefully accepted. I will also post corrections if someone points out a mistake.

SQL etc Archives

Comment and Contribute


(Maximum characters: 1200). You have characters left.



Latest Forum Threads
SQL etc Forum
Topic By Replies Updated
MySQL rollback UAL225 0 August 21st, 09:56 PM
Complex Search Query Galway 0 May 20th, 10:04 PM
change collation at once supercain 2 May 15th, 06:18 AM
SQL Features, tools and utilities question Neomite 1 April 10th, 09:13 AM