Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum Rss Feed

» Database Journal Home
» Database Articles
» Database Tutorials
MS Access
SQL Scripts & Samples
» Database Forum
» Slideshows
Free Newsletters:

News Via RSS Feed

Rss Feed

Database Journal |DBA Support |SQLCourse |SQLCourse2

Featured Database Articles

SQL etc

Posted Mar 22, 2002

Database Normalization - Page 2

By Ian Gilfillan

So, now our data can go in table format, but there are still some problems with it. We store the information that code 1023 refers to the Madagascar travel site 3 times! Besides the waste of space, there is another serious problem. Look carefully at the data below.

employee_project table

Project number Project name Employee number Employee name Rate category Hourly rate
1023 Madagascar travel site 11 Vincent Radebe A $60
1023 Madagascar travel site 12 Pauline James B $50
1023 Madagascat travel site 16 Charles Ramoraz C $40
1056 Online estate agency 11 Vincent Radebe A $60
1056 Online estate agency 17 Monique Williams B $50

Did you notice anything strange in the data above? Congratulations if you did! Madagascar is misspelt in the 3rd record. Now imagine trying to spot this error in a table with thousands of records! By using the structure above, the chances of the data being corrupted increases drastically.

The solution is simply to take out the duplication. What we are doing formally is looking for partial dependencies, ie fields that are dependent on a part of a key, and not the entire key. Since both project number and employee number make up the key, we look for fields that are dependent only on project number, or on employee number.

We identify two fields. Project name is dependent on project number only (employee_number is irrelevant in determining project name), and the same applies to employee name, hourly rate and rate category, which are dependent on employee number. So, we take out these fields, as follows:

employee_project table

Project number Employee number
1023 11
1023 12
1023 16
1056 11
1056 17

Clearly we can't simply take out the data and leave it out of our database. We take it out, and put it into a new table, consisting of the field that has the partial dependency, and the field it is dependent on. So, we identified employee name, hourly rate and rate category as being dependent on employee number. The new table will consist of employee number as a key, and employee name, rate category and hourly rate, as follows:

Employee table

Employee number Employee name Rate category Hourly rate
11 Vincent Radebe A $60
12 Pauline James B $50
16 Charles Ramoraz C $40
17 Monique Williams B $50

And the same for the project data.

Project table

Project number Project name
1023 Madagascar travel site
1056 Online estate agency

Note the reduction of duplication. The text "Madagascar travel site" is stored once only, not for each occurrence of an employee working on that project. The link is made through the key, the project number. Obviously there is no way to remove the duplication of this number without losing the relation altogether, but it is far more efficient storing a short number repeatedly, than a large piece of text.

Database Normalization: Part 3

 » See All Articles by Columnist Ian Gilfillan

SQL etc Archives

Latest Forum Threads
SQL etc Forum
Topic By Replies Updated
MySQL rollback UAL225 0 August 21st, 09:56 PM
Complex Search Query Galway 0 May 20th, 10:04 PM
change collation at once supercain 2 May 15th, 06:18 AM
SQL Features, tools and utilities question Neomite 1 April 10th, 09:13 AM