So, now our data can go in table format, but there are still
some problems with it. We store the information that code 1023
refers to the Madagascar travel site 3 times! Besides the waste
of space, there is another serious problem. Look carefully at
the data below.
employee_project table
| Project number |
Project name |
Employee number |
Employee name |
Rate category |
Hourly rate |
| 1023 |
Madagascar travel site |
11 |
Vincent Radebe |
A |
$60 |
| 1023 |
Madagascar travel site |
12 |
Pauline James |
B |
$50 |
| 1023 |
Madagascat travel site |
16 |
Charles Ramoraz |
C |
$40 |
| 1056 |
Online estate agency |
11 |
Vincent Radebe |
A |
$60 |
| 1056 |
Online estate agency |
17 |
Monique Williams |
B |
$50 |
Did you notice anything strange in the data above? Congratulations
if you did! Madagascar is misspelt in the 3rd record. Now imagine
trying to spot this error in a table with thousands of records!
By using the structure above, the chances of the data being
corrupted increases drastically.
The solution is simply to take out the duplication. What we are
doing formally is looking for partial dependencies, ie fields that
are dependent on a part of a key, and not the entire key. Since
both project number and employee number make up the key, we look
for fields that are dependent only on project number, or on
employee number.
We identify two fields. Project name is dependent on project
number only (employee_number is irrelevant in determining project
name), and the same applies to employee name, hourly rate and
rate category, which are dependent on employee number. So, we
take out these fields, as follows:
employee_project table
| Project number |
Employee number |
| 1023 |
11 |
| 1023 |
12 |
| 1023 |
16 |
| 1056 |
11 |
| 1056 |
17 |
Clearly we can't simply take out the data and leave it out of
our database. We take it out, and put it into a new table,
consisting of the field that has the partial dependency, and
the field it is dependent on. So, we identified employee name,
hourly rate and rate category as being dependent on employee
number. The new table will consist of employee number as a key,
and employee name, rate category and hourly rate, as follows:
Employee table
| Employee number |
Employee name |
Rate category |
Hourly rate |
| 11 |
Vincent Radebe |
A |
$60 |
| 12 |
Pauline James |
B |
$50 |
| 16 |
Charles Ramoraz |
C |
$40 |
| 17 |
Monique Williams |
B |
$50 |
And the same for the project data.
Project table
| Project number |
Project name |
| 1023 |
Madagascar travel site |
| 1056 |
Online estate agency |
Note the reduction of duplication. The text "Madagascar travel
site" is stored once only, not for each occurrence of an employee
working on that project. The link is made through the key, the
project number. Obviously there is no way to remove the
duplication of this number without losing the relation altogether,
but it is far more efficient storing a short number repeatedly,
than a large piece of text.
Database Normalization: Part 3
»
See All Articles by Columnist Ian Gilfillan