Let's run again through the example we've just done, this time
without the data tables to guide us. After all, when you're
designing a system, you usually won't have test data available at
this stage. The tables were there to show you the consequences of
storing data in unnormalized tables, but without them we can
focus on dependency issues, which is the key to database
normalization.
In the beginning, the data structure we had was as follows:
Project number
Project name
1-n Employee numbers (1-n indicates that there are many occurrences
of this field - it is a repeating group)
1-n Employee names
1-n Rate categories
1-n Hourly rates
So, to begin the normalization process, we start by moving from
zero normal form to 1st normal form.
| The definition of 1st normal form |
| there are no repeating groups |
| all the key attributes are defined |
| all attributes are dependent on the primary key |
So far, we have no keys, and there are repeating groups. So we
remove the repeating groups, and define the primary key, and are
left with the following:
Employee project table
Project number - primary key
Project name
Employee number - primary key
Employee name
Rate category
Hourly rate
This table is in 1st normal form.
| A table is in 2nd normal form if |
| it's in 1st normal form |
| it includes no partial dependencies (where an attribute is
dependent on only a part of a primary key). |
So, we go through all the fields. Project name is only dependent
on Project number. Employee name, Rate category and Hourly rate
are dependent only on Employee number. So we remove them, and
place these fields in a separate table, with the key being that
part of the original key they are dependent on. So, we are left
with the following 3 tables:
Employee project table
Project number - primary key
Employee number - primary key
Employee table
Employee number - primary key
Employee name
Rate category
Hourly rate
Project table
Project number - primary key
Project name
The table is now in 2nd normal form. Is it in 3rd normal form?
| The definition of 3rd normal form |
| It's in 2nd normal form |
| It contains no transitive dependencies (where a non-key attribute
is dependent on another non-key attribute). |
We can narrow our search down to the Employee table, which is the
only one with more than one non-key attribute. Employee name is
not dependent on either Rate category or Hourly rate, the same
applies to Rate category, but Hourly rate is dependent on Rate
category. So, as before, we remove it, placing it in it's
own table, with the attribute it was dependent on as key,
as follows:
Employee project table
Project number - primary key
Employee number - primary key
Employee table
Employee number - primary key
Employee name
Rate Category
Rate table
Rate category - primary key
Hourly rate
Project table
Project number - primary key
Project name
These tables are all now in 3rd normal form, and ready to be
implemented. There are other normal forms - Boyce-Codd normal form,
and 4th normal form, but these are very rarely used for business
applications. In most cases, tables in 3rd normal form are already
in these normal forms anyway.