Conforming to Table Design Rules
Designing tables for relational
databases follows a formalized procedure called normalization. Dr. Codd described the complete normalization process
in his 1972 paper Further Normalization of the Data Base Relational Model.
This paper isnt an easy read; its steeped in the language of set theory and
relational algebra. The sections that follow explain in common English the
application of the normalization process to Accesss Northwind database.
You normalize tables in a series
of steps called normal forms.
Applying the normalization process is necessary to move spreadsheet-style data
to relational tables. You also employ the normalization rules when designing a
new database or analyzing existing databases. In specific cases, however, you
might need to depart from strict adherence to normalization rules to retain a
history of data values that change over time or to improve performance of a
First Normal Form
First normal form
requires tables to be flat and have no repeating or potentially repeating
fields or groups of fields. A flat table is one in which every record has the same number of
fields. In addition, a single field cannot contain multiple data values.
Repeating fields must be moved to a related table. The first normal form is the
most important of the normalization steps. If all your tables dont meet the
rules of first normal form, you are in big trouble.
Northwinds Customers and
Suppliers tables violate the no repeating fields rule. If a customer or supplier
has more than one person involved in the ordering process, which is likely, the
table would need repeating pairs of fields with different names, such as
ContactName2 and ContactTitle2 or the like. To conform the Customers and
Suppliers tables to first normal form, you must create two new
tablesCustPers(sonel) and SuppPers(sonel), for exampleto hold contact
records. Including contact names in the Customers and Suppliers tables also
violates third normal form, which is the subject of the later Third Normal
The ContactName field also
violates the rule against multiple data values in a single field by combining
given and family names. This isnt a serious violation of first normal form,
but its a good database design practice always to identify persons by given
and family names in separate fields. When you create the new CustPers and
SuppPers tables, separate the ContactName field into two fields, such as LastName
and GivenName, which can include initials. You can then use a code similar to
that for CustomerID for the ContactID field. For this example, the ContactID
code is the first character of GivenName and the first four characters of
LastName. Alternatively, you could assign an AutoNumber value to ContactID.
Figure 4.5 shows the first 19 of
the 91 records of the CustPers table generated from the Customers table. The
CustomerID field is required for a many-to-one relationship with the Customers
table. Additional fields, such as Suffix, TitleOfCourtesy, Email(Address),
Phone, and Fax, make the individual contact records more useful for creating
mailing lists and integration with other applications, such as Microsoft
You extract data for records of
the CustPers table from the ContactName and ContactTitle fields of the
Customers table. Separating given and last names simplifies generating a
ContactID code to identify each record.
- For more
information on importing from Excel, see Importing and Linking Spreadsheet
Files, p. XXX (Chapter 8).
- To learn
how to use Access action queries, see Creating Action Queries to Append
Records to a Table, p. XXX (Chapter 13).
Figure 4.6 shows the
Relationships window with the CustPers and SuppPers tables added to the
Northwind database and their many-to-one relationships with the Customers and
Suppliers tables, respectively.
dont need to retype the data to populate the CustPers and SuppPers tables. You
can use Access to import the data from an Excel worksheet or text file, or use
Access action queries (append and update) to handle this chore.
The Relationships window
displays the many-to-one relationships between the Customers and CustPers
tables and the Suppliers and SuppPers tables.
Second Normal Form
Second normal form
requires that data in all non-key fields be fully dependent on the value of a
primary key. The objective of second normal form is to avoid data redundancy in
Only Northwinds Order Details
linking table (see Figure 4.7) has a composite primary key (OrderID +
ProductID). The UnitPrice field appears to violate the second normal form,
because UnitPrice is a field of the Products table. UnitPrice values added to
the Order Details table are dependent on the ProductID component of the
composite primary key and not the OrderID component, so UnitPrice data is not fully dependent on the primary key. On first glance, the UnitPrice
field appears to be redundant data. If you change the unit price of a product,
it would appear that you would need to alter the UnitPrice value in every Order
Details record for the product.
The Order Details linking table has a composite primary key
consisting of the OrderID and ProductID fields.The
Order Details table is an example of a situation in which you must retain what appears to be redundant information to
maintain the integrity of historical data. Prices of products vary over time,
so the price of a particular product is likely to change for orders placed on
different dates. If the price of a product changes between the order and
shipping (invoice) dates, the invoice reflects a different amount than the
order. Despite the Prices are subject to change without notice boilerplate,
customers become incensed if the invoice price is greater than the order price.
Eliminating the UnitPrice field
from the Order Details table and looking up its value from the current price in
the Products table also can cause accounting errors and distortion of
historical reports based on bookings and sales data. Removing the UnitPrice
data also violates the rules for the fifth normal form, explained later in this
Third Normal Form
Third normal form
requires that data in all non-key fields of the table be fully dependent on the
value of the primary key and describe only the object that the table
represents. In other words, make sure that the table doesnt include non-key
fields that relate to some other object or process and includes non-key fields
for descriptive data that isnt contained in another related table.
As mentioned in the First
Normal Form section, including contact information in the Customers and
Products table violates third normal form rules. Contacts are persons, not
customer or supplier organizations, and deserve their own related table that
has attributes related to individuals.
Other examples of a common third
normal form violation are the UnitsInStock and UnitsOnOrder fields of the
Products table (see Figure 4.8). These fields arent fully dependent on the
primary key value, nor do they describe the object; they describe how many of
the product you have now and how many you might have if the supplier decides to
ship your latest order. In a production order entry database, these values vary
over time and must be updated for each sale of the product, each purchase order
issued to the products supplier, and each receipt of the product. Purchases,
receipts, and invoices tables are the most common source of the data on which
the calculations are based.
The Products tables
UnitsInStock and UnitsOnOrder values must be calculated from data in tables
that record purchases, receipts, and shipments of products.
Including UnitsInStock and
UnitsOnOrder fields isnt a serious violation of the normalization rules, and its
not uncommon for product-based tables of order entry databases to include
calculated values. The problem with calculated inventory values is the need to
process a potentially large number of records in other tables to obtain an
accurate current value.
Fourth Normal Form
Fourth normal form
requires that tables not contain fields for two or more independent,
multivalued facts. Loosely translated, this rule requires splitting tables that
consist of lists of independent attributes. The Northwind and Oakmont databases
dont have an example of a fourth normal form violation, so the following is a
If youre designing an order entry
database, make sure to take into account committed inventory. Committed inventory
consists of products in stock or en route from suppliers for which you have
unfulfilled orders. If you decide to include inventory information in a
products table, add a UnitsCommitted field.
One of the objectives of Human
Resources departments is to match employee job skills with job openings. A
multinational organization is likely to require a combination of specific job
skills and language fluency for a particular assignment. A table of job skill
types and levels exists with entries such as JP3 for Java
ProgrammerIntermediate, as well as language/fluency with entries such as TE5
for TeluguVery Fluent. Therefore, the HR department constructs an EmplSkillLang
linking table with the following foreign key fields: EmployeeID, SkillID, and
The problem with the linking table
is that job skills and language fluency are independent facts about an
employee. The ability to speak French has nothing to do with an employees
ability to write Java code. Therefore, the HR department must split (decompose)
the three-field table into two two-field linking tables: EmplSkills and
Fifth Normal Form
Fifth normal form
involves further reducing redundancy by creating multiple two-field tables from
tables that have more than two foreign keys. The classic example is identifying
independent sales agents who sell multiple products or categories of products
for different companies. In this case, you have a table with AgentID,
CompanyID, and ProductID or CategoryID. You can reduce redundancyat the risk
of making the database design overly complexby creating three two-field
tables: AgentCompany, CompanyProduct (or CompanyCategory), and AgentProduct (or
AgentCategory). Database developers seldom attempt to normalize designs to
fifth normal form because doing so requires adding many additional small tables
to the database.
AutoNumber primary key values work
well for serially numbered documents if you dont allow records to be deleted.
Adding a true-false (Boolean) field named Deleted and setting the value to true
is one approach. This technique complicates queries against the tables, so you
might consider moving deleted records to another table. Doing this lets you
write a query to reconstruct all records for audit purposes.