Conforming to Table Design Rules
Designing tables for relational databases follows a formalized procedure called normalization. Dr. Codd described the complete normalization process in his 1972 paper "Further Normalization of the Data Base Relational Model." This paper isn't an easy read; it's steeped in the language of set theory and relational algebra. The sections that follow explain in common English the application of the normalization process to Access's Northwind database.
You normalize tables in a series of steps called normal forms. Applying the normalization process is necessary to move spreadsheet-style data to relational tables. You also employ the normalization rules when designing a new database or analyzing existing databases. In specific cases, however, you might need to depart from strict adherence to normalization rules to retain a history of data values that change over time or to improve performance of a large database.
First Normal Form
First normal form requires tables to be flat and have no repeating or potentially repeating fields or groups of fields. A flat table is one in which every record has the same number of fields. In addition, a single field cannot contain multiple data values. Repeating fields must be moved to a related table. The first normal form is the most important of the normalization steps. If all your tables don't meet the rules of first normal form, you are in big trouble.
Northwind's Customers and Suppliers tables violate the no repeating fields rule. If a customer or supplier has more than one person involved in the ordering process, which is likely, the table would need repeating pairs of fields with different names, such as ContactName2 and ContactTitle2 or the like. To conform the Customers and Suppliers tables to first normal form, you must create two new tablesCustPers(sonel) and SuppPers(sonel), for exampleto hold contact records. Including contact names in the Customers and Suppliers tables also violates third normal form, which is the subject of the later "Third
Normal Form" section.
The ContactName field also violates the rule against multiple data values in a single field by combining given and family names. This isn't a serious violation of first normal form, but it's a good database design practice always to identify persons by given and family names in separate fields. When you create the new CustPers and SuppPers tables, separate the ContactName field into two fields, such as LastName and GivenName, which can include initials. You can then use a code similar to that for CustomerID for the ContactID field. For this example, the ContactID code is the first character of GivenName and the first four characters of LastName. Alternatively, you could assign an AutoNumber value to ContactID.
Figure 4.5 shows the first 19 of the 91 records of the CustPers table generated from the Customers table. The CustomerID field is required for a many-to-one relationship with the Customers table. Additional fields, such as Suffix, TitleOfCourtesy, Email(Address), Phone, and Fax, make the individual contact records more useful for creating mailing lists and integration with other applications, such as Microsoft Outlook.
You extract data for records of the CustPers table from the ContactName and ContactTitle fields of the Customers table. Separating given and last names simplifies generating a ContactID code to identify each record.
You don't need to retype the data to populate the CustPers and SuppPers tables. You can use Access to import the data from an Excel worksheet or text file, or use Access action queries (append and update) to handle this chore.
→ For more information on importing from Excel, see "Importing and Linking Spreadsheet Files."
→ To learn how to use Access action queries, see "Creating Action Queries to Append Records to a Table."
Figure 4.6 shows the Relationships window with the CustPers and SuppPers tables added to the Northwind database and their many-to-one relationships with the Customers and Suppliers tables, respectively.
The Relationships window displays the many-to-one relationships between the Customers and CustPers tables and the Suppliers and SuppPers tables.
Second Normal Form
Second normal form requires that data in all nonkey fields be fully dependent on the value of a primary key. The objective of second normal form is to avoid data redundancy in your tables.
Only Northwind's Order Details linking table (see Figure
4.7) has a composite primary key (OrderID + ProductID). The UnitPrice field appears to violate the second normal form, because UnitPrice is a field of the Products table. UnitPrice values added to the Order Details table are dependent on the ProductID component of the composite primary key and not the OrderID component, so UnitPrice data is not fully dependent on the primary key. On first glance, the UnitPrice field appears to be redundant data. If you change the unit price of a product, it would appear that you would need to alter the UnitPrice value in every Order Details record for the product.
The Order Details linking table has a composite primary key consisting of the OrderID and ProductID fields.
The Order Details table is an example of a situation in which you must retain what appears to be redundant information to maintain the integrity of historical data. Prices of products vary over time, so the price of a particular product is likely to change for orders placed on different dates. If the price of a product changes between the order and shipping (invoice) dates, the invoice reflects a different amount than the order. Despite the "Prices are subject to change without notice" boilerplate, customers become incensed if the invoice price is greater than the order price.
Eliminating the UnitPrice field from the Order Details table and looking up its value from the current price in the Products table also can cause accounting errors and distortion of historical reports based on bookings and sales data. Removing the UnitPrice data also violates the rules for the fifth normal form, explained later in this chapter.
Third Normal Form
Third normal form requires that data in all nonkey fields of the table be fully dependent on the value of the primary key and describe only the object that the table represents. In other words, make sure that the table doesn't include nonkey fields that relate to some other object or process and includes nonkey fields for descriptive data that isn't contained in another related table.
As mentioned in the "First Normal Form" section, including contact information in the Customers and Products table violates third normal form rules. Contacts are persons, not customer or supplier organizations, and deserve their own related table that has attributes related to individuals.
Other examples of a common third normal form violation are the UnitsInStock and UnitsOnOrder fields of the Products table (see Figure
4.8). These fields aren't fully dependent on the primary key value, nor do they describe the object; they describe how many of the product you have now and how many you might have if the supplier decides to ship your latest order. In a production order entry database, these values vary over time and must be updated for each sale of the product, each purchase order issued to the product's supplier, and each receipt of the product. Purchases, receipts, and invoices tables are the most common source of the data on which the calculations are based.
The Products table's UnitsInStock and UnitsOnOrder values must be calculated from data in tables that record purchases, receipts, and shipments of products.
Including UnitsInStock and UnitsOnOrder fields isn't a serious violation of the normalization rules, and it's not uncommon for product-based tables of order entry databases to include calculated values. The problem with calculated inventory values is the need to process a potentially large number of records in other tables to obtain an accurate current value.
Tip - If you're designing an order entry database, make sure to take into account committed inventory. Committed inventory consists of products in stock or en route from suppliers for which you have unfulfilled orders. If you decide to include inventory information in a products table, add a UnitsCommitted field.
Fourth Normal Form
Fourth normal form requires that tables not contain fields for two or more independent, multivalued facts. Loosely translated, this rule requires splitting tables that consist of lists of independent attributes. The Northwind and Oakmont databases don't have an example of a fourth normal form violation, so the following is a fabricated example.
One of the objectives of Human Resources departments is to match employee job skills with job openings. A multinational organization is likely to require a combination of specific job skills and language fluency for a particular assignment. A table of job skill types and levels exists with entries such as JP3 for Java ProgrammerIntermediate, as well as language/ fluency with entries such as TE5 for TeluguVery Fluent. Therefore, the HR department constructs an EmplSkillLang linking table with the following foreign key fields: EmployeeID, SkillID, and LanguageID.
The problem with the linking table is that job skills and language fluency are independent facts about an employee. The ability to speak French has nothing to do with an employee's ability to write Java code. Therefore, the HR department must split (decompose) the three-field table into two two-field linking tables: EmplSkills and EmplLangs.
Fifth Normal Form
Fifth normal form involves further reducing redundancy by creating multiple two-field tables from tables that have more than two foreign keys. The classic example is identifying independent sales agents who sell multiple products or categories of products for different companies. In this case, you have a table with AgentID, CompanyID, and ProductID or CategoryID. You can reduce redundancyat the risk of making the database design overly complexby creating three two-field tables: AgentCompany, CompanyProduct (or CompanyCategory), and AgentProduct (or AgentCategory). Database developers seldom attempt to normalize designs to fifth normal form because doing so requires adding many additional small tables to the database.