High Order Normalization: By Popular Demand

June 20, 2001

In this article, I'll discuss the 4th, 5th, and Domain Key normal forms. Based on the e-mails I've received, there seems to be some interest in this. Be warned: the following is not for the faint of heart.

4th Normal Form (4NF or BCNF)--Under 4NF, a record should not have two or more independent multi-valued facts. To use an example that strikes close to home, suppose you have multiple DBAs who each know how to use multiple database systems; they also drink certain beverages.

DBA Database Drink
Jeff SQL Server Coffee
Jeff Access Coffee
Jeff Dbase Coffee

The table above shows the problem. "Database" and "Drink" are independent; if Jeff changes his drink (not likely, but we'll consider it), you have to update all the records. Heaven forbid, if he ADDS a drink, you'll have to add 3 new rows. Not good. To satisfy 4th normal form, you would need to have two tables, each storing one of the facts, like so:

DBA Database
Jeff SQL Server
Jeff Access
Jeff Dbase

If the facts are dependant, on the other hand, the first table fits 4NF (for instance, if Jeff changed his drink depending on what system he was using).

5th Normal Form (5NF)--This is a further generalization of multi-valued facts, just in case you're horribly anal about it. We'll take the case of the DBA again... in this example, suppose that each DBA has a general skill set: Normalization, Data Access, or Database Administration. You might be tempted to use the structure above, and in most cases you'd be right. But in this case, you want to hire DBAs who can use that skill set in any system that supports it. To satisfy 5NF, you'd need three tables:

DBA Database
Jeff SQL Server
Jeff Access
Jeff Dbase

DBA Function
Jeff Normalization

Database Function
SQL Server Normalization
Access Normalization

Note that I have not marked Dbase as having the Normalization function (bring on the nasty e-mails) in order to show how this relationship works.

Domain/Key Normal Form (DKNF)--The most elusive of normalization rules. In fact, it isn't even a rule, since there's no known procedure or algorithm to guarantee it. Think of it as a Zen exercise... a key is the unique identifier for a record, and a domain (in this case) is the set of allowable values for an attribute. DKNF tells you that every constraint on the domain is a logical result of the key. Let's take a 'Fermats Theorum' approach here--anyone who wants to send me a way to ensure DKNF will have their answers quickly and insultingly responded to.

Comments and suggestions for future articles are gratefully accepted. I will also post corrections if someone points out a mistake.