Data Modeling, Breaking & Fixing First Normal Form (1NF)
April 9, 2004
It has been my observation that a lot of us lack the basic skills required to model the simplest of database objects. This article will explore some of the common mistakes and fixes to the breaking of 1NF.
First normal form has two very distinct rules:
In my years of data design there is one over-glaring mistake that a lot of "newbie" data modelers and database administrators make that cause enormous amounts of problems for themselves and others in the development chain. It is their inability to conform to the very basics of data modeling principles. Of those principles, adhering to 1NF is probably the most broken rule that I have seen. When the model does not follow 1NF, queries are hard to develop and producing usable data from the model is even harder. Typically, the developer will have to put some unusual logic, functions or application code around or within queries just to make the data manageable.
This article will walk you through these common mistakes and how you might begin to start working with those mistakes, if you find yourself in this unfortunate predicament.
Mistake 1, creating another column for a relationship
The first form of breaking 1NF is the creation of another column in a table, which really just duplicates a relationship to a primary key. For this example, let's assume that a distributor wants to keep a relationship for all of the cities in each of the states that they have a presence. Our junior modeler quickly creates a CITY table with two columns, one to hold the state value and the other to hold the city.
This table is modeled quite well but because of the modelers' inexperience and because he has just learned that there are four other cities in Colorado that are relevant, he quickly adds four more columns to his CITY table. The table now looks like the following:
Now you may say to yourself that this isn't so bad because we will only ever have 5 cities in the state of Colorado and therefore putting all of these for quick access in one table is quite efficient. While you are correct to some extent the real problem comes into play when you wish to extract the information.
One of the first questions you will get is to list all the cities in each of the states where the company has a distributor. You could easily issue the following SQL in Listing 1. This works just fine but when you try and put the result set into an application or a developer requests that the information be put into a single column output so they do not have to traverse the list of cities, the real problems begin. This request equates to producing output where one city is represented on one line with the state to which it belongs. To do this there, the SQL gets a bit interesting and quickly cumbersome to work with. Listing 2 gives just such an example where we must union together a select statement for each iteration of the cities contained in the row. The reason for the NOT NULL is because we do not want to return an empty row where there is not a city for the individual iteration. In addition, for small tables this is not quite a big deal, but this query will produce multiple table scans and if there are more than a few rows, you will have a performance problem.
Just as you start to pat yourself on the back, the next question you will be asked is if you can provide a query to determine if the company has a distributor in a particular city. Listing 3 gives you a solution to this question. Remember we must build on the prior example since typically, this will be coming from a developer and they would like the output in a single column output. This query can quickly be tailored to answer the question of in which cities in a particular state do we have distributors, or how many cities in a particular state do we have a distributor. This last question is answered in Listing 4.