Database Normalization


By now some of you are familiar with the basics of using
databases in your cgi scripts. Many of your
databases will be small, with one or two tables. But
as you become braver, tackling bigger projects, you may start
finding that the design of your tables is proving problematic.
The SQL you write starts to become unwieldy, and data anomalies
start to creep in. It is time to learn about database
normalization, or the optimization of tables.


Let’s begin by creating a sample set of data. Imagine we are
working on a system to keep track of employees working on certain
projects.













































Project number Project name Employee number Employee name Rate category Hourly rate
1023 Madagascar travel site 11 Vincent Radebe A $60
    12 Pauline James B $50
    16 Charles Ramoraz C $40
1056 Online estate agency 11 Vincent Radebe A $60
    17 Monique Williams B $50


A problem with the above data should immediately be obvious.
Tables in relational databases, which would include most databases
you’ll work with, are in a simple grid, or table format. Here,
each project has a set of employees. So we couldn’t even enter
the data into this kind of table. And if we tried to use null
fields to cater for the fields that have no value, then we cannot
use the project number, or any other field, as a primary key (a
primary key is a field, or list of fields, that uniquely identify
one record). There is not much use in having a table if we can’t
uniquely identify each record in it.



So, our solution is to make sure that each field has no sets, or
repeating groups.
Now we can place the data in a table.

employee_project table













































Project number Project name Employee number Employee name Rate category Hourly rate
1023 Madagascar travel site 11 Vincent Radebe A $60
1023 Madagascar travel site 12 Pauline James B $50
1023 Madagascar travel site 16 Charles Ramoraz C $40
1056 Online estate agency 11 Vincent Radebe A $60
1056 Online estate agency 17 Monique Williams B $50


Notice that the project number cannot be a primary key on it’s own.
It does not uniquely identify a row of data. So, our primary key
must be a combination of project number and employee number.
Together these two fields uniquely identify one row of data.
(Think about it. You would never add the same employee more than
once to a project. If for some reason this could occur, you’d
need to add something else to the key to make it unique).


Database Normalization: Part 2


 »


See All Articles by Columnist
Ian Gilfillan

Ian Gilfillan
Ian Gilfillan
Ian Gilfillan lives in Cape Town, South Africa. He is the author of the book 'Mastering MySQL 4', published by Sybex, and has been working with MySQL since 1997. These days he develops mainly in PHP and MySQL, although confesses to starting out with BASIC and COBOL, way back when, and still has a soft spot for Perl. He developed South Africa's first online grocery store, and has developed and taught internet development and other technical courses for various institutions. He has majors in Programming and Information Systems, as well as English and Philosophy. For 5 years he was Lead Developer and IT Manager for Independent Online, South Africa's premier news portal. However, he has now 'retired' from fulltime work, and is hoping that his next book will be more in the style of William Blake and Allen Ginsberg.

Latest Articles