When designing a database to support applications you need to consider how you are going to handle primary keys. This article explores natural and surrogate keys, and discusses the pros and cons of each, allowing you to determine what makes the best sense in your environment when you are designing your databases.
When designing a database to support applications you need
to consider how you are going to handle primary keys. There are two schools
of thought, or maybe three. There are those that say primary keys should
always be a made up key, or what is commonly called a surrogate key. Others
say there are good reasons to use real data as a key value; this type of key is
known as natural key. The third group is those that design their databases so
their primary keys are a combination of natural and surrogate keys. In this
article, I’m going explore natural and surrogate key, and discuss the pros and
cons of each. This will allow you to determine what makes best sense in your
environment when you are designing your databases.
Natural Key verses Surrogate Key
When you design tables with SQL Server, a table typically has
a column or a number of columns that are known as the primary key. The primary
key is a unique value that identifies each record. Sometimes the primary key is
made up of real data and these are normally referred to as natural keys, while
other times the key is generated when a new record is inserted into a table.
When a primary key is generated at runtime, it is called a surrogate key. A surrogate
key is typically a numeric value. Within SQL Server, Microsoft allows you to
define a column with an identity property to help generate surrogate key values.
Before I talk about the pros and cons of natural and
surrogate keys, let me first expand a little more on each type of key. By doing
this you will have a better understanding of each of these two types of keys,
and will have a more solid foundation to determine which type of key you should
use in your database design.
A natural key is a single column or set of columns that
uniquely identifies a single record in a table, where the key columns are made
up of real data. When I say “real data” I mean data that has meaning and
occurs naturally in the world of data. A natural key is a column value that
has a relationship with the rest of the column values in a given data record.
Here are some examples of natural keys values: Social Security Number, ISBN, and
A surrogate key like a natural key is a column that uniquely
identifies a single record in a table. But this is where the similarity
stops. Surrogate keys are similar to surrogate mothers. They are keys that
don’t have a natural relationship with the rest of the columns in a table. The
surrogate key is just a value that is generated and then stored with the rest
of the columns in a record. The key value is typically generated at run time
right before the record is inserted into a table. It is sometimes also
referred to as a dumb key, because there is no meaning associated with the
value. Surrogate keys are commonly a numeric number.
Now that you have an understanding of the difference between
these two types of keys I will explore why you might use one key over the
other. In the world of data architects, there is much debate over when it is
appropriate to use a natural key and when a better solution would be to use a
surrogate key. As already stated there are mainly just two
different camps. Some say you should always use a natural key and the others
say a surrogate key is best. I suppose there is also a third camp that uses a
combination of both natural keys and surrogate keys in their database design.
Rather than state my opinion on which is best I’ll give you the pros and cons
of uses each and then you can decide with is best for your design.
Surrogate Key Pros and Cons
A definite design and programming aspect of working with databases is built on the concept that all keys will be supported by the use surrogate keys. To understand these programming aspects better, review these pros and cons of using surrogate keys.
- The primary key has no business intelligence built into it.
Meaning you cannot derive any meaning, or relationship between the surrogate
key and the rest of the data columns in a row.
- If your business rules change, which would require you to update
your natural key this can be done easily without causing a cascade effect
across all foreign key relationships. By using a surrogate key instead of a
natural key the surrogate key is used in all foreign key relationships. Surrogate
keys will not be updated over time.
- Surrogate keys are typically integers, which only require 4 bytes
to store, so the primary key index structure will be smaller in size than their
natural key counter parts. Having a small index structure means better
performance for JOIN operations.
- If foreign key tables use surrogate keys then you will be
required to have a join to retrieve the real foreign key value. Whereas if the
foreign key table used a natural key then the natural key would be already be
included in your table and no join would be required. Of course this I only
true if you only needed the natural key column returned in your query
- Surrogate keys are typically not useful when searching for data
since they have no meaning.
Natural Key Pros and Cons
Having natural keys as indexes on your tables mean you will
have different programming considerations when building your applications. You
will find that pros and cons for natural keys to be just the opposite as the
pros and cons for surrogate keys.
- Will require less joins when you only need to return the key
value of a foreign key table. This is because the natural key will already be
imbedded in your table.
- Easier to search because natural keys have meaning and will be
stored in your table. Without the natural key in your table, a search for records
based on a natural key would require a join to the foreign key table to get the
- Requires much more work to change a natural key, especially when
foreign relationship have been built off the natural key.
- Your primary key index will be larger because natural keys are
typically larger in size then surrogate keys.
- Since natural keys are typically larger in size then surrogate
keys and are strings instead of integers joins between two tables on a natural
key will take more time.
What Kind of Database Designer Are You?
There is much debate in the world of data modeling over what
kind of data should be used to support primary keys. There are some purist
that say all primary key should be surrogate keys, no matter how small the
natural key, or the fact that the natural key will never be updated. Other say
you need to use natural keys because they make coding your application just so
much easier. When you design your databases, you need to decide what works best
in your environment. What kind of database designer are you and into which design camp do you fall?