Generating Surrogate Keys Using an Identity Column in SQL Server
February 28, 2011
In my last article I talked about the difference between surrogate keys and natural keys. In that article I discussed how surrogate keys are made up keys, meaning they do not appear naturally in the data. In this article I will be showing you how to generate those surrogate keys using an identity column. I will be exploring what is an identity column, how to define an identity column and the different methods of populating an identity column.
What is an identity column?An identity column is a single column in a table that has its identity column property set. A table doesn't need to have an identity column. When a table has an identity column, that column is automatically populated with an integer value every time a new row is added to the table; more on this is a minute. The value of an identify column is based on a seed and increment value that is associated with the identify column; more detail on this further down in this article.
An identity column property can only be set on columns that are declared as a decimal, int, numeric, smallint, bigint, or tinyint. If the identity property is associated with a numeric or decimal, the scale must be set to 0. When you set the identity property, there are two components of that property: seed and increment. Additionally, the column must be defined to not allow NULL values to be inserting into it. You can set up an identity column when you declare a table, or you can set up an identify column on an existing table column by altering the column properties.
Creating a table with an identity columnWhen you create a table you can define the identity column. You can also add an identity column to a pre-existing table; more on that later. To define an identity column when you create a table you just need to set the IDENTITY property on the CREATE TABLE statement. Here is an example:
Above I created a column called "ID" that is my IDENTITY column. Note I specified "IDENTITY(1,1)." The "1,1" notation specifies the "seed" and "increment" value for the identity column. The "seed" value is used to set the value of the ID column for the first row inserted into the table. The "increment" value is used to populate the identity column value for additional rows added to the table, by adding this value to identity column value of the previously inserted row. The "seed" and "increment" values need to be an integer, both positive and negative values are allowed. In my example above I said I wanted my first row inserted to have an identity column value of 1. The second inserted row would have an identity column value of 2, and so on and so forth.
You can also create an identity column when creating a table using a SELECT statement with an INTO table clause. To do this you use the IDENTITY function. The IDENTITY function has the following syntax:
Where data_type is one of the valid identity column data types listed above, seed is the identity column value for the first row added, increment is an integer value that is added to the identity column value of the prior inserted row and column_name is the name of the IDENTITY column to be created.
Here is an example of how to create a new table that has an identity column using a SELECT ... INTO method:
Here I am using the SELECT...INTO syntax to create the table MyTableNew. To define my identity column I used the IDENTITY function to define an integer column where the identity properties have a seed value of 1 and an increment value of 1.
Altering an existing table to have an identity columnOccasionally you might find you need to add the identity property to an existing column in an existing table, or adding a new identity column to an existing table. Let me explore how to do this, and the issues you might run into.
First let's talk about altering a table to add an identity column to an existing table. By adding an identity column, I mean adding a brand new column to a table. To do that you need to alter the table definitions. Let's assume I have the following table definition:
For this example, assume that this table already has 39 different rows in this table, where the County Code contains abbreviation of the County name to uniquely identify each row, the ReferenceID is basically row number that is manually populated, and the CountyName the spelled out the name of the county. Say I decided I wanted to put a surrogate key column on this table that is an INT and populate it using the IDENTITY property. To do that I would just need to run the following ALTER TABLE statement:
Upon executing this ALTER statement, SQL Server will first alter the table adding the CountyID column. Then once the column is added SQL Server will number all the existing rows automatically based on the identity property.
Assume I want to set the identity property of my existing ReferenceID which has already been populated manually with a row number. There is no simple one statement method to accomplish this. Instead I have to jump through a number of hoops to do this.
Assume my original table above looked like this:
Where I have 39 existing records populated in this table, where each row has a unique reference number that has been set manually. Assume for now there are no constraints on this table. In order to make the ReferenceID my identity column, I would first need to rename the table to say to something like dbo.CountyOld. Then I could create my new County table using the following code, which sets the ReferenceID as an identity:
After this I would set the IDENTIFY_INSERT ON (more on this in the next section) for this table. Then run the following code:
After the INSERT statement was done running, I would turn the IDENTITY_INSERT OFF for this table, and then drop the dbo.CountyOld table. If I had constraints on my table I would have to take the necessary actions to drop and recreate those constraints.
Alternatively, I can use the "Design" feature of a table in SQL Server Management Studio to set the identity properties on an existing table. Using SQL Server Management Studio, perform similar steps as I described above.
Inserting records into a table that has an identity columnWhen you have a table with an identity column there are things you need to think about when inserting records into these tables. Let me go through a couple of INSERT statements to describe how inserting records is done.
First, let me talk about how to insert records where the identity column is populated automatically using the identity properties. Remember the table dbo.MyTable that I created above, it had three columns ID, MyShortDesc, and MyLongDesc where the identity property was set on the ID column. This is the table I will be using for my example, and here is an INSERT statement that adds a new row to this table:
In this example I specified the column names I was populating with values in the dbo.MyTable by placing those columns inside parenthesis immediately following dbo.MyTable. Note how I didn't specify the identity column ID. I didn't have to include this column in my INSERT statement because it will automatically be populated using the identity property setting associated with this column. Another way to write this insert statement is like this:
Here I left off the column names following the table name dbo.MyTable. I was able to do that because SQL Server knows the only other column on this table is the identity column, and it knows how to populate the value for that column.
What if I wanted to set the identify column value myself on the INSERT statement. How is this done? As it turns out this isn't as simple as one might think. I CANNOT just execute this code:
If I try to run an INSERT statement similar to this where I try to identify a value of the identity column I would get this error:
This error message tell me I need to set the IDENTITY_INSERT value to ON if I wanted to explicitly set the identity value. Let's try this again and set the IDENTITY_INSERT value to ON by using this code:
By using the SET statement to set the IDENTITY_INSERT option to ON, it allows me to set the identify column ID to a value "12." Keep in mind that you can only have the IDENTIFY_INSERT value turned on for only one table at a time in a session. Also, when you have IDENTITY_INSERT on you are able to insert multiple rows with the same identity column value, provided you don't have a constraint that restricts duplicate values in your identity column. You can also insert rows that have an identity column value greater than the last identity column value created. This will leave holes in your identify column values and will also set the value SQL Server is keeping that helps it determine the next identify value. Once you are done inserting rows, where you are setting the identity column value, you should turn off the IDENTITY_INSERT option by running the following command:
Deleting records from a table that has identify columnYou might be wondering what happens with identity column values when you delete a record in a table that has an identity column. When rows are deleted, the identity values are not reused. Therefore, over time you will have gaps in your identity column values based on the records that have been deleted. If this is a problem for your situation, you might consider using a trigger to populate a sequential number column instead of using an identity column.
Value of using an identity columnIdentity columns make it easy to have surrogate key columns that are automatically populated. Having a column be populated by the identity property also makes it easy to create unique identity column values for each row. Next time you want a surrogate key when you design a table, consider creating the key as an identity column.