Deciding On A Primary Key In Oracle

Data integrity is one of the keys to business success and a sure way to provide that is to declare primary keys for critical data. That, sometimes, is easier said than done since a clear choice for such a key may not be available. Primary keys can be natural, meaning they use the existing data to ensure uniqueness, surrogate, such as a unique sequence, or a combination of both. Which to choose? Obviously, if you have it a natural key is best but choosing that path isn’t always simple. So how DO you choose a primary key for your data? Let’s look at several examples and see how this plays out.

Dee Dee and Dixie Kupps own a U.S. consulting firm that has recently expanded with international offices in Germany and France. Having employees outside of the U.S. can make it difficult to uniquely identify each employee. When the company was only the U.S. office they used the Social Security Number as a unique identifier, but Germany and France employees don’t have Social Security Numbers. Germany and France issue Tax ID numbers, unique 11-digit identification numbers used in a similar way to U.S. Social Security Numbers. Unfortunately these are not in the same format as a U.S. Social Security Number (SSN). And, SSNs aren’t ‘numbers’ to a database because of the two embedded ‘-‘ characters. The easiest solution is to reformat the SSN without using ‘-‘, choosing instead to have the application display the SSN correctly. Since Tax IDs are 11 digits and SSNs are 9 there shouldn’t be an issue of duplication so let’s see how that might work:


SQL> create table employ(
  2          emp_id          number(12),
  3          location        varchar(3),
  4          emp_l_name      varchar2(25),
  5          emp_f_name      varchar2(25),
  6          hire_date       date,
  7          constraint employ_pk primary key(emp_id));

Table created.

SQL>
SQL> insert all
  2  into employ
  3  values(444556666, 'US', 'Bleeble', 'Fern', sysdate - 1000)
  4  into employ
  5  values(987123456, 'US', 'Blooble', 'Fern', sysdate - 970)
  6  into employ
  7  values(23444556666, 'DEU', 'Bleeble', 'Fern', sysdate - 900)
  8  into employ
  9  values(11446778999, 'FR', 'Bleeble', 'Fern', sysdate - 800)
 10  select * From dual;

4 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select emp_id, location, emp_f_name, emp_l_name, hire_date
  2  from employ;

         EMP_ID LOC EMP_F_NAME                EMP_L_NAME                HIRE_DATE
--------------- --- ------------------------- ------------------------- ---------
      444556666 US  Fern                      Bleeble                   10-NOV-12
      987123456 US  Fern                      Blooble                   10-DEC-12
    23444556666 DEU Fern                      Bleeble                   18-FEB-13
    11446778999 FR  Fern                      Bleeble                   29-MAY-13

SQL>
SQL> select case location when 'US' then substr(emp_id, 1,3)||'-'||substr(emp_id,4,2)||'-'||substr(emp_id, 6) else to_char(emp_id) end emp_id,
  2         location, emp_f_name, emp_l_name, hire_date
  3  from employ;

EMP_ID                                     LOC EMP_F_NAME                EMP_L_NAME                HIRE_DATE
------------------------------------------ --- ------------------------- ------------------------- ---------
444-55-6666                                US  Fern                      Bleeble                   10-NOV-12
987-12-3456                                US  Fern                      Blooble                   10-DEC-12
23444556666                                DEU Fern                      Bleeble                   18-FEB-13
11446778999                                FR  Fern                      Bleeble                   29-MAY-13

SQL>

Since Tax IDs are unique throughout the EU this makes a suitable primary key. Notice that the SSNs are properly formatted in the second query while the EU Tax IDs remain as unchanged. This appears to solve the unique employee identifier issue. There are additional issues with employees, such as unique logins to submit time worked but since this is not an exhaustive article that won’t be covered here.

Looking at other another company, Amalgamated Astronaut Supply, inventory control is a problem. Several warehouses across the country stock the same items in order to reduce shipping costs. How to uniquely identify a record in this case? Fortunately the company not only uses warehouses, but also bins to store items. There are 4 warehouses, one in Sheepshank, Wyoming, another in Pickle Bluff, Arkansas, one in Mudbug, Louisiana and the last one in Happy Canyon, Colorado. Each warehouse has a number, and each warehouse has 1000 numbered bins to hold parts. You can probably guess how the primary key is going to be defined; let’s look at that configuration:


SQL> create table warehouse(
  2          warehouse_id    number,
  3          warehouse_loc   varchar2(100),
  4          constraint whse_pk primary key (warehouse_id, warehouse_loc));

Table created.

SQL>
SQL> create table inventory(
  2          warehouse_id    number,
  3          bin_id          number,
  4          part_id         varchar2(40),
  5          on_hand         number,
  6          on_order        number,
  7          inv_date        date,
  8          constraint inv_pk primary key (warehouse_id, bin_id, part_id));

Table created.

SQL>
SQL> insert all
  2  into warehouse
  3  values(1, 'Sheepshank, Wyoming')
  4  into warehouse
  5  values(2, 'Pickle Bluff, Arkansas')
  6  into warehouse
  7  values(3, 'Mudbug, Louisiana')
  8  into warehouse
  9  values(4, 'Happy Canyon, Colorado')
 10  select * from dual;

4 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> insert all
  2  into inventory
  3  values(1,1,'A1000', 1000, 210, sysdate)
  4  into inventory
  5  values(2,1,'A1000', 1000, 317, sysdate)
  6  into inventory
  7  values(2,2,'A1200', 2000, 212, sysdate)
  8  select * From dual;

3 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select * from inventory;

   WAREHOUSE_ID          BIN_ID PART_ID                                          ON_HAND        ON_ORDER INV_DATE
--------------- --------------- ---------------------------------------- --------------- --------------- ---------
              1               1 A1000                                               1000             210 07-AUG-15
              2               1 A1000                                               1000             317 07-AUG-15
              2               2 A1200                                               2000             212 07-AUG-15

SQL>
SQL> insert into inventory
  2  values(1,1,'A1000', 1000, 210, sysdate);
insert into inventory
*
ERROR at line 1:
ORA-00001: unique constraint (BING.INV_PK) violated


SQL>

It works pretty well, but a different part can end up in the wrong bin and no error is reported:


SQL>
SQL> insert into inventory
  2  values(1,1,'A1001', 1000, 210, sysdate);

1 row created.

SQL>
SQL> select * from inventory;

   WAREHOUSE_ID          BIN_ID PART_ID                                          ON_HAND        ON_ORDER INV_DATE
--------------- --------------- ---------------------------------------- --------------- --------------- ---------
              1               1 A1000                                               1000             210 07-AUG-15
              2               1 A1000                                               1000             317 07-AUG-15
              2               2 A1200                                               2000             212 07-AUG-15
              1               1 A1001                                               1000             210 07-AUG-15

SQL>

The key is doing its job since the key itself is unique; the problem stems from the part number since the key can’t enforce the fact that only part A1000 resides in bin 1. That can be ‘fixed’ but it takes a trigger:


SQL> delete from inventory where part_id = 'A1001';

1 row deleted.

SQL>
SQL> create trigger inv_trg before insert on inventory
  2  for each row
  3  declare
  4          v_partid        inventory.part_id%type;
  5  begin
  6          select part_id
  7          into v_partid
  8          from inventory
  9          where warehouse_id = :new.warehouse_id
 10          and bin_id = :new.bin_id;
 11
 12          if v_partid <> :new.part_id then
 13                  raise_application_error(-20999, 'Wrong part number for bin '||:new.bin_id);
 14          end if;
 15  end;
 16  /

Trigger created.

SQL>
SQL> show errors
No errors.
SQL>
SQL> insert into inventory
  2  values(1,1,'A1001', 1000, 210, sysdate);
insert into inventory
            *
ERROR at line 1:
ORA-20999: Wrong part number for bin 1
ORA-06512: at "BING.INV_TRG", line 11
ORA-04088: error during execution of trigger 'BING.INV_TRG'


SQL>

In some cases a surrogate key may be the only option. Take, for example, Statspack and AWR data rely on snapshot ids, sequential ‘keys’ so that normally unique data can be repeated (table names, tablespace names, etc.). Since systems using this approach use the same snapshot id across multiple tables care must be taken to ensure the snapshot ids match across all affected objects.

A third option is available, a combined natural/surrogate key when the natural keys can generate repeat values. Take a user login system built upon the following strategy:

  • Logins will be generated by the first character of the first name and the first four characters of the last name

Simple enough, but that leads to multiple users possibly having the same login. If Bea Franklin and Betty Franks both work for the same company they both will generate:

bfran

The logins will not be unique. Requiring 5 letters from the last name doesn’t improve things:

bfrank

So a different approach is needed. Adding a surrogate sequence in a separate column doesn’t make it any better:

1 bfrank
2 bfrank

as the actual logins are still the same. Concatenating the natural and surrogate values can work, though, producing:

bfrank1
bfrank2

This makes both logins unique and allows each to uniquely identify its owner. Some companies use this same strategy for employee logins with the requirement that the added sequence be two digits:

bfrank01
bfrank02

Now a company can have 99 employees whose names generate the same base login and provide unique login names for all.

Designing a system with primary keys in mind is not an easy task, and the solutions may not be the simplest. The effort is well worth the time and trouble when you realize that it’s the best way to prevent duplicate data.

See all articles by David Fitzjarrell

David Fitzjarrell
David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhilips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and montoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles