Choosing Primary Key Codes
All Northwind and Oakmont tables use codes for
primary key values, as do almost all production databases. The critical
requirement is that the primary key value is unique to each record in the
table. Following are some tips, many with online resources, to aid in
establishing primary key codes:
- Many types of tablessuch as those for storing
information on sales orders, invoices, purchase orders, and checksare based on
documents that have consecutive serial numbers, which are obvious choices for
unique primary key values. In fact, most database designs begin with collecting
and analyzing the paper forms used by an organization. If the table itself or
programming code generates the consecutive number, make sure that every serial
number is present in the table, even if an order is canceled or voided.
Auditors are very suspicious
of invoice and purchase order registers that skip serial numbers.
- Packaged retail products sold in the United States
have a globally unique 10-digit or longer Uniform Product Code (UPC). The UPC
identifies both the supplier and the products SKU. The Uniform Code Council,
Inc. (http://www.uc-council.org/) assigns supplier and product ID values, which
are combined into linear bar codes for automated identification and data
capture (AIDC). The European Article Number (EAN) is coordinated with the UPC
to prevent duplication. The UPC/EAN code is a much better choice than
Microsofts serially assigned number for the ProductID field.
- Books have 10-digit and 13-digit International
Standard Book Number (ISBN) codes that are unique throughout the world and, in
North America, a UPC. ISBNs include a publisher prefix and book number,
assigned to U.S. publishers by the U.S. ISBN Agency
(http://www.bowker.com/standards/home/isbn/us/isbnus.html). ISBN Group Agencies
assign codes for other countries. Canada has separate agencies for English- and
French-language books. Either a UPC or ISBN field is suitable for the primary
key of a North American books database, but ISBN is preferred if the code is
for books only.
- The North American Industry Classification System
(NAICS, pronounced nakes) is
replacing the U.S. Standard Industrial Classification (SIC) for categorizing
organizations by their type of business. A six-digit primary key code for
18,000 classifications replaces the four-digit SIC code. Five of the six digits
represent codes for classifications common to the United States, Canada, and
Mexico. You can view a text file or purchase a CD-ROM of the NAICS codes and
their SIC counterparts at http://www.naics.com/.
- The U.S. Postal Service offers Address Information
Systems (AIS) files for verifying addresses and corresponding ZIP/ZIP+4 codes.
For more information on these files, go to http://www.usps.com and click the
Address Quality link.
- Social Security Numbers (SSNs) for U.S. residents are
a possible choice for a primary key of an Employees table, but their disclosure
compromises employees privacy. Large numbers of counterfeit Social Security
cards having identical numbers circulate in the United States, making SSN even less
attractive as a primary key field. The Oakmont database uses fictitious
nine-digit SSNs for EmployeeID and StudentID fields. Most organizations assign
each employee a sequential serial number. Sequential EmployeeID numbers can do
double duty as seniority-level indicators.
Specifying a primary key for
tables such as CustPers isnt easy. If you use the five-character code based on
first and last names for the primary key, you encounter the problem with
potential duplication of CustomerID codes discussed earlier. In this case,
however, common last namesJones, Smith, and Anderson, for examplequickly
result in duplicate values. Creating a composite primary key from CustomerID
and ContactID is a potential solution; doing this increases the number of new
contacts you can add for a company before inevitable duplicates occur. In most
cases, its easier to use an AutoNumber key for all ID values.
Figure 4.9 shows the final design of the modified
Northwind database with the added contact details tables. The tables of this
database are included on the accompanying CD-ROM as Nwind04.mdb in the
The final design of the expanded
Northwind database with customer and supplier contact details tables added.
The modified Northwind database
doesnt qualify as a full-fledged customer relationship management (CRM)
system, but the design is sufficiently flexible to serve as the model for a
sales and purchasing database for a small-sized wholesale or retail concern.
Maintaining Data Integrity and Accuracy
When you add, modify, or delete
table data, its important that the additions and changes you make to the data
dont conflict with the normalization rules that you used to create the
database. One of the most vexing problems facing users of large RDBMs is
unclean data. Over time, data entry errors and stray records accumulate to
the point where obtaining accurate historical information from the database
becomes difficult or impossible. Software vendors and database consultants have
created a major-scale data cleansing business to solve the problem. You can
avoid the time and expense of retroactive corrections to your data by taking
advantage of Access and SQL Server features that aid in preventing errors
during the data entry process.
You also must avoid changing the
primary keys of or deleting one of two tables in a one-to-one relationship.
integrity requires strict adherence to a single rule: Each foreign key value in a
related table must correspond with a primary key value in a base (primary)
table. This rule requires that the
following types of modifications to data be prevented:
- Adding a record on the many side of a one-to-many
relationship without the existence of a related record on the one side of the
relationship (for example, adding a record to the Orders table with a
CustomerID value of BOGUS when no such customer record exists in the Customers
- Deleting a record on the one side of a one-to-many
relationship without first deleting all corresponding records on the many side
of the relationship (for example, deleting Around the Horns Customers record
when the Orders table contains records with AROUT as the CustomerID value)
- Changing the value of a primary key field of a base
table on which records in a related base or linking table depend, such as
changing AROUT to ABOUT in the CustomerID field of the Customers table
Keypunch operators kept
their eyes on the source documents, which gave rise to the term heads-down data entry. The term continues in common use to
describe any data entry process in which the operator attention is fully
devoted to adding or editing database records as quickly as possible.
- Changing the value of a foreign key field in a linking
table to a value that doesnt exist in the primary key field of a base table
(for example, changing AROUT to ABOUT in the CustomerID field for OrderID
A record in a related table that
doesnt have a corresponding foreign key value in the primary key of a base
table is called an orphan record.
For example, if the CustomerID value of a record in the Orders table is ABCDE
and no ABCDE value exists in the CustomerID primary key field of the Customers
table, theres no way to determine which customer placed the order.
Access and SQL Server databases
offer the option of automatically enforcing referential integrity when adding
or updating data. Cascading updates and deletions are optional. If you specify
cascading updates, changing the value of a primary key of a table makes the
identical change to the foreign key value in related tables. Cascading
deletions delete all related records with a foreign key that corresponds to the
primary key of a record in a base table that you want to delete.
To learn more about enforcing
referential integrity in Access databases, see Establishing Relationships
Between Tables, p. XXX (Chapter 5) and Cascading Updates and Deletions, p.
XXX (Chapter 5).
Entity Integrity and Indexes
When you add new records to a
base table, entity integrity assures that each primary key value is unique.
Access and SQL Server ensure entity integrity by adding a no-duplicates index
to the field you specify for the primary key. If duplicate values exist when
you attempt to designate a field as the primary key, you receive an error
message. You receive a similar error message if you enter a duplicate primary
key value in the table.
For more information on Access
indexes, see Adding Indexes to Tables, p. XXX (Chapter 5).
Indexes also speed searches of
tables and improve performance when executing SQL statements that return data
from fields of base and related tables.
Data Validation Rules and Check Constraints
Data entry errors are another
major source of unclean data. In the days of punched-card data entry,
keypunch operators typed the data, and verifiers, who usually worked during the
succeeding shift, inserted the cards in a punched-card reader and repeated the
keystrokes from the same source document. This process detected typographical
errors, which the verifier corrected. Keypunch operators had no visual feedback
during data entry, so typos were inevitable; video display terminals didnt arrive
until the mainframe era.
Rekeying data leads to low
productivity, so most data entry applications support data validation rules
designed to detect attempts to enter illegal or unreasonable values in fields.
An example of a validation rule is preventing entry of a shipping date thats
earlier than the order date. The rule is expressed as an inequality: ShipDate
>= OrderDate, which returns False if the rule is violated. Similarly,
UnitPrice > 0 prevents accidentally giving away a line item of an order.
Access tables and fields have a
Validation Rule property that you set to the inequality expression. SQL Server
calls validation rules check constraints.
Both Access and SQL Server have a Validation Text property for which you
specify the text to appear in an error message box when the entry violates the
rule or constraint. Its a more common practice when working with client/server
databases to validate data in the front-end application before sending the
entry to the back-end server. Detecting the error on the server and returning
an error message requires a roundtrip from the client to the server. Server roundtrips generate quite a
bit of network traffic and reduce data entry efficiency. One of the objectives
of client/server front-end design is to minimize server round-tripping.
To learn more about Accesss
validation methods, see Validating Data Entry, p. XXX
A database transaction occurs
when multiple records in one or more tables must be added, deleted, or modified
to complete a data entry operation. Adding an order or invoice that has
multiple line items is an example of a transaction. If an order or invoice has
five line items, but a network or database problem prevents adding one or more
item records, the entire order or invoice is invalid. Maintaining referential
integrity prevents adding line item records without a corresponding order or
invoice record, but missing item records dont violate integrity rules.
As mentioned earlier in the
chapter, fields become columns
and records become rows in a
query. This terminology is an arbitrary convention of this book and not related
to relational database design theory. The reason for the change in terminology
is that a querys rows and columns need notand often do notrepresent data
values stored in the underlying tables. Queries can have columns whose values
are calculated from multiple fields and rows with aggregated data, such as
subtotals and totals.
Transaction processing (TP), also called online transaction
processing (OLTP), solves the missing
line item problem. Requiring TP for order entry, invoice processing, and similar
multirecord operations enforces an all-or-nothing rule. If every individual
update to the tables records occurs, the transaction succeeds (commits); if any update fails, changes made before the
failure occurs are reversed (rolled back). Transaction processing isnt limited to RDBMSs.
Early mainframe databases offered TP and transaction monitors. IBMs Customer
Information and Control System (CICS, pronounced kicks) was one of the first transaction processing and
monitoring systems, and it remains in widespread use today.
Access and SQL Server databases
offer built-in TP features. Access has a Use Transactions property that you set
to Yes to require TP for updates. SQL Server traditionally requires writing
T-SQL statementsBEGIN TRANS, COMMIT TRANS, and ROLLBACK TRANSto manage
transactions, but Access 2010s ADP forms have a new Batch Updates property
that lets you enforce transactions without writing complex T-SQL statements.
For a brief description of the batch
update feature introduced by Access 2007, see Changes to ADP Features, in
Online Appendix B.
Displaying Data with Queries and Views
So far, this chapter has
concentrated on designing relational databases and their tables, and adding or
altering data. SQL SELECT queries return data to Access, but you dont need to
write SQL statements to display data in forms or print reports from the data.
Access has built-in graphical tools to automatically write Access SQL for
Access databases and T-SQL for SQL Server databases. Accesss query tools use a
modern implementation of query-by-example (QBE),
an IBM trademark. QBE is a simple method of specifying the tables and columns
to view, how the data is sorted, and rows to include or exclude.
Linking related tables by their
primary and foreign keys is called joining the tables. Early QBE programs required defining
joins between tables; specifying table relationships automatically defines
joins when you add records from two or more related Access or SQL Server
Figure 4.10 is an example
of Accesss QBE implementation for Access databases, called Query Design View.
You add tables to the queryin this case, Northwinds Customers, Orders, and
Employees tables. As you add the tables, join lines indicate the relationships
between them. You drag the field names for the query columns from the table
lists in the upper pane to the Field row of the lower pane. You also can
specify the name of a calculated column (Salesperson) and the expression to
create the column values ([FirstName] & & [LastName]) in the Field
row. The brackets surrounding FirstName and LastName designate that the values
are field names.
Accesss Query Design view for
Access databases uses graphical QBE to create queries you can store in the
Selecting Ascending or Descending
in the Sort column orders the rows in left-to-right column priority. You can
restrict the display to a particular set of values by adding an expression in
the Criteria column.
Running the query returns the
resultset, part of which is shown by Figure 4.11. You can save the query for
later reuse as a named Access QueryDef(inition) object in the database.
These are the first 16 of the
408 rows of the query resultset returned by executing the query design of
SELECT Customers.CompanyName, Orders.OrderID, Orders.OrderDate,
[FirstName] & & [LastName] AS Salesperson
INNER JOIN (Customers
INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID)
ON Employees.EmployeeID = Orders.EmployeeID
ORDER BY Customers.CompanyName, Orders.OrderID;
Its obvious that using QBE is
much simpler than writing SELECT queries to concatenate field values, join
tables, establish row selection criteria, and specify sort order. Accesss QBE
features are powerful; many developers use Access to generate the SQL
statements needed by Visual Basic, C++, and Java programs.
Access QBE automatically converts
the query design of Figure 4.10 into the following Access SQL statement:
The da Vinci QBE tool for
creating T-SQL views is similar to the Access Query Design view, but has an
additional pane to display the T-SQL statement as you generate it. You add
tables to the upper pane and drag field names to the Column cells of the middle
pane. An SQL Server view is the client/server equivalent of an Access QueryDef.
As with Access QueryDefs, you can execute a query on an SQL Server view.
T-SQL uses + rather than & to
concatenate strings, uses a single quote () as the
string delimiter, and requires a numerical instead of a string criterion for
the YEAR function. Heres the T-SQL version of the preceding
Access SQL statement after the SELECT
and WHERE clauses have been tweaked:
modifier is needed to permit an ORDER BY
clause in a view; prior to the addition of the TOP keyword
in SQL Server 7.0, creating sorted views wasnt possible. The da Vinci query
parser adds the TOP
100 PERCENT modifier if an ORDER BY clause is present. However, TOP 100 PERCENT ... ORDER BY doesnt sort SQL Server 2005 views. Replacing 100 PERCENT with a large integer (<= 2147483647) sorts the view.
prefix to table and field names is an abbreviation for database
owner, the default owner for all SQL Server databases you create as a
system administrator. Figure 4.12 shows the design of the T-SQL query generated
by pasting the preceding statement into the da Vinci query pane.
Despite their common ANSI SQL-92
heritage, SQL Server wont execute most Access SQL statements, and vice versa.
Copying the preceding Access SQL statement to the Clipboard and pasting it into
the SQL pane of the query designer for the NorthwindCS sample database doesnt
work. The da Vinci designer does its best to translate the Access SQL flavor
into T-SQL when you paste, but you receive errors when you try to run the
SELECT TOP (2147483647) dbo.Customers.CompanyName,
dbo.Employees.FirstName + +
dbo.Employees.LastName AS Salesperson
INNER JOIN dbo.Customers
INNER JOIN dbo.Orders
ON dbo.Customers.CustomerID = dbo.Orders.CustomerID
ON dbo.Employees.EmployeeID = dbo.Orders.EmployeeID
WHERE (YEAR(dbo.Orders.OrderDate) = 2006)
ORDER BY dbo.Customers.CompanyName, dbo.Orders.OrderID
For more information on the da Vinci
toolset, see Exploring SQL Server Views, in online Chapter 27.
- For detailed instructions on
installing SQL Server Express and NorthwindCS.adp, see Performing SQL Server
Express Setup, p. XXX (Chapter 1), and Exploring the NorthwindCS Sample
Project, in online Chapter 27.
The Datasheet view of the SQL
Server view generated by the preceding SQL statement is identical to the Access
querys Datasheet view shown in Figure 4.11.
Pasting an Access SQL statement into Accesss version of the
da Vinci query design tool and making a few minor changes to the T-SQL
statement results in an SQL Server view equivalent to the Access query of
Microsoft Access 2010 In Depth, Rough Cuts
By Roger Jennings