Choosing Primary Key Codes
All Northwind and Oakmont tables use codes for primary key values, as do almost all production databases. The critical requirement is that the primary key value is unique to each record in the table. Following are some tips, many with online resources, to aid in establishing primary key codes:
Many types of tablessuch as those for storing information on sales orders, invoices, purchase orders, and checksare based on documents that have consecutive serial numbers, which are obvious choices for unique primary key values. In fact, most database designs begin with collecting and analyzing the paper forms used by an organization. If the table itself or programming code generates the consecutive number, make sure that every serial number is present in the table, even if an order is canceled or voided. Auditors are very suspicious of invoice and purchase order registers that skip serial numbers.
Tip - AutoNumber primary key values work well for serially numbered documents if you don't allow records to be deleted. Adding a true-false (Boolean) field named Deleted and setting the value to true is one approach. This technique complicates queries against the tables, so you might consider moving deleted records to another table. Doing this lets you write a query to reconstruct all records for audit purposes.
Packaged retail products sold in the United States have a globally unique 10-digit or longer Uniform Product Code (UPC). The UPC identifies both the supplier and the product's SKU. The Uniform Code Council, Inc., (http://www.uc-council.org/) assigns supplier and product ID values, which are combined into linear bar codes for automated identification and data capture (AIDC). The European Article Number (EAN) is coordinated with the UPC to prevent duplication. The UPC/EAN code is a much better choice than Microsoft's serially assigned number for the ProductID field.
Books have a 10-digit International Standard Book Number (ISBN) code that's unique throughout the world and, in North America, a UPC. ISBNs include a publisher prefix and book number, assigned to U.S. publishers by the U.S. ISBN Agency (http://www.bowker.com/standards/home/isbn/us/isbnus.html). ISBN Group Agencies assign codes for other countries. Canada has separate agencies for English- and French-language books. Either a UPC or ISBN field is suitable for the primary key of a North American books database, but ISBN is preferred if the code is for books only.
The North American Industry Classification System (NAICS, pronounced nakes) is replacing the U.S. Standard Industrial Classification (SIC) for categorizing organizations by their type of business. A six-digit primary key code for 18,000 classifications replaces the four-digit SIC code. Five of the six digits represent codes for classifications common to the United States, Canada, and Mexico. You can view a text file or purchase a CD-ROM of the NAICS codes and their SIC counterparts at http://www.naics.com/.
The U.S. Postal Service offers Address Information Systems (AIS) files for verifying addresses and corresponding ZIP/ZIP+4 codes. For more information on these files, go to http://www.usps.com and click the Address Quality link.
Social Security Numbers (SSNs) for U.S. residents are a possible choice for a primary key of an Employees table, but their disclosure compromises employee's privacy. Large numbers of counterfeit Social Security cards having identical numbers circulate in the United States, making SSN even less attractive as a primary key field. The Oakmont database uses fictitious nine-digit SSNs for EmployeeID and StudentID fields. Most organizations assign each employee a sequential serial number. Sequential EmployeeID numbers can do double-duty as seniority-level indicators.
Specifying a primary key for tables such as CustPers isn't easy. If you use the five-character code based on first and last names for the primary key, you encounter the problem with potential duplication of CustomerID codes discussed earlier. In this case, however, common last namesJones, Smith, and Anderson, for examplequickly result in duplicate values. Creating a composite primary key from CustomerID and ContactID is a potential solution; doing this increases the number of new contacts you can add for a company before inevitable duplicates occur. In most cases, it's easier to use an AutoNumber key for all ID values.
Figure 4.9 shows the final design of the modified Northwind database with the added contact details tables. The tables of this database are included on the accompanying CD-ROM as Nwind04.mdb in the \Seua2007\Chaptr04 folder.
The final design of the expanded Northwind database with customer and supplier contact details tables added.
The modified Northwind database doesn't qualify as a full-fledged customer relationship management (CRM) system, but the design is sufficiently flexible to serve as the model for a sales and purchasing database for a small-sized wholesale or retail concern.
Maintaining Data Integrity and Accuracy
When you add, modify, or delete table data, it's important that the additions and changes you make to the data don't conflict with the normalization rules that you used to create the database. One of the most vexing problems facing users of large RDBMs is "unclean data." Over time, data entry errors and stray records accumulate to the point where obtaining accurate historical information from the database becomes difficult or impossible. Software vendors and database consultants have created a major-scale "data cleansing" business to solve the problem. You can avoid the time and expense of retroactive corrections to your data by taking advantage of Access and SQL Server features that aid in preventing errors during the data entry process.
Maintaining referential integrity requires strict adherence to a single rule: Each foreign key value in a related table must correspond with a primary key value in a base (primary) table. This rule requires that the following types of modifications to data be prevented:
Adding a record on the many side of a one-to-many relationship without the existence of a related record on the one side of the relationship (for example, adding a record to the Orders table with a CustomerID value of BOGUS when no such customer record exists in the Customers table)
Deleting a record on the one side of a one-to-many relationship without first deleting all corresponding records on the many side of the relationship (for example, deleting Around the Horn's Customers record when the Orders table contains records with AROUT as the CustomerID value)
Changing the value of a primary key field of a base table on which records in a related base or linking table depend, such as changing AROUT to ABOUT in the CustomerID field of the Customers table
Changing the value of a foreign key field in a linking table to a value that doesn't exist in the primary key field of a base table (for example, changing AROUT to ABOUT in the CustomerID field for OrderID 10355)
Note - You also must avoid changing the primary keys of or deleting one of two tables in a one-to-one relationship.
A record in a related table that doesn't have a corresponding foreign key value in the primary key of a base table is called an orphan record. For example, if the CustomerID value of a record in the Orders table is ABCDE and there's no ABCDE value in the CustomerID primary key field of the Customers table, there's no way to determine which customer placed the order.
Access and SQL Server databases offer the option of automatically enforcing referential integrity when adding or updating data. Cascading updates and deletions are optional. If you specify cascading updates, changing the value of a primary key of a table makes the identical change to the foreign key value in related tables. Cascading deletions delete all related records with a foreign key that corresponds to the primary key of a record in a base table that you want to delete.
→ To learn more about enforcing referential integrity in Access databases, see "Establishing Relationships Between Tables," and "Cascading Updates and Deletions."
Entity Integrity and Indexes
When you add new records to a base table, entity integrity assures that each primary key value is unique. Access and SQL Server ensure entity integrity by adding a no-duplicates index to the field you specify for the primary key. If duplicate values exist when you attempt to designate a field as the primary key, you receive an error message. You receive a similar error message if you enter a duplicate primary key value in the table.
→ For more information on Access indexes, see "Adding Indexes to Tables."
Indexes also speed searches of tables and improve performance when executing SQL statements that return data from fields of base and related tables.
Data Validation Rules and Check Constraints
Data entry errors are another major source of "unclean data." In the days of punched-card data entry, keypunch operators typed the data, and verifiers, who usually worked during the succeeding shift, inserted the cards in a punched-card reader and repeated the keystrokes from the same source document. This process detected typographical errors, which the verifier corrected. Keypunch operators had no visual feedback during data entry, so typos were inevitable; video display terminals didn't arrive until the mainframe era.
Note - Keypunch operators kept their eyes on the source documents, which gave rise to the term heads-down data entry. The term continues in common use to describe any data entry process in which the operator's entire working day is spent adding or editing database records as quickly as possible.
Rekeying data leads to low productivity, so most data entry applications support data validation rules designed to detect attempts to enter illegal or unreasonable values in fields. An example of a validation rule is preventing entry of a shipping date that's earlier than the order date. The rule is expressed as an inequality: ShipDate >= OrderDate, which returns False if the rule is violated. Similarly, UnitPrice > 0 prevents accidentally giving away a line item of an order.
Access tables and fields have a Validation Rule property that you set to the inequality expression. SQL Server calls validation rules check constraints. Both Access and SQL Server have a Validation Text property for which you specify the text to appear in an error message box when the entry violates the rule or constraint. It's a more common practice when working with client/server databases to validate data in the front-end application before sending the entry to the back-end server. Detecting the error on the server and returning an error message requires a roundtrip from the client to the server. Server roundtrips generate quite a bit of network traffic and reduce data entry efficiency. One of the objectives of client/server front-end design is to minimize server roundtripping.
→ To learn more about Access's validation methods, see "Validating Data Entry."
A database transaction occurs when multiple records in one or more tables must be added, deleted, or modified to complete a data entry operation. Adding an order or invoice that has multiple line items is an example of a transaction. If an order or invoice has five line items, but a network or database problem prevents adding one or more item records, the entire order or invoice is invalid. Maintaining referential integrity prevents adding line item records without a corresponding order or invoice record, but missing item records don't violate integrity rules.
Transaction processing (TP), also called online transaction processing (OLTP), solves the missing line item problem. Requiring TP for order entry, invoice processing, and similar multirecord operations enforces an all-or-nothing rule. If every individual update to the tables' records occurs, the transaction succeeds (commits); if any update fails, changes made before the failure occurs are reversed (rolled back). Transaction processing isn't limited to RDBMSs. Early mainframe databases offered TP and transaction monitors. IBM's Customer Information and Control System (CICS, pronounced kicks) was one of the first transaction processing and monitoring systems and remains in widespread use today.
Access and SQL Server databases offer built-in TP features. Access has a Use Transactions property that you set to Yes to require TP for updates. SQL Server traditionally requires writing T-SQL statementsBEGIN TRANS, COMMIT TRANS, and ROLLBACK TRANSto manage transactions, but Access 2007's ADP forms have a new Batch Updates property that lets you enforce transactions without writing complex T-SQL statements.