Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS Access

Posted Aug 25, 2010

Exploring Relational Database Theory and Practice - Page 3

By DatabaseJournal.com Staff

Choosing Primary Key Codes

All Northwind and Oakmont tables use codes for primary key values, as do almost all production databases. The critical requirement is that the primary key value is unique to each record in the table. Following are some tips, many with online resources, to aid in establishing primary key codes:

  • Many types of tables—such as those for storing information on sales orders, invoices, purchase orders, and checks—are based on documents that have consecutive serial numbers, which are obvious choices for unique primary key values. In fact, most database designs begin with collecting and analyzing the paper forms used by an organization. If the table itself or programming code generates the consecutive number, make sure that every serial number is present in the table, even if an order is canceled or voided. Auditors are very suspicious of invoice and purchase order registers that skip serial numbers.
  • Packaged retail products sold in the United States have a globally unique 10-digit or longer Uniform Product Code (UPC). The UPC identifies both the supplier and the product’s SKU. The Uniform Code Council, Inc. (http://www.uc-council.org/) assigns supplier and product ID values, which are combined into linear bar codes for automated identification and data capture (AIDC). The European Article Number (EAN) is coordinated with the UPC to prevent duplication. The UPC/EAN code is a much better choice than Microsoft’s serially assigned number for the ProductID field.
  • Books have 10-digit and 13-digit International Standard Book Number (ISBN) codes that are unique throughout the world and, in North America, a UPC. ISBNs include a publisher prefix and book number, assigned to U.S. publishers by the U.S. ISBN Agency (http://www.bowker.com/standards/home/isbn/us/isbnus.html). ISBN Group Agencies assign codes for other countries. Canada has separate agencies for English- and French-language books. Either a UPC or ISBN field is suitable for the primary key of a North American books database, but ISBN is preferred if the code is for books only.
  • The North American Industry Classification System (NAICS, pronounced nakes) is replacing the U.S. Standard Industrial Classification (SIC) for categorizing organizations by their type of business. A six-digit primary key code for 18,000 classifications replaces the four-digit SIC code. Five of the six digits represent codes for classifications common to the United States, Canada, and Mexico. You can view a text file or purchase a CD-ROM of the NAICS codes and their SIC counterparts at http://www.naics.com/.
  • The U.S. Postal Service offers Address Information Systems (AIS) files for verifying addresses and corresponding ZIP/ZIP+4 codes. For more information on these files, go to http://www.usps.com and click the Address Quality link.
  • Social Security Numbers (SSNs) for U.S. residents are a possible choice for a primary key of an Employees table, but their disclosure compromises employees’ privacy. Large numbers of counterfeit Social Security cards having identical numbers circulate in the United States, making SSN even less attractive as a primary key field. The Oakmont database uses fictitious nine-digit SSNs for EmployeeID and StudentID fields. Most organizations assign each employee a sequential serial number. Sequential EmployeeID numbers can do double duty as seniority-level indicators.

Specifying a primary key for tables such as CustPers isn’t easy. If you use the five-character code based on first and last names for the primary key, you encounter the problem with potential duplication of CustomerID codes discussed earlier. In this case, however, common last names—Jones, Smith, and Anderson, for example—quickly result in duplicate values. Creating a composite primary key from CustomerID and ContactID is a potential solution; doing this increases the number of new contacts you can add for a company before inevitable duplicates occur. In most cases, it’s easier to use an AutoNumber key for all ID values.

Figure 4.9 shows the final design of the modified Northwind database with the added contact details tables. The tables of this database are included on the accompanying CD-ROM as Nwind04.mdb in the \2010Samples\Chaptr04 folder.

the final design of the modified
Northwind database with the added contact details tables
Figure 4.9

The final design of the expanded Northwind database with customer and supplier contact details tables added.

The modified Northwind database doesn’t qualify as a full-fledged customer relationship management (CRM) system, but the design is sufficiently flexible to serve as the model for a sales and purchasing database for a small-sized wholesale or retail concern.

Maintaining Data Integrity and Accuracy

When you add, modify, or delete table data, it’s important that the additions and changes you make to the data don’t conflict with the normalization rules that you used to create the database. One of the most vexing problems facing users of large RDBMs is “unclean data.” Over time, data entry errors and stray records accumulate to the point where obtaining accurate historical information from the database becomes difficult or impossible. Software vendors and database consultants have created a major-scale “data cleansing” business to solve the problem. You can avoid the time and expense of retroactive corrections to your data by taking advantage of Access and SQL Server features that aid in preventing errors during the data entry process.

Note

You also must avoid changing the primary keys of or deleting one of two tables in a one-to-one relationship.

Referential Integrity

Maintaining referential integrity requires strict adherence to a single rule: Each foreign key value in a related table must correspond with a primary key value in a base (primary) table. This rule requires that the following types of modifications to data be prevented:

  • Adding a record on the many side of a one-to-many relationship without the existence of a related record on the one side of the relationship (for example, adding a record to the Orders table with a CustomerID value of BOGUS when no such customer record exists in the Customers table)
  • Deleting a record on the one side of a one-to-many relationship without first deleting all corresponding records on the many side of the relationship (for example, deleting Around the Horn’s Customers record when the Orders table contains records with AROUT as the CustomerID value)
  • Changing the value of a primary key field of a base table on which records in a related base or linking table depend, such as changing AROUT to ABOUT in the CustomerID field of the Customers table

Note

Keypunch operators kept their eyes on the source documents, which gave rise to the term heads-down data entry. The term continues in common use to describe any data entry process in which the operator attention is fully devoted to adding or editing database records as quickly as possible.

  • Changing the value of a foreign key field in a linking table to a value that doesn’t exist in the primary key field of a base table (for example, changing AROUT to ABOUT in the CustomerID field for OrderID 10355)

A record in a related table that doesn’t have a corresponding foreign key value in the primary key of a base table is called an orphan record. For example, if the CustomerID value of a record in the Orders table is ABCDE and no ABCDE value exists in the CustomerID primary key field of the Customers table, there’s no way to determine which customer placed the order.

Access and SQL Server databases offer the option of automatically enforcing referential integrity when adding or updating data. Cascading updates and deletions are optional. If you specify cascading updates, changing the value of a primary key of a table makes the identical change to the foreign key value in related tables. Cascading deletions delete all related records with a foreign key that corresponds to the primary key of a record in a base table that you want to delete.

  • To learn more about enforcing referential integrity in Access databases, see “Establishing Relationships Between Tables,” p. XXX (Chapter 5) and “Cascading Updates and Deletions,” p. XXX (Chapter 5).

Entity Integrity and Indexes

When you add new records to a base table, entity integrity assures that each primary key value is unique. Access and SQL Server ensure entity integrity by adding a no-duplicates index to the field you specify for the primary key. If duplicate values exist when you attempt to designate a field as the primary key, you receive an error message. You receive a similar error message if you enter a duplicate primary key value in the table.

  • For more information on Access indexes, see “Adding Indexes to Tables,” p. XXX (Chapter 5).

Indexes also speed searches of tables and improve performance when executing SQL statements that return data from fields of base and related tables.

Data Validation Rules and Check Constraints

Data entry errors are another major source of “unclean data.” In the days of punched-card data entry, keypunch operators typed the data, and verifiers, who usually worked during the succeeding shift, inserted the cards in a punched-card reader and repeated the keystrokes from the same source document. This process detected typographical errors, which the verifier corrected. Keypunch operators had no visual feedback during data entry, so typos were inevitable; video display terminals didn’t arrive until the mainframe era.

Rekeying data leads to low productivity, so most data entry applications support data validation rules designed to detect attempts to enter illegal or unreasonable values in fields. An example of a validation rule is preventing entry of a shipping date that’s earlier than the order date. The rule is expressed as an inequality: ShipDate >= OrderDate, which returns False if the rule is violated. Similarly, UnitPrice > 0 prevents accidentally giving away a line item of an order.

Access tables and fields have a Validation Rule property that you set to the inequality expression. SQL Server calls validation rules check constraints. Both Access and SQL Server have a Validation Text property for which you specify the text to appear in an error message box when the entry violates the rule or constraint. It’s a more common practice when working with client/server databases to validate data in the front-end application before sending the entry to the back-end server. Detecting the error on the server and returning an error message requires a roundtrip from the client to the server. Server roundtrips generate quite a bit of network traffic and reduce data entry efficiency. One of the objectives of client/server front-end design is to minimize server round-tripping.

  • To learn more about Access’s validation methods, see “Validating Data Entry,” p. XXX
    (Chapter 6).

Transactions

A database transaction occurs when multiple records in one or more tables must be added, deleted, or modified to complete a data entry operation. Adding an order or invoice that has multiple line items is an example of a transaction. If an order or invoice has five line items, but a network or database problem prevents adding one or more item records, the entire order or invoice is invalid. Maintaining referential integrity prevents adding line item records without a corresponding order or invoice record, but missing item records don’t violate integrity rules.

Note

As mentioned earlier in the chapter, fields become columns and records become rows in a query. This terminology is an arbitrary convention of this book and not related to relational database design theory. The reason for the change in terminology is that a query’s rows and columns need not—and often do not—represent data values stored in the underlying tables. Queries can have columns whose values are calculated from multiple fields and rows with aggregated data, such as subtotals and totals.

Transaction processing (TP), also called online transaction processing (OLTP), solves the missing line item problem. Requiring TP for order entry, invoice processing, and similar multirecord operations enforces an all-or-nothing rule. If every individual update to the tables’ records occurs, the transaction succeeds (commits); if any update fails, changes made before the failure occurs are reversed (rolled back). Transaction processing isn’t limited to RDBMSs. Early mainframe databases offered TP and transaction monitors. IBM’s Customer Information and Control System (CICS, pronounced kicks) was one of the first transaction processing and monitoring systems, and it remains in widespread use today.

Access and SQL Server databases offer built-in TP features. Access has a Use Transactions property that you set to Yes to require TP for updates. SQL Server traditionally requires writing T-SQL statements—BEGIN TRANS, COMMIT TRANS, and ROLLBACK TRANS—to manage transactions, but Access 2010’s ADP forms have a new Batch Updates property that lets you enforce transactions without writing complex T-SQL statements.

  • For a brief description of the batch update feature introduced by Access 2007, see “Changes to ADP Features,” in Online Appendix B.

Displaying Data with Queries and Views

So far, this chapter has concentrated on designing relational databases and their tables, and adding or altering data. SQL SELECT queries return data to Access, but you don’t need to write SQL statements to display data in forms or print reports from the data. Access has built-in graphical tools to automatically write Access SQL for Access databases and T-SQL for SQL Server databases. Access’s query tools use a modern implementation of query-by-example (QBE), an IBM trademark. QBE is a simple method of specifying the tables and columns to view, how the data is sorted, and rows to include or exclude.

Linking related tables by their primary and foreign keys is called joining the tables. Early QBE programs required defining joins between tables; specifying table relationships automatically defines joins when you add records from two or more related Access or SQL Server tables.

Figure 4.10 is an example of Access’s QBE implementation for Access databases, called Query Design View. You add tables to the query—in this case, Northwind’s Customers, Orders, and Employees tables. As you add the tables, join lines indicate the relationships between them. You drag the field names for the query columns from the table lists in the upper pane to the Field row of the lower pane. You also can specify the name of a calculated column (Salesperson) and the expression to create the column values ([FirstName] & “” & [LastName]) in the Field row. The brackets surrounding FirstName and LastName designate that the values are field names.

an example
of Access’s QBE implementation for Access databases, called Query Design View
Figure 4.10

Access’s Query Design view for Access databases uses graphical QBE to create queries you can store in the database.

Selecting Ascending or Descending in the Sort column orders the rows in left-to-right column priority. You can restrict the display to a particular set of values by adding an expression in the Criteria column.

Running the query returns the resultset, part of which is shown by Figure 4.11. You can save the query for later reuse as a named Access QueryDef(inition) object in the database.

Running the query returns the
resultset
Figure 4.11

These are the first 16 of the 408 rows of the query resultset returned by executing the query design of Figure 4.10.

SELECT Customers.CompanyName, Orders.OrderID, Orders.OrderDate,
         [FirstName] & “” & [LastName] AS Salesperson
   FROM Employees
      INNER JOIN (Customers
         INNER JOIN Orders
         ON Customers.CustomerID = Orders.CustomerID)
      ON Employees.EmployeeID = Orders.EmployeeID
   WHERE ((Year([OrderDate])=2006))
   ORDER BY Customers.CompanyName, Orders.OrderID;

It’s obvious that using QBE is much simpler than writing SELECT queries to concatenate field values, join tables, establish row selection criteria, and specify sort order. Access’s QBE features are powerful; many developers use Access to generate the SQL statements needed by Visual Basic, C++, and Java programs.

Note

Access QBE automatically converts the query design of Figure 4.10 into the following Access SQL statement:

The da Vinci QBE tool for creating T-SQL views is similar to the Access Query Design view, but has an additional pane to display the T-SQL statement as you generate it. You add tables to the upper pane and drag field names to the Column cells of the middle pane. An SQL Server view is the client/server equivalent of an Access QueryDef. As with Access QueryDefs, you can execute a query on an SQL Server view.

Note

T-SQL uses + rather than & to concatenate strings, uses a single quote (‘) as the string delimiter, and requires a numerical instead of a string criterion for the YEAR function. Here’s the T-SQL version of the preceding Access SQL statement after the SELECT and WHERE clauses have been tweaked:

The TOP modifier is needed to permit an ORDER BY clause in a view; prior to the addition of the TOP keyword in SQL Server 7.0, creating sorted views wasn’t possible. The da Vinci query parser adds the TOP 100 PERCENT modifier if an ORDER BY clause is present. However, TOP 100 PERCENT ... ORDER BY doesn’t sort SQL Server 2005 views. Replacing 100 PERCENT with a large integer (<= 2147483647) sorts the view.

The dbo. prefix to table and field names is an abbreviation for database owner, the default owner for all SQL Server databases you create as a system administrator. Figure 4.12 shows the design of the T-SQL query generated by pasting the preceding statement into the da Vinci query pane.

Despite their common ANSI SQL-92 heritage, SQL Server won’t execute most Access SQL statements, and vice versa. Copying the preceding Access SQL statement to the Clipboard and pasting it into the SQL pane of the query designer for the NorthwindCS sample database doesn’t work. The da Vinci designer does its best to translate the Access SQL flavor into T-SQL when you paste, but you receive errors when you try to run the query.

SELECT TOP (2147483647) dbo.Customers.CompanyName,
      dbo.Orders.OrderID, dbo.Orders.OrderDate,
      dbo.Employees.FirstName + ‘ ‘ +
      dbo.Employees.LastName AS Salesperson
   FROM dbo.Employees
      INNER JOIN dbo.Customers
         INNER JOIN dbo.Orders
         ON dbo.Customers.CustomerID = dbo.Orders.CustomerID
      ON dbo.Employees.EmployeeID = dbo.Orders.EmployeeID
   WHERE (YEAR(dbo.Orders.OrderDate) = 2006)
   ORDER BY dbo.Customers.CompanyName, dbo.Orders.OrderID
  • For more information on the da Vinci toolset, see “Exploring SQL Server Views,” in online Chapter 27.
  • For detailed instructions on installing SQL Server Express and NorthwindCS.adp, see “Performing SQL Server Express Setup,” p. XXX (Chapter 1), and “Exploring the NorthwindCS Sample Project,” in online Chapter 27.

The Datasheet view of the SQL Server view generated by the preceding SQL statement is identical to the Access query’s Datasheet view shown in Figure 4.11.

the design of the T-SQL query
Figure 4.12

Pasting an Access SQL statement into Access’s version of the da Vinci query design tool and making a few minor changes to the T-SQL statement results in an SQL Server view equivalent to the Access query of Figure 4.10.

Microsoft Access 2010 In Depth, Rough Cuts By Roger Jennings
Copyright 2011
Pages: 1008
Edition: 1st
Rough Cuts
ISBN-10: 0-7686-9526-0
ISBN-13: 978-0-7686-9526-7




MS Access Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS Access Forum
Topic By Replies Updated
Help With Microsoft Access kasy 0 September 4th, 07:35 PM
Linked table not sorting or filtering - ODBC error Java 1 August 28th, 10:37 AM
Use Parameter in select statement (Sql in Microsoft Access) katty.jonh 1 July 25th, 06:45 AM
Query Issue algebroni 7 July 23rd, 04:22 PM


















Thanks for your registration, follow us on our social networks to keep up-to-date