Microsoft Access 2010 In Depth, Rough Cuts |
Microsoft Access 2010 has a collection of wizards to lead you step-by-step through each process involved in developing and using a production-grade database application. ‘ Exploring Relational Database Theory and Practice ‘ is extracted from ‘ Microsoft Access 2010 In Depth’, published by Que.
Moving from Spreadsheets to Databases
Word processing and spreadsheet
applications were the engines that drove the fledgling personal computer
market. In the early PC days, WordPerfect and Lotus 1-2-3 dominated the
productivity software business. Today, most office workers use Microsoft Word
and Excel on a daily basis. It’s probably a safe bet that more data is stored
in Excel spreadsheets than in all the world’s databases. It’s an equally good
wager that most new Access users have at least intermediate-level spreadsheet
skills, and many qualify as Excel power users.
Excel 2010’s Data ribbon offers
elementary database features, such as sorting, filtering, validation, and data
entry forms. You can quickly import and export data in a variety of formats,
including those of database management applications, such as Access. Excel’s
limitations become apparent as your needs for entering, manipulating, and
reporting data grow beyond the spreadsheet’s basic row-column metaphor.
Basically, spreadsheets are list managers; it’s easy to generate a simple name
and address list with Excel. If your needs expand to contact management and
integrating the contact data with other information generated by your
organization, a spreadsheet isn’t the optimal approach.
The first problem arises when
your contacts list needs additional rows for multiple persons from a single
company. You must copy or retype all the company information, which generates
redundant data. If the company moves, you must search and replace every entry
for your contacts at the firm with the new address. If you want to record a
history of dealings with a particular individual, you add pairs of date and
text columns for each important contact with the person. Eventually, you find
yourself spending more time navigating the spreadsheet’s rows and columns than
using the data they contain.
Contact lists are only one
example of problems that arise when attempting to make spreadsheets do the work
of databases. Tracking medical or biological research data, managing consulting
time and billings, organizing concert tours, booking artist engagements, and
myriad other complex processes are far better suited to database than
spreadsheet applications.
Moving to a relational database
management system (RDBMS), such as Access, solves data redundancy and
navigation problems and greatly simplifies updating existing information. After
you understand the basic rules of relational database design, Access makes
creating highly efficient databases quick and easy. Access 2010 has a
collection of wizards to lead you step-by-step through each process involved in
developing and using a production-grade database application. Unfortunately, no
“Relational Wizard” exists to design the underlying database structure for you,
but you’ll find a wealth of pre-built database templates in the Backstage
page’s New tab. (Click the ribbon’s File tab to open the new Backstage page.)
Tip
If your goal is learning
relational database fundamentals, start with Access 2010. Access is by far the
first choice of universities, colleges, trade schools, and computer-training
firms for courses ranging from introductory data management to advanced
client/server database programming. The reason for Access’s popularity as a
training platform is its unique combination of initial ease of use and support
for advanced database application development techniques
Reliving Database History
Databases form the foundation of
world commerce and knowledge distribution. Without databases, there would be no
World Wide Web, automatic teller machines, credit/debit cards, or online
airline reservation systems. Newsgathering organizations, research
institutions, universities, and libraries would be unable to categorize and
selectively disseminate their vast store of current and historical information.
It’s difficult to imagine today a world without a network of enormous
databases, many of which probably contain a substantial amount of your personal
data that you don’t want to be easily available to others.
Note
Jim Gray’s article,
“Data Management: Past, Present, and Future,” which is available as a Microsoft
Word document at http://research.microsoft.com/~gray/DB_History.doc, offers a
more detailed history of data processing systems. Dr. Gray was a senior
researcher and the manager of Microsoft’s Bay Area Research Center (BARC) until
early 2007, when he became lost at sea while sailing off the California coast
The Early History of Databases
The forerunner of today’s
databases consisted of stacks of machine-readable punched cards, which Herman
Hollerith used to record the 1890 U.S. census. Hollerith formed the
Computing-Tabulating-Recording Company, which later became International
Business Machines. From 1900 to the mid-1950s, punched cards were the primary
form of business data storage and retrieval, and IBM was the primary supplier
of equipment to combine and sort (collate) punched cards, and print reports
based on punched-card data.
The development of large
computer-maintained databases—originally called databanks—is a post–World War II phenomenon. Mainframes
replaced punched cards with high-capacity magnetic tape drives to store large
amounts of data. The first databases were built on the hierarchical and network
models, which were well suited to the mainframe computers of the 1950s.
Hierarchical databases use parent-child relationships to define data
structures, whose diagrams resemble business organization charts or an inverted
tree with its root at the top of the hierarchy. Network databases allow
relaxation of the rules of hierarchical data structures by defining additional
relationships between data items. Hierarchical and network databases ordinarily
are self-contained and aren’t easy to link with other external databases over a
network.
Early databases used batch
processing for data entry and retrieval. Keypunch operators typed data from
documents, such as incoming orders. At night, other operators collated the
day’s batch of punched cards, updated the information stored on magnetic tape,
and produced reports. Many smaller merchants continue to use batch processing
of customer’s credit-card purchases, despite the availability of terminals that
permit almost instantaneous processing of credit- and debit-card transactions.
Note
Hierarchical databases remain
alive and well in the twenty-first century. For example, data storage for Windows
2000’s Active Directory and Microsoft Exchange Server is derived from the
hierarchical version of Access’s original relational Jet databases. The name
Jet comes from the original Access database engine called Joint
Engine Technology.
The Internet’s Domain Name System
(DNS) is a collection of hierarchical databases for translating character-based
Internet domain names into numerical Internet Protocol (IP) addresses. The DNS
database is called a distributed database, because its
data is held by a global network of thousands of computers.
The Relational Database Model
Dr. E. F. Codd, an employee of IBM Corporation,
published “A Relational Model of Data for Large Shared Databanks” in a journal
of the Association for Computing Machinery (ACM) in June 1970. A partial copy
of the paper is available at http://www.acm.org/classics/nov95/. Dr. Codd’s
specialty was a branch of mathematics called set theory, which includes the
concept of relations. He
defined a relation as a named set of tuples (records or rows) that have attributes (fields or columns). One of the attributes must
contain a unique value to identify each tuple. The common term for relation is
a table whose presentation to the user is similar to that of
a spreadsheet.
Relational databases solve a
serious problem associated with earlier database types. Hierarchical and
network databases define sets of data and explicit links between each data set
as parent-child and owner-member, respectively. To extract information from
these databases, programmers had to know the structure of the entire database.
Complex programs in COBOL or other mainframe computer languages are needed to
navigate through the hierarchy or network and extract information into a format
understandable by users.
Dr. Codd’s objective was to
simplify the process of extracting formatted information and make adding or
altering data easier by eliminating complex navigational programming. During
the 1970s, Dr. Codd and others developed a comparatively simple language,
Structured Query Language (SQL), for creating, manipulating, and retrieving
relational data. With a few hours of training, ordinary database users could
write SQL statements to define simple information needs and bypass the delays
inherent in the database programming process. SQL, which was first standardized
in 1985, now is the lingua franca of
database programming, and all commercial database products support SQL.
Client/Server and Desktop RDBMSs
In the early database era, the
most common presentation of data took the form of lengthy reports processed by
centralized, high-speed impact printers on fan-folded paper. The next step was
to present data to the user on green-screen video terminals, often having small
printers attached, which were connected to mainframe databases. As use of
personal computers gained momentum, terminal emulator cards enabled PCs to
substitute for mainframe terminals. Mainframe-scale relational databases, such
as IBM’s DB2, began to supplement and later replace hierarchical and network
databases, but terminals continued to be the primary means of data entry and
retrieval.
Note
The most widely used SQL standard,
SQL-92, was published by the American National Standards Institute (ANSI) in
1992. Few, if any, commercial relational database management systems (RDBMSs)
today fully conform to the entire SQL-92 standard. The later SQL-99 (also
called SQL3) and SQL-200n specifications add new features that aren’t germane
to Access databases.
RDBMS competitors have erected an
SQL Tower of Babel by adding nonstandard extensions to the language. For
example, Microsoft’s Transact-SQL (T-SQL) for SQL Server, which is the subject
of Chapter 27, “Moving from Access Queries to Transact-SQL,” has many proprietary
keywords and features. Oracle Corporation’s Oracle:SQL and PL/SQL dialects also
have proprietary SQL extensions.
Oracle, Ingres, Informix,
Sybase, and other software firms developed relational databases for lower-cost
minicomputers, most of which ran various flavors of the UNIX operating system.
Terminals continued to be the primary data entry and display systems for
multiuser UNIX databases.
The next step was the advent of
early PC-based flat-file managers and relational database management systems.
Early flat-file database managers, typified by Jim Button’s PCFile for DOS
(1981) and Claris FileMaker for Macintosh (1988) and Windows (1992), used a
single table to store data and offered few advantages over storing data in a
spreadsheet. The early desktop RDBMSs—such as dBASE, Clipper, FoxBase, and
Paradox—ran under DOS and didn’t support SQL. These products later became
available in multiuser versions, adopted SQL features, and eventually migrated
to Windows. Access 1.0, which Microsoft introduced in November 1992, rapidly
eclipsed its DOS and Windows competitors by virtue of Access’s combination of
graphical SQL support, versatility, and overall ease of use.
PC-based desktop RDBMSs are
classified as shared-file systems because they store their data in conventional
files that multiple users can share on a network. One of Access’s initial
attractions for users and developers was its capability to store all
application objects—forms, reports, and programming code—and tables for a
database application in a single file, which used the earlier .mdb extension..
FoxPro, dBASE, Clipper, and Paradox require a multitude of individual files to
store application and data objects. Today, almost every multiuser Access application
is divided (split) into a front-end .accdb file, which contains application
objects and links to a back-end database .accdb file that holds the data. Each
user has a copy of the front-end .accdb file and shares connections to a single
back-end .accdb file on a peer Windows workstation or server.
Note
Prior to Access 2000, Jet was
Access’s standard database engine, so the terms Access
database and Jet database were interchangeable.
Microsoft considered SQL Server to be its strategic
RDBMS for Access 2000 and 2003. Strategic means that
SQL Server gets continuing development funds and Jet doesn’t. Jet 4.0, which
was included with Access 2003 and is a part of the Windows XP and later
operating systems, is the final version and is headed toward retirement.
Microsoft’s Access team decided to
enhance Jet 4.0 with the new features described in Chapter 1, “Access 2010 for
Access 2007 Users: What’s New,” change the file extension from .mdb to .accdb,
and drop all references to Jet. To reflect this change, this edition uses the
terms Access database and SQL Server
database. Unless otherwise noted, SQL
Server refers to all SQL Server 2005 editions except the Compact and
Mobile editions.
Client/server RDBMSs have an
architecture similar to Access’s front-end/back-end shared-file multiuser configuration.
What differentiates client/server from shared-file architecture is that the
RDBMS on the server handles most of the data-processing activity. The client
front end provides a graphical user interface (GUI) for data entry, display,
and reporting. Only SQL statements and the specific data requested by the user
pass over the network. Client/server databases traditionally run on network operating
systems, such as Windows and UNIX, and are much more robust than shared-file
databases, especially for applications in which many users make simultaneous
additions, changes, and deletions to the database. All commercial data-driven
Web applications use client/server databases.
Note
This book uses the terms field and record when referring to
tables, and columns and rows
when discussing data derived from tables, such as the views and query result
sets described later in this chapter.
Since version 1.0, Access has
had the capability to connect to client/server databases by linking their
tables to an Access database. Linking lets you treat client/server tables
almost as if they were native Access tables. Linking uses Microsoft’s widely
accepted Open Database Connectivity (ODBC) standard, and Access 2010 includes
an ODBC driver for SQL Server and Oracle databases. You can purchase licenses
for ODBC drivers that support other UNIX or Windows RDBMSs, such as Sybase or Informix,
from the database supplier or third parties. Chapter 19, “Linking Access Front
Ends to Access and Client/Server Databases,” describes the process of linking
Access and Microsoft SQL Server 2008 databases. Although Chapter 19 uses SQL
Server for its examples, the linking procedure is the same for—or at least
similar to—other client/server RDBMSs.
Access data projects (ADP) and
the Microsoft SQL Server 2005 Express Edition combine to make Access 2010 a
versatile tool for designing and testing client/server databases and creating
advanced data entry and reporting applications. You can start with a
conventional Access database and later use Access’s Upsizing Wizard to convert
the .mdb file(s) to an .adp file that holds application objects and an SQL
Server 2005 back-end database. Access 2010’s Upsizing Wizard has incorporated
many improvements to the Access 2000 and earlier wizard versions, but Access
2010’s Wizard is the same as 2007’s. Despite the upgraded wizardry, you’re
likely to need to make changes to queries to accommodate differences between
Access and SQL Server’s SQL dialects.
-
For an example of differences between
Access and SQL Server SQL syntax that affects the upsizing process, see
“Displaying Data with Queries and Views,” p. XXX (this chapter).
Defining the Structure of Relational Databases
Relational databases consist of
a collection of self-contained, related tables. Tables typically represent
classes of physical objects, such as customers, sales orders, invoices, checks,
products for sale, or employees. Each member object, such as an invoice, has
its own record in the invoices table. For invoices, the field that uniquely
identifies a record, called a primary key[field],
is a serial invoice number.
Figure 4.1 shows Access’s
Datasheet view of an Invoices table, which is based on the Northwind.mdb sample
database’s Orders table. The InvoiceNo field is the primary key. Values in the
OrderID, CustomerID, EmployeeID, and ShipperID fields relate to primary key
values in Northwind’s Orders, Customers, Employees, and Shippers tables. A
field that contains values equal to those of primary key values in other tables
is called a foreign key [field].
Figure 4.1
This simple Invoices table was
created from the Northwind Orders table and doesn’t take advantage of Access’s
extended properties, such as the field captions, lookup fields, and
subdatasheets in the Datasheet view of the Orders table.
-
To learn more about primary keys in
Access tables, see “Selecting a Primary Key,” p. XXX (Chapter 5).
If you need information about a
particular invoice or set of invoices, open the Invoices table and search for
the invoice(s) by number (InvoiceNo) or another attribute, such as a customer
code (CustomerID), date (ShippedDate), or range of dates. Unlike earlier
database models, the user can access the Invoices table independently of its
related tables. No database navigation programming is needed. A simple, intuitive
SQL statement, SELECT * FROM Invoices, returns all the data in the table. The
asterisk (*) represents a request to display the contents of all fields of the
table.
Removing Data Redundancy with Relationships
The Invoices table of Figure 4.1
is similar to a spreadsheet containing customer billing information. What’s
missing is the customer name and address information. A five-character customer
code (CustomerID) identifies each customer to whom the invoice is directed. The
CustomerID values in the Invoices table match CustomerID values in a modified
version of Northwind’s Customers table (see Figure 4.2). Matching a foreign key
with a primary key value often is called a lookup operation. Using a key-based lookup operation eliminates the
need to repeatedly enter name, address, and other customer-specific data in the
Invoices table. In addition, if you change the customer’s address, the change
applies to all past and future invoices.
Figure 4.2
Foreign key values in the
Invoices table must match primary key values in the Customers table.
The Invoices table also
connects with other tables, which contain information on orders, sales department
employees, and the products ordered. Connections between fields of related
tables having common values are called relationships (not relations). Figure 4.3 shows Access’s
Relationships window displaying the relationships between the Invoices table
and the other tables of the Northwind sample database.
Figure 4.3
Access’s Relationships window
displays the relationships between the tables of the Northwind sample database,
plus the added Invoices table. Every relationship between these tables is
one-to-many. The many-to-many relationship between Products and Orders is an
indirect relationship.
Tip
Using derived key values, such as
alphabetic codes for Customer, is no longer in favor among database designers.
Most designers now use automatically generated numerical key values—called
Access AutoNumber or SQL Server identity fields. The
Northwind Orders and Products tables, among others, have primary keys that use
the AutoNumber data type. The Employees, Shippers, Products, and Suppliers
tables use AutoNumber keys to identify the persons or objects to which the
table’s records refer. Objects that are inherently sequentially numbered, such
as checks, are ideal candidates for an AutoNumber key that corresponds to the
check number, as mentioned in “Choosing Primary Key Codes” later in this
chapter.
Another method of generating
unique keys is by use of Globally Unique Identifiers (GUIDs), which also are
called Universally Unique Identifiers (UUIDs). GUIDs are 16-byte computed
binary numbers that are guaranteed to be unique locally and universally; no
other computer in the world will duplicate a GUID. SQL Server’s
uniqueidentifier data type is a GUID. Because GUIDs can’t represent a property
of an object, such as a check number, GUID keys are called surrogate
keys. You can’t select a GUID data type in Access’s Table Design mode.
Relationships come in the
following three flavors:
- One-to-many relationships represent connections between a single
primary key value (the “one” side) and multiple instances of the same value in
the foreign key field (the “many” side). One-to-many relationships commonly are
identified by the number 1 and the infinity (∞) symbol, as in Figure 4.3.
All the direct relationships between the tables in Figure 4.3 are one-to-many.
One-to-many—also called many-to-one—relationships are by far the most common. - One-to-one relationships connect primary key values in two
tables. You might think that the relationship between the Orders and Invoices
tables could be one-to-one, but an order requires more than one invoice if one
or more items are backordered and then shipped later. One-to-one relationships
are uncommon. - Many-to-many relationships require three tables, one of which is
called a linking table.
The linking table must have two foreign keys, each of which has a many-to-one
relationship with a primary key in two related tables. In the example of Figure
4.3, the Order Details table is the linking table for the many-to-many
relationship between the Orders and Products tables. Many-to-many relationships
also are called indirect relationships.
There are many other indirect
relationships between the tables shown in Figure 4.3. For example, a
many-to-many relationship exists between the Suppliers and Orders tables. In
this case, Products and Order Details act as linking tables between the
Suppliers and Orders tables.
The Relationships window
displays the names of primary key fields in a boldface font. Notice in Figure
4.3 that the OrderID and ProductID field names are preceded by a key symbol.
The OrderID and ProductID fields compose a composite primary key, which uniquely identifies an order line item. You
can’t repeat the same combination of OrderID and ProductID; this precaution
makes sense for products that have only one stock-keeping unit (SKU), such as
for Aniseed Syrup, which comes only in a carton of 12 550ml bottles.
Note
Access 2010’s multivalue field
feature automatically generates a hidden linking table “under the covers.”
Access 2007 introduced the multivalued field for compatibility with SharePoint
lists.
The Oakmont.accdb sample database file in the
\2010Samples\Oakmont folder of the downloadable code has a structure that
differs from that of Northwind.accdb, but the design principles of the two databases
are similar. OakmontSQL.mdf is an SQL Server 2008 database for use with ADP.
ADP uses a special set of tools—called the project designer or da Vinci toolset in this book—for designing and managing SQL Server
databases. The Oakmont files are course enrollment databases for a college.
Figure 4.4 shows the Database Diagram window for the OakmontSQL database. The
SQL Server Diagram window is similar to the Relationships window for Access’s
traditional Access databases. The key and infinity symbols at the ends of each
line represent the one and many sides, respectively, of the one-to-many
relationships between the tables. Access and SQL Server databases store information
on table relationships as an object within the database file.
Figure 4.4
The SQL Server Database Diagram
window for the OakmontSQL database shows one-to-many relationships between
primary key fields (identified by key symbols) and foreign key fields (infinity
symbols).
This book uses the Access 2010
and SQL Server 2008 R2 versions of the Northwind and Oakmont sample databases
in almost all examples. The tables of the Oakmont database have many more
records than the Northwind tables. The large number of records in the Oakmont
database makes it better suited than Northwind for predicting the performance
of production Access and SQL Server database applications.
Note
The one-product-entry-per-order
restriction prevents shared use of the Order Details table as an invoice line
items table. If you short-ship an order item on one invoice, you can’t add
another record to the Order Details table when you ship the remaining quantity
of the item. Microsoft didn’t add an Invoices table for Northwind Traders, probably
because of the complexity of dealing with backorders and drop-shipments.