As an Oracle DBA, you know that Oracle is a relational
database management system and that it performs many complex functions well. In
the pressure to learn how to be a DBA or what it is you should be doing, it is
hard to take the time to sit back and appreciate the origins of the RDBMS model
in general and where and how Oracle came into being. It is important to have an
appreciation of your profession’s history and origins. Often times, we take for
granted the functions an RDBMS performs. But where, exactly, did the concept of
the relational model come from?
“A Relational Model of Data for Large Shared Data Banks”
A paper written by E.F. Codd (yes, that Codd) 35
years ago set the stage for the development of the relational model. Codd’s “A
Relational Model of Data for Large Shared Data Banks” has been hailed as a
landmark paper. Even his own employer, IBM – an industry leader in several
areas at the time, including a leading database system – largely ignored the
brilliance of what Dr. Codd proposed in 1970. How relational database management
systems operate today – including Oracle – can be traced back to what Dr. Codd
proposed back then.
The paper consists of two major sections: a narrative
describing the shortcomings of the then current models and systems, and a
proposal for how a relational model should work. A key distinction between the
model types is that of data independence. Current models then were so
inherently tied to programs and physical file structures that minor changes
could invalidate the entire system. In the relational model, the separation
between data and how an application or computer system works is paramount – the
further apart the better.
Limitations of the Old Systems
Network and hierarchal-based systems were hobbled by data
dependency. This problem arose in three forms: ordering dependence, indexing dependence
and access path dependence. Ordering dependency was common to all of the major
systems then in use. Specifically, they all failed “to make a clear distinction
between order of presentation on the one hand and stored ordering on the
other.” In practical terms, once entered, there was no way to change the order.
Indexing dependence involves the relationship between
applications and indices (or indexes as the word is now used). If an index were
removed, would the application still function? Thirty plus years ago, the
answer was mostly “no.” Even then, the benefits and limitations of indexes were
well known. The major problem for some systems concerned the situation where an
index was renamed or removed. If an application was hard coded to use a
specific index name, and the index was no longer present, would the application
still function?
Codd illustrates the third limitation – access path
dependence – with a series of five different parts versus project structures,
and goes on to point out how any program will fail on at least three of the
arrangements. In concrete terms, if a program were tied to a specific structure
or location of data, then changes in the structure or location would most
likely cause a failure of the program. This dependency illustrates the need for
a program to be “blind” to structure and location.
Enter the Relational View of Data
Consider what you take for granted today – “of course the
database is supposed to act this way” – and put yourself in Codd’s shoes back
in 1970. What seems so obvious and practical today required a paradigm shift
back then. IBM had invested tremendous amounts of time and capital on its own
non-relational problem-ridden system, and here comes a proposal from an
employee at an IBM location (San Jose) that was not even involved with
programming. Quite a few years went by before the genius of what Codd proposed
was successfully implemented.
What hindered and delayed the acceptance of the relational
model was the lack of a structured query language that could take advantage of
such a model, and obviously, the lack of a working relational model. Similar
situations exist today. For example, scientists can tell you what the
limitations are with the current rocket-based space program. At the same time,
they can tell you what the star ship Enterprise should be capable of, but how
to build it and how to make it work are not known at this time. Like Codd back
in 1970, we have the theory, but we do not have the technology.
So what does the relational model do for us, and how is it
better than what was being used? The first major difference discussed in the
paper has to do with moving from using relations to using relationships.
Relations depend on a domain ordering (the same concept seen in algebra classes
– domain versus range, function versus relation). Relationships are indifferent
to the order of data, so it is sufficient to be able to uniquely identify
domains (table columns) without regard to order.
The implementation of relationships requires the
introduction of primary and foreign keys as a “preferred way.” The preferred
way Codd introduced is what is also known as normal form (NF). The concept of
what has become known as 1NF, 2NF and so on is well known today and Codd
alludes to further normalization of the example he uses to illustrate the
employee example.
What is another name for SQL? How about “applied predicate
calculus?” As Codd notes, “The adoption of a relational model of data … permits
the development of a universal data sub-language based on an applied predicate
calculus.” The paper does not go into the language in detail, but Codd does
discuss its salient details (the ship should go at least warp factor three…).
SQL statements work with sets of data, either all of the
possible values, or a subset. We may fetch a set for query purposes, or hold it
for possible changes. Once the set is obtained, we should be able to exploit
its relations (manipulate the data). Codd recognized back then that a
high-level language would be needed to “provide efficient response and
throughput shifts from the individual user to the data system.”
When writing a SQL statement, the user is performing
operations on relations. Three major relational operators familiar to most
users are restriction, projection, and joins. Restrictions are employed by
using WHERE clauses, which can further be restricted by ANDing or ORing
additional conditions. Projections are exemplified during partial selects out
of a table (selecting specific domains, or columns as opposed to select *).
Joins are frequently used to compare sets of data (one table against another,
and even the same table against itself).
In addition to the topics of strong and weak redundancy
(maintaining multiple copies of a column across tables for user convenience),
the final major characteristic of the relational model covered in the paper
deals with consistency. Although not directly mentioned, consistency relates to
referential integrity. Whether or not user omission or commission introduces
(or at least attempts to introduce) an inconsistency, the system should be able
to perform a check prior to insertion, deletion, or update. This is one of the
approaches discussed by Codd and is obviously the one modern relational
databases use today.
As you read this historic paper, you can practically see the
blueprint for Oracle and other modern relational database systems. The ideal
system adheres to 12 rules later, well, codified by Dr. Codd in 1985.
Looking back on history, it is not surprising that someone – Lawrence Ellison
in this case – would fully realize the importance and relevance of Codd’s
relational model and seize upon the opportunity to build a better system. By
then, SQL was in place and all (not to trivialize the amount of work “all”
implies) that was needed was an engine to drive the system.