Where Did Oracle Come From?
July 13, 2005
As an Oracle DBA, you know that Oracle is a relational database management system and that it performs many complex functions well. In the pressure to learn how to be a DBA or what it is you should be doing, it is hard to take the time to sit back and appreciate the origins of the RDBMS model in general and where and how Oracle came into being. It is important to have an appreciation of your profession's history and origins. Often times, we take for granted the functions an RDBMS performs. But where, exactly, did the concept of the relational model come from?
"A Relational Model of Data for Large Shared Data Banks"
A paper written by E.F. Codd (yes, that Codd) 35 years ago set the stage for the development of the relational model. Codd's "A Relational Model of Data for Large Shared Data Banks" has been hailed as a landmark paper. Even his own employer, IBM - an industry leader in several areas at the time, including a leading database system - largely ignored the brilliance of what Dr. Codd proposed in 1970. How relational database management systems operate today - including Oracle - can be traced back to what Dr. Codd proposed back then.
The paper consists of two major sections: a narrative describing the shortcomings of the then current models and systems, and a proposal for how a relational model should work. A key distinction between the model types is that of data independence. Current models then were so inherently tied to programs and physical file structures that minor changes could invalidate the entire system. In the relational model, the separation between data and how an application or computer system works is paramount - the further apart the better.
Limitations of the Old Systems
Network and hierarchal-based systems were hobbled by data dependency. This problem arose in three forms: ordering dependence, indexing dependence and access path dependence. Ordering dependency was common to all of the major systems then in use. Specifically, they all failed "to make a clear distinction between order of presentation on the one hand and stored ordering on the other." In practical terms, once entered, there was no way to change the order.
Indexing dependence involves the relationship between applications and indices (or indexes as the word is now used). If an index were removed, would the application still function? Thirty plus years ago, the answer was mostly "no." Even then, the benefits and limitations of indexes were well known. The major problem for some systems concerned the situation where an index was renamed or removed. If an application was hard coded to use a specific index name, and the index was no longer present, would the application still function?
Codd illustrates the third limitation - access path dependence - with a series of five different parts versus project structures, and goes on to point out how any program will fail on at least three of the arrangements. In concrete terms, if a program were tied to a specific structure or location of data, then changes in the structure or location would most likely cause a failure of the program. This dependency illustrates the need for a program to be "blind" to structure and location.
Enter the Relational View of Data
Consider what you take for granted today - "of course the database is supposed to act this way" - and put yourself in Codd's shoes back in 1970. What seems so obvious and practical today required a paradigm shift back then. IBM had invested tremendous amounts of time and capital on its own non-relational problem-ridden system, and here comes a proposal from an employee at an IBM location (San Jose) that was not even involved with programming. Quite a few years went by before the genius of what Codd proposed was successfully implemented.
What hindered and delayed the acceptance of the relational model was the lack of a structured query language that could take advantage of such a model, and obviously, the lack of a working relational model. Similar situations exist today. For example, scientists can tell you what the limitations are with the current rocket-based space program. At the same time, they can tell you what the star ship Enterprise should be capable of, but how to build it and how to make it work are not known at this time. Like Codd back in 1970, we have the theory, but we do not have the technology.
So what does the relational model do for us, and how is it better than what was being used? The first major difference discussed in the paper has to do with moving from using relations to using relationships. Relations depend on a domain ordering (the same concept seen in algebra classes - domain versus range, function versus relation). Relationships are indifferent to the order of data, so it is sufficient to be able to uniquely identify domains (table columns) without regard to order.
The implementation of relationships requires the introduction of primary and foreign keys as a "preferred way." The preferred way Codd introduced is what is also known as normal form (NF). The concept of what has become known as 1NF, 2NF and so on is well known today and Codd alludes to further normalization of the example he uses to illustrate the employee example.
What is another name for SQL? How about "applied predicate calculus?" As Codd notes, "The adoption of a relational model of data ... permits the development of a universal data sub-language based on an applied predicate calculus." The paper does not go into the language in detail, but Codd does discuss its salient details (the ship should go at least warp factor three...).
SQL statements work with sets of data, either all of the possible values, or a subset. We may fetch a set for query purposes, or hold it for possible changes. Once the set is obtained, we should be able to exploit its relations (manipulate the data). Codd recognized back then that a high-level language would be needed to "provide efficient response and throughput shifts from the individual user to the data system."
When writing a SQL statement, the user is performing operations on relations. Three major relational operators familiar to most users are restriction, projection, and joins. Restrictions are employed by using WHERE clauses, which can further be restricted by ANDing or ORing additional conditions. Projections are exemplified during partial selects out of a table (selecting specific domains, or columns as opposed to select *). Joins are frequently used to compare sets of data (one table against another, and even the same table against itself).
In addition to the topics of strong and weak redundancy (maintaining multiple copies of a column across tables for user convenience), the final major characteristic of the relational model covered in the paper deals with consistency. Although not directly mentioned, consistency relates to referential integrity. Whether or not user omission or commission introduces (or at least attempts to introduce) an inconsistency, the system should be able to perform a check prior to insertion, deletion, or update. This is one of the approaches discussed by Codd and is obviously the one modern relational databases use today.
As you read this historic paper, you can practically see the blueprint for Oracle and other modern relational database systems. The ideal system adheres to 12 rules later, well, codified by Dr. Codd in 1985. Looking back on history, it is not surprising that someone - Lawrence Ellison in this case - would fully realize the importance and relevance of Codd's relational model and seize upon the opportunity to build a better system. By then, SQL was in place and all (not to trivialize the amount of work "all" implies) that was needed was an engine to drive the system.