Introduction to Relational Databases
June 24, 2002
Introduction
Many web developers are self-taught, learning HTML, then moving
on to a programming language such as PHP. From there, they often
learn to integrate this with a database. Too few though have a
good theoretical knowledge of databases. Mention foreign keys, or
referential integrity, and you're met with a blank stare. Small
databases can be easily designed with little database theory
knowledge. But large databases can easily get out of hand when
badly designed, leading to poor performance, and resulting in
the whole database needing to be rebuilt later. This article is
a brief introduction to the topic of relational databases, and
will hopefully whet your appetite for further exploration.
The Relational Database Model
A database can be understood as a collection of related files.
How those files are related depends on the model used. Early
models included the hierarchical model (where files are related in
a parent/child manner, with each child file having at most one
parent file), and the network model (where files are related as
owners and members, similar to the hierarchical model except that
each member file can have more than one owner).
The relational database model was a huge step forward, as it
allowed files to be related by means of a common field. In order
to relate any two files, they simply need to have a common field,
which makes the model extremely flexible.
Poet
| Code | First Name | Surname | Age |
| 1 | Mongane | Afrika | 62 |
| 2 | Stephen | Serote | 58 |
| 3 | Tatumkhulu | Watson | 29 |
Poem
| Title | Poet |
| Wakening Night | 1 |
| Thrones of Darkness | 2 |
| Once | 3 |
These two tables relate through the code field in the poet table,
and the poet field in the poem table. We can see who wrote the
poem 'Once' by following the relationship, and see that it was
poet 3, or Tatumkhulu Watson.
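Following that relationship can be sketched in SQL as a join on the common field. The example below is a minimal illustration that uses Python's built-in sqlite3 module and an in-memory database. The lowercase table and column names (poet, poem, first_name and so on) are assumptions adapted from the tables above, not names the model prescribes.

```python
import sqlite3

# In-memory database holding the two example tables from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poet (code INTEGER, first_name TEXT, surname TEXT, age INTEGER);
    CREATE TABLE poem (title TEXT, poet INTEGER);
    INSERT INTO poet VALUES (1, 'Mongane', 'Afrika', 62),
                            (2, 'Stephen', 'Serote', 58),
                            (3, 'Tatumkhulu', 'Watson', 29);
    INSERT INTO poem VALUES ('Wakening Night', 1),
                            ('Thrones of Darkness', 2),
                            ('Once', 3);
""")

# Relate the tables through the common field: poem.poet matches poet.code.
row = conn.execute(
    """
    SELECT poet.first_name, poet.surname
    FROM poem
    JOIN poet ON poet.code = poem.poet
    WHERE poem.title = ?
    """,
    ("Once",),
).fetchone()

print(row)  # ('Tatumkhulu', 'Watson')
```

Neither table stores anything about the other beyond the shared value, which is what makes the model so flexible.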
In 1970, when E.F. Codd developed the model, it was thought to be
hopelessly impractical, as the machines of the time could not
cope with the overhead necessary to maintain the model. Of course,
hardware since then has come on in huge strides, so that today
even the most basic of PCs can run sophisticated relational
database management systems. Together with this went the
development of SQL. SQL is relatively easy to learn and allows
people to quickly start performing queries on a relational
database. This simplicity is part of the reason that relational
databases now form the majority of databases to be found.
Basic Terms
An understanding of relational databases requires an understanding
of some of the basic terms.
- Data are the values stored in the database. On their own, data
mean very little. "43156" is an example.
- Information is data that is processed to have a meaning. For
example, "43156" is the population of the town of Littlewood.
- A database is a collection of tables.
- Each table contains records, which are the horizontal rows
in the table. These are also called tuples.
- Each record contains fields, which are the vertical columns
of the table. These are also called attributes. For example, a
product record might contain a product_id field and a
description field.
- Fields can be of many different types. There are many standard
types, and each DBMS (database management system, such as Oracle
or MySQL) can also have its own specific types, but generally
they fall into at least three kinds: character, numeric and date. For
example, a product description would be a character field, a
product release date would be a date field, and a product
quantity in stock would be a numeric field.
- The domain refers to the possible values each field can
contain (it's sometimes called a field specification). For
example, a field entitled "marital_status" may be limited to the
values "Married" and "Unmarried".
- A field is said to contain a null value when it contains
nothing at all. Null values can create complexities in calculations
and have consequences for data accuracy. For this reason, many
fields are specifically set not to contain NULL values.
- A key is a logical way to access a record in a table. For
example, in the product table, the product_id field could allow
us to uniquely identify a record. A key that uniquely identifies
a record is called a primary key.
- An index is a physical mechanism that improves the performance
of a database. Indexes are often confused with keys. However,
strictly speaking they are part of the physical structure, while
keys are part of the logical structure.
- A view is a virtual table, defined by a query, that presents a
subset of the data in the actual tables.
- A one-to-one (1:1) relationship occurs where, for each instance
of table A, only one instance of table B exists, and vice-versa.
For example, each vehicle registration is associated with only one
engine number, and vice-versa.
- A one-to-many (1:m) relationship is where, for each instance
of table A, many instances of table B exist, but for each
instance of table B, only one instance of table A exists. For
example, for each artist, there are many paintings. Since it is
a one-to-many relationship, and not many-to-many, in this case
each painting can only have been painted by one artist.
- A many-to-many (m:n) relationship occurs where, for each
instance of table A, there are many instances of table B, and for
each instance of table B, there are many instances of table A.
For example, a poetry anthology can have many authors, and each
author can appear in many poetry anthologies.
- A mandatory relationship exists where, for each instance of
table A, one or more instances of table B must exist. For example,
for a poetry anthology to exist, there must exist at least one
poem in the anthology. The reverse is not necessarily true though,
as for a poem to exist, there is no need for it to appear in a
poetry anthology.
- An optional relationship is where, for each instance of table
A, there may exist instances of table B. For example, a poet does
not necessarily have to appear in a poetry anthology. The reverse
isn't necessarily true though: for an anthology to be listed, it
must have some poets.
- Data integrity describes the accuracy, validity and
consistency of data. An example of poor integrity would be where
a poet's name is stored differently in two different places.
- Database normalization is a technique that helps us to reduce
the occurrence of data anomalies and poor data integrity.
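Several of the terms above can be seen working together in a small schema. The sketch below uses Python's built-in sqlite3 module; SQLite is chosen only because it needs no setup, and other DBMSs such as Oracle or MySQL differ in syntax details. The marital_status column on the poet table, the index name idx_poem_poet, the view name poem_credits and the helper function rejected are all hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE poet (
        code           INTEGER PRIMARY KEY,          -- primary key
        surname        TEXT NOT NULL,                -- NULL values not allowed
        marital_status TEXT CHECK (marital_status IN ('Married', 'Unmarried'))  -- domain
    );
    CREATE TABLE poem (
        title TEXT NOT NULL,
        poet  INTEGER NOT NULL REFERENCES poet (code)  -- one poet, many poems
    );
    CREATE INDEX idx_poem_poet ON poem (poet);         -- physical, for performance
    CREATE VIEW poem_credits AS                        -- virtual table
        SELECT poem.title, poet.surname
        FROM poem JOIN poet ON poet.code = poem.poet;
    INSERT INTO poet VALUES (3, 'Watson', 'Unmarried');
    INSERT INTO poem VALUES ('Once', 3);
""")

def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

results = {
    "duplicate primary key": rejected("INSERT INTO poet VALUES (3, 'Serote', 'Married')"),
    "NULL in a NOT NULL field": rejected("INSERT INTO poet VALUES (4, NULL, 'Married')"),
    "value outside the domain": rejected("INSERT INTO poet VALUES (5, 'Afrika', 'Single')"),
    "poem by an unknown poet": rejected("INSERT INTO poem VALUES ('Lost', 99)"),
}
for case, was_rejected in results.items():
    print(case, "->", "rejected" if was_rejected else "accepted")

# The view is queried like any other table.
credits = conn.execute("SELECT title, surname FROM poem_credits").fetchall()
```

Each rejected statement corresponds to one of the terms: the primary key, the NOT NULL setting, the domain, and the relationship between the two tables. Declaring these rules in the schema lets the database protect data integrity itself rather than relying on every application to check.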