All of this so far should be familiar to regular
readers of SQLwire. Now, we introduce the 2nd table, that will show the real power
of relational databases. First, a quick introduction to relational databases.
Why are they given this name? The answer comes from the fact that, unlike
earlier database structures (hierarchical and network), relational databases
allow potentially every file, or table, to relate to every other one. They do
this by using a common field. Let's add another couple of tables. (For more on
database design, I suggest you read the article on database
normalization.) We add the following tables:
tour_locations
| location_code |
location_name |
| 1024 |
Table Mountain |
| 1025 |
Robben Island |
| 1026 |
Kruger National Park |
| 1027 |
St Lucia |
tour_expeditions
| location_code |
employee_number |
hours_worked |
tourgroup_size |
| 1024 |
978 |
5 |
8 |
| 1027 |
942 |
8 |
4 |
| 1025 |
923 |
3 |
20 |
| 1026 |
982 |
6 |
8 |
| 1024 |
978 |
5 |
8 |
| 1025 |
978 |
3 |
16 |
I assume by now you can do the CREATE and INSERT statements to populate the
above tables. Now you should be able to see the reason for the term relational
database. The way these tables relate is by the common fields they have -
tour expeditions joins to tour_guides through the field
employee_number, and to tour_locations though
location_code. Note that the field names do not have to be the same in
both tables, as long as their definitions are the same (ie both int in this
case). Try and make as many fields as possible NOT NULL (fields where there
cannot logically be a NULL value). For example, the fields location_code
and employee_number in the table tour_expeditions are good
candidates. Make them NOT NULL now, and we'll reap the benefits later!
Now comes the crux. How would we answer the question "Which employees worked in
which locations?". The secret here is to use the fields that relate in each
table to join. Let's first answer a more simple question to introduce the
concept. "Which employee_numbers worked in which locations?". We would use the
following query:
SELECT employee_number,location_name
FROM tour_locations,tour_expeditions
WHERE tour_locations.location_code = tour_expeditions.location_code
This returns:
| employee_number |
location_name |
| 978 |
Table Mountain |
| 942 |
St Lucia |
| 923 |
Robben Island |
| 982 |
Kruger National Park |
| 978 |
Table Mountain |
| 978 |
Robben Island |
How did we get this query? The first part, immediately after the SELECT lists
the fields we want to return. Easy enough - employee_numbers and
location_name. The second part, after the FROM, provides the tables that
contain the fields. In this case it's clearly tour_locations for
location_name. But which table to choose for employee_number? Both
tour_expeditions and tour_guides contain this field. Here, we have
to look at which table is related to tour_location. Since
tour_location is related only to tour_expedition (through the
location_code field), we could only use tour_expedition. And the
third part, after the WHERE clause, tells us which fields the relation exists
on, or are being joined.
The usual SELECT rules apply. To bring back only the employee_numbers that
gave a tour to Table Mountain, and to bring back only unique records (notice
that the above query brought back a duplicate value, as there are 2 records that
apply), we use:
SELECT DISTINCT employee_number,location_name
FROM tour_locations,tour_expeditions
WHERE tour_locations.location_code = tour_expeditions.location_code AND
location_name='Table Mountain'
| employee_number |
location_name |
| 978 |
Table Mountain |
is the only row returned. Note
how the DISTINCT keyword returns only one identical row. Without it, we would
have returned 2 identical rows, one for each time employee 978 gave a tour to
Table Mountain.
How now do we return the name of the employees, as we
originally requested, not just their numbers. To do so, we join the
tour_expedition table to the tour_guide table, on the
employee_number field, as follows:
SELECT DISTINCT
tour_expeditions.employee_number,employee_name,location_name
FROM
tour_locations,tour_expeditions,tour_guides
WHERE tour_locations.location_code = tour_expeditions.location_code AND
tour_expeditions.employee_number=tour_guides.employee_number
This brings back
| employee_number |
employee_name |
location_name |
| 978 |
Siyabonge Nomvete |
Table Mountain |
| 923 |
Tyrone Arendse |
Robben Island |
| 978 |
Siyabonge Nomvete |
Robben Island |
| 982 |
Mathew Booth |
Kruger National Park |
| 942 |
Jean-Marc Ithier |
St Lucia |
Note the changes we made to our
original join. We've added employee_name to the fields returned,
tour_guides to the table list, and we've had to add the name of the table
to the employee_number field of tour_expeditions, making it
tour_expeditions.employee_number (now that there are 2 tables returning the
employee_number, we need to specify which table to use.) In this case it
makes no difference, but in others it may. And finally, we've added a join
condition to the WHERE clause, pointing out the relation to use to join the 2
tables.