Normalization is almost universally applied
to relational databases such as MySQL in order to optimize
tables for general-purpose querying and to rid them of certain undesirable
characteristics that could lead to a loss of data integrity. This tends to make query
results more accurate, but it also means queries take a little more work to write,
as the data may be spread across several tables. In today’s article, we’ll
learn how to fetch data from multiple tables by using joins.
Two Syntaxes
There are two accepted syntax styles for writing joins: ANSI-style joins and theta-style joins. ANSI syntax uses the JOIN and ON keywords, as in the following example:
SELECT field1,
field2
FROM my_table t1
JOIN my_other_table t2
ON t1.primary_id_field = t2.foreign_key_id_field
WHERE t1.lastname = 'Smith';
The JOIN keyword separates the names of the tables being joined, and the ON
clause holds the join condition, which states the columns that link the two
tables (the join key).
In theta-style syntax, the tables are listed in the FROM clause separated by
commas, and the join condition is simply added to the WHERE clause:
SELECT field1,
field2
FROM my_table t1,
my_other_table t2
WHERE t1.primary_id_field = t2.foreign_key_id_field
AND t1.lastname = 'Smith';
MySQL also supports the USING clause, which can be used as a sort of
shorthand for the ON condition when the join column has the same name in both
joined tables. The column is named once, without a table alias:
SELECT field1,
field2
FROM my_table t1
JOIN my_other_table t2
USING (id_field)
WHERE t1.lastname = 'Smith';
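To check that the three forms really are interchangeable, here is a minimal sketch that runs them side by side. It uses Python’s built-in sqlite3 module as a stand-in for a MySQL server, so it can be run without one; the join syntax shown is the same in both. The tables my_table and my_other_table, their columns and their rows are made up for illustration, and the join column is given the same name (id_field) in both tables so that the USING form applies.

```python
import sqlite3

# sqlite3 stands in for MySQL; tables, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id_field INTEGER PRIMARY KEY, lastname TEXT);
    CREATE TABLE my_other_table (id_field INTEGER, item TEXT);
    INSERT INTO my_table VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO my_other_table VALUES (1, 'lamp'), (1, 'desk'), (2, 'chair');
""")

# ANSI style: tables separated by JOIN, join condition in ON.
ansi_rows = conn.execute("""
    SELECT t1.lastname, t2.item
    FROM my_table t1
    JOIN my_other_table t2
      ON t1.id_field = t2.id_field
    WHERE t1.lastname = 'Smith'
    ORDER BY t2.item
""").fetchall()

# Theta style: tables separated by commas, join condition in WHERE.
theta_rows = conn.execute("""
    SELECT t1.lastname, t2.item
    FROM my_table t1, my_other_table t2
    WHERE t1.id_field = t2.id_field
      AND t1.lastname = 'Smith'
    ORDER BY t2.item
""").fetchall()

# USING shorthand: the shared column is named once, unqualified.
using_rows = conn.execute("""
    SELECT t1.lastname, t2.item
    FROM my_table t1
    JOIN my_other_table t2 USING (id_field)
    WHERE t1.lastname = 'Smith'
    ORDER BY t2.item
""").fetchall()

print(ansi_rows)  # [('Smith', 'desk'), ('Smith', 'lamp')]
```

The ORDER BY clauses are there only so the three result lists can be compared directly; without them, row order is not guaranteed.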
So Which Syntax is Best?
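The ANSI syntax is generally
preferable to theta style because it’s usually easier to read and understand,
particularly when writing joins involving numerous tables, since each join
condition sits next to the table it applies to instead of being mixed in with
the filters in the WHERE clause. Some types of joins, most notably outer joins,
also can’t be written using theta-style notation in MySQL.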
Join Types
In order to join
tables together, there has to be some common data that allow those tables to be
connected in some meaningful way. Although it’s possible to have more than one common column
between two tables, most often, the join key will be the primary key of one
table and a foreign key in the other.
To illustrate, we’ll perform queries against two sample tables,
manufacturer and model.
*Note: In a normalized database, the Manufacturer ID
would be stored in the Models table. Here, I included the full description to
better illustrate which manufacturers are associated with which models.
The first table contains automobile manufacturers; the
second, some models that are built by the first several auto manufacturers.
The common data between the two tables is the manufacturer, which is linked by
manufacturer ID.
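In SQL terms, that link is a primary key on one side and a foreign key on the other. The sketch below declares hypothetical manufacturer and model tables that way, based only on the columns used in the queries in this article (id, description, manufacturer_id); the column types and sample rows are assumptions, and Python’s sqlite3 module stands in for MySQL. Note that SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on.

```python
import sqlite3

# sqlite3 stands in for MySQL; this schema is a hypothetical reconstruction.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE manufacturer (
        id          INTEGER PRIMARY KEY,   -- primary key: one side of the join
        description TEXT NOT NULL
    );
    CREATE TABLE model (
        id              INTEGER PRIMARY KEY,
        description     TEXT NOT NULL,
        manufacturer_id INTEGER REFERENCES manufacturer(id)  -- foreign key: other side
    );
    INSERT INTO manufacturer VALUES (1, 'ACURA');
    INSERT INTO model VALUES (1, 'INTEGRA', 1);
""")

# Each row of PRAGMA foreign_key_list is
# (id, seq, table, from, to, on_update, on_delete, match).
fks = [(r[2], r[3], r[4]) for r in conn.execute("PRAGMA foreign_key_list(model)")]
print(fks)  # [('manufacturer', 'manufacturer_id', 'id')]

# A model pointing at a manufacturer id that doesn't exist is rejected.
try:
    conn.execute("INSERT INTO model VALUES (2, 'Ghost', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The foreign key allows NULL, so a model with no manufacturer can still be stored; that case becomes important for outer joins.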
Now we’ll extract some data from the tables, using different
join types in ANSI syntax.
Inner Join
An inner join is defined
as a join in which unmatched rows from either table are not to be returned. In
other words, the rows must match in both tables in order to be included in the
result set.
SELECT t1.description AS 'Manufacturer',
t2.description AS 'Model'
FROM manufacturer t1
INNER JOIN model t2
ON t1.id = t2.manufacturer_id
WHERE t1.description = 'ACURA';
The INNER keyword is not required (in MySQL, JOIN on its own performs an inner
join), but including it makes the intent explicit and is considered good practice.
Typing the query above in the MySQL Command
Line Client produces the following:
mysql> SELECT t1.description AS 'Manufacturer',
-> t2.description AS 'Model'
-> FROM manufacturer t1
-> INNER JOIN model t2
-> ON t1.id = t2.manufacturer_id
-> WHERE t1.description = 'ACURA';
+--------------+---------+
| Manufacturer | Model   |
+--------------+---------+
| ACURA        | INTEGRA |
| ACURA        | CL      |
| ACURA        | LEGEND  |
| ACURA        | RL      |
| ACURA        | NSX     |
| ACURA        | TL      |
| ACURA        | VIGOR   |
| ACURA        | EL      |
| ACURA        | NSX-T   |
+--------------+---------+
9 rows in set (0.00 sec)
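The inner join above can be reproduced with a small, self-contained script. This is a sketch under stated assumptions: Python’s sqlite3 module stands in for MySQL, and the rows are hypothetical sample data modelled on the output shown (the ids are invented, 'New Car Co' has no models, and a 'Custom' model has no manufacturer). It also confirms that JOIN and INNER JOIN return the same rows, and that unmatched rows from either table are left out.

```python
import sqlite3

# sqlite3 stands in for MySQL; ids and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE model (id INTEGER PRIMARY KEY, description TEXT,
                        manufacturer_id INTEGER);
    INSERT INTO manufacturer VALUES (1, 'ACURA'), (2, 'New Car Co');
    INSERT INTO model (description, manufacturer_id) VALUES
        ('INTEGRA', 1), ('CL', 1), ('LEGEND', 1), ('RL', 1), ('NSX', 1),
        ('TL', 1), ('VIGOR', 1), ('EL', 1), ('NSX-T', 1), ('Custom', NULL);
""")

# {join} is filled in with either INNER JOIN or plain JOIN.
query = """
    SELECT t1.description AS Manufacturer, t2.description AS Model
    FROM manufacturer t1
    {join} model t2
      ON t1.id = t2.manufacturer_id
    WHERE t1.description = 'ACURA'
    ORDER BY t2.id
"""
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()
plain_rows = conn.execute(query.format(join="JOIN")).fetchall()
print(len(inner_rows))  # 9

# Even without the WHERE filter, unmatched rows on either side are excluded:
# New Car Co (no models) and Custom (no manufacturer) never appear.
all_rows = conn.execute("""
    SELECT t1.description, t2.description
    FROM manufacturer t1
    INNER JOIN model t2 ON t1.id = t2.manufacturer_id
""").fetchall()
print(len(all_rows))  # 9
```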
Outer Join
Outer joins will return records in one table that
aren’t matched in another. Outer joins can be further divided into the two
types of left and right. In a left outer join, all records from the first
(left-hand) table in a join that meet any conditions set in the WHERE clause are returned, whether
or not there’s a match in the second (right-hand) table:
mysql> SELECT t1.description AS 'Manufacturer',
-> t2.description AS 'Model'
-> FROM manufacturer t1
-> LEFT JOIN model t2
-> ON t1.id = t2.manufacturer_id
-> WHERE t1.description = 'New Car Co';
+--------------+--------+
| Manufacturer | Model  |
+--------------+--------+
| New Car Co   | (NULL) |
+--------------+--------+
1 row in set (0.00 sec)
New Car Co is returned even though it has no
associated models in the model table.
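Running the same filter with both join types makes the difference concrete. In this sketch, Python’s sqlite3 module stands in for MySQL and the rows and ids are hypothetical: the inner join drops New Car Co entirely, while the left join keeps it and fills the model column with NULL, which Python receives as None.

```python
import sqlite3

# sqlite3 stands in for MySQL; ids and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE model (id INTEGER PRIMARY KEY, description TEXT,
                        manufacturer_id INTEGER);
    INSERT INTO manufacturer VALUES (1, 'ACURA'), (2, 'New Car Co');
    INSERT INTO model VALUES (1, 'INTEGRA', 1), (2, 'Custom', NULL);
""")

# {join} is filled in with either INNER JOIN or LEFT JOIN.
query = """
    SELECT t1.description AS Manufacturer, t2.description AS Model
    FROM manufacturer t1
    {join} model t2
      ON t1.id = t2.manufacturer_id
    WHERE t1.description = 'New Car Co'
"""
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # []
print(left_rows)   # [('New Car Co', None)]
```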
Similar to the left outer join, a right outer join returns all records from
the second (right-hand) table in a join that meet any conditions set in the WHERE clause, whether or not
there’s a match in the first (left-hand) table:
mysql> SELECT t1.description AS 'Manufacturer',
-> t2.description AS 'Model'
-> FROM manufacturer t1
-> RIGHT JOIN model t2
-> ON t1.id = t2.manufacturer_id
-> WHERE t2.description = 'Custom';
+--------------+--------+
| Manufacturer | Model  |
+--------------+--------+
| (NULL)       | Custom |
+--------------+--------+
1 row in set (0.00 sec)
The Custom model is returned even though it has no
associated manufacturer.
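A right join is a left join with the table order reversed, so the query above can also be written with LEFT JOIN and model listed first. The sketch below does exactly that, with Python’s sqlite3 module standing in for MySQL and hypothetical rows and ids. It only runs the RIGHT JOIN form when the SQLite library is version 3.39.0 or newer, since older SQLite builds don’t support RIGHT JOIN.

```python
import sqlite3

# sqlite3 stands in for MySQL; ids and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE model (id INTEGER PRIMARY KEY, description TEXT,
                        manufacturer_id INTEGER);
    INSERT INTO manufacturer VALUES (1, 'ACURA'), (2, 'New Car Co');
    INSERT INTO model VALUES (1, 'INTEGRA', 1), (2, 'Custom', NULL);
""")

# Equivalent to "manufacturer RIGHT JOIN model": model is now the left-hand table.
swapped = conn.execute("""
    SELECT t1.description AS Manufacturer, t2.description AS Model
    FROM model t2
    LEFT JOIN manufacturer t1
      ON t1.id = t2.manufacturer_id
    WHERE t2.description = 'Custom'
""").fetchall()
print(swapped)  # [(None, 'Custom')]

# RIGHT JOIN itself needs SQLite 3.39.0 or newer.
right = None
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right = conn.execute("""
        SELECT t1.description AS Manufacturer, t2.description AS Model
        FROM manufacturer t1
        RIGHT JOIN model t2
          ON t1.id = t2.manufacturer_id
        WHERE t2.description = 'Custom'
    """).fetchall()
    print(right == swapped)  # True
```

Because the SELECT list names t1 before t2 in both queries, the columns come back in the same order regardless of which table is written first.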
Cross Join
The cross join, also referred to as a Cartesian product, returns
every combination of rows from the tables listed in the query: each row in the
first table is paired with every row in the second table. This happens when
no join condition relates the two tables.
Note that, most of the time, we do not want a Cartesian
join; we end up with one by accident because we failed to provide a join condition.
Result sets can get large quickly, because the
number of rows returned is the number of rows in Table A multiplied by the
number of rows in Table B. Each additional table multiplies the total again, so
the result grows exponentially with the number of tables joined.
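The multiplication is easy to confirm with three small hypothetical tables a, b and c holding 3, 4 and 5 rows, again using Python’s sqlite3 module in place of MySQL. The last query also shows that a comma-separated FROM list with no join condition produces the same Cartesian product as CROSS JOIN, which is how one usually ends up with one by accident.

```python
import sqlite3

# sqlite3 stands in for MySQL; tables a, b and c are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    CREATE TABLE c (z INTEGER);
    INSERT INTO a VALUES (1), (2), (3);               -- 3 rows
    INSERT INTO b VALUES (1), (2), (3), (4);          -- 4 rows
    INSERT INTO c VALUES (1), (2), (3), (4), (5);     -- 5 rows
""")

def count_rows(sql):
    """Return the single COUNT(*) value produced by the given query."""
    return conn.execute(sql).fetchone()[0]

two = count_rows("SELECT COUNT(*) FROM a CROSS JOIN b")                  # 3 * 4
three = count_rows("SELECT COUNT(*) FROM a CROSS JOIN b CROSS JOIN c")   # 3 * 4 * 5
forgot = count_rows("SELECT COUNT(*) FROM a, b")  # no join condition: also 3 * 4

print(two, three, forgot)  # 12 60 12
```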
If we actually want a Cartesian join, then we should use the
ANSI CROSS JOIN keywords to make it clear to anyone reading the script that the
Cartesian product is intentional. So why would we want one? One reason might be to
produce all the combinations of 1, 2 and 3, which could be used as part of a
password or ID generation process. Suppose we have a numbers table holding those three values:
numbers
+-----+
| num |
+-----+
|   1 |
|   2 |
|   3 |
+-----+
mysql> SELECT CONCAT(CAST(t1.num AS CHAR),
-> CAST(t2.num AS CHAR)) AS combinations
-> FROM numbers t1
-> CROSS JOIN numbers t2;
+--------------+
| combinations |
+--------------+
| 11           |
| 21           |
| 31           |
| 12           |
| 22           |
| 32           |
| 13           |
| 23           |
| 33           |
+--------------+
9 rows in set (0.00 sec)
Now you’ve got every two-digit combination of 1, 2 and 3:
3 × 3 = 9 rows.
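Without an ORDER BY clause, the order in which those rows come back isn’t guaranteed. The sketch below repeats the example with an explicit CROSS JOIN and an ORDER BY, using Python’s sqlite3 module in place of MySQL. It builds each number arithmetically (t1.num * 10 + t2.num) rather than with CONCAT and CAST, because CONCAT only exists in recent SQLite versions; this swap assumes every value in numbers is a single digit.

```python
import sqlite3

# sqlite3 stands in for MySQL; the numbers table mirrors the one above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE numbers (num INTEGER);
    INSERT INTO numbers VALUES (1), (2), (3);
""")

# Explicit CROSS JOIN; arithmetic replaces CONCAT (valid for single digits only).
# ORDER BY makes the row order deterministic.
rows = conn.execute("""
    SELECT t1.num * 10 + t2.num AS combination
    FROM numbers t1
    CROSS JOIN numbers t2
    ORDER BY t1.num, t2.num
""").fetchall()
combinations = [r[0] for r in rows]
print(combinations)  # [11, 12, 13, 21, 22, 23, 31, 32, 33]
```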
Knowing how to link tables is of great assistance in extracting
data from normalized databases, but it may not always be enough. There will be
times that no combination of joins will suffice to properly filter the data.
In those cases, it may be necessary to use temporary tables. We’ll be looking
at those in the next article.