It’s been said that one of the drawbacks to normalization to the third form (3NF) is more cumbersome data extraction due to the greater number of tables. These require careful linking via JOIN clauses. Improper table joining can easily result in erroneous results or even in the dreaded Cartesian Product. In today’s article, we’ll explore how table joins are achieved in MySQL.
A Tale of Two Join Styles
SQL was adopted as a standard by the American National Standards Institute (ANSI) in 1986 as SQL-86 and the International Organization for Standardization (ISO) in 1987.
Using ANSI-89 JOIN syntax, tables were joined on common fields using the equals symbol (=):
SELECT o.OrderID, od.ProductID FROM Orders AS o, dbo.OrderDetails AS od WHERE o.OrderDate >= '20091001' AND o.OrderID = od.ProductID;
That style was eventually eclipsed by a very different one that was introduced in the new SQL-92 standard. In general, developers found this style to be more readable than its predecessor by separating the joining criteria from the filter criteria. Here is the above query expressed using an SQL-92 join:
SELECT o.OrderID, od.ProductID FROM Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001';
Although some of the ANSI 86 syntax, such as OUTER JOINs using “*=” and “=*” has been deprecated by some SQL vendors, some developers continue to use it in MySQL, where it is still supported. There is of course no harm in doing so, but for the purposes of this tutorial, I’ll be focussing on the SQL-92 standard, since it has been the recommended one since 1992.
With that being said, let’s go over the different types of table joins.
Inner Joins
The type of table join depicted in the example above is referred to as an inner join. The MySQL INNER JOIN clause matches rows in one table with rows in other tables and selects rows that contain columns from both tables. Hence, the above query would only return rows where an Order has an associated record in the OrderDetails table.
SELECT o.OrderID, od.ProductID FROM Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001';
Selecting from Three Tables
Inner Join statements may be utilized to link any number of tables to the query, so long as there are common fields between the tables. Here is the syntax for joining three tables:
SELECT * FROM table1 INNER JOIN table2 ON table1.primaryKey=table2.table1Id INNER JOIN table3 ON table1.primaryKey=table3.table1Id
Applying the above syntax to our original query, let’s say that we only wanted to see records where there was an associated customer to the order. As long as there was a column shared between the two tables, we could hook it up to our query as follows:
SELECT o.OrderID, od.ProductID FROM Orders AS o INNER JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID INNER JOIN dbo.Customers AS c ON o.CustID = c.Id WHERE o.OrderDate >= '20091001';
Outer Joins
If an inner join only selects rows that contain non-null linked values from both tables, an outer join selects rows regardless of whether there is a linked record in the joined table. There are three types of outer joins:
- Left: selects rows from the first (main) table regardless of whether or not there is a linked record in the joined table.
- Right: selects rows from the joined table regardless of whether or not it links to a record in the first (main) table.
- Full: selects rows from both the first (main) and joined table regardless of whether or not a record links the two tables. This join type is not supported by MySQL.
Left Join Example
The following query uses a Left Join (sometimes referred to as a left outer join) to fetch all of the records from the Orders table where the OrderDate is equal or greater than 20091001, regardless of whether or not there is an associated OrderDetail:
SELECT o.OrderID, od.ProductID FROM Orders AS o LEFT JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001';
In cases where there is no associated OrderDetail record, the ProductID will come up as NULL.
Right Join Example
Replacing the Left Join with a Right one will fetch all of the records from the OrderDetails table where the OrderDate is equal or greater than 2009-10-01. The result set will include unmatched records in the right table of the join. In this instance, that means that each OrderDetail is returned even if there is no associated Order. Note that right joins may also be referred to as a right outer join:
SELECT o.OrderID, od.ProductID FROM Orders AS o RIGHT JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001';
In cases where there is no associated Order record, the OrderID will show a NULL value.
Full Outer Join Examples
As mentioned above, the full outer join type is not supported by MySQL. Luckily, there are a couple of other ways to fetch the same data.
Figure 1: Syntax Error in MySQL Workbench
Two JOINs and a UNION
One method to simulate a full join is to take the UNION of two outer joins, for example:
SELECT o.OrderID, od.ProductID FROM Orders AS o LEFT JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001' UNION SELECT o.OrderID, od.ProductID FROM Orders AS o RIGHT JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001';
The limitation of this solution is that it does not handle duplicate records in either of the joined tables.
UNION ALL with an Exclusion Join
One way to eliminate duplicates is to use an exclusion join to exclude anything from the second result that is already included in the first:
SELECT o.OrderID, od.ProductID FROM Orders AS o LEFT JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001' UNION ALL SELECT o.OrderID, od.ProductID FROM Orders AS o RIGHT JOIN dbo.OrderDetails AS od ON o.OrderID = od.ProductID WHERE o.OrderDate >= '20091001'; AND o.OrderID IS NULL;
Notice the use of UNION ALL instead of plain UNION, which would eliminate both of the duplicated records.
Conclusion
To recap what we learned here today:
- Stick with the SQL-92 syntax.
- Inner Joins selects only rows that contain columns from both tables.
- Outer Joins include Left, Right, and Full.
- Outer Join result sets include unmatched records in the left, right, or both tables of a join, respectively.
- Full Outer Joins may be simulated in MySQL using UNION or UNION ALL with an Exclusion Join.
In a future article, we’ll examine how to avoid some of the more common mistakes with respect to table joins.