Unless you’re writing application code that can deal with multiple result sets, it’s usually preferable to merge several similar queries together so that their results may be viewed as one data set. MySQL supports a way to accomplish this task using pure SQL thanks to the UNION and UNION ALL operators. This tutorial offers a brief overview on how to use both and provides a few examples to get you started.
What the Union and Union All Operators Do
Both the UNION and UNION ALL operators achieve the common goal of combining the results of two or more SELECT statements. As such, each query within the UNION must select the same number of fields with compatible data types - i.e. that may be converted using implicit conversion. The one, albeit substantial, difference between the two is that UNION removes duplicate rows between the various SELECT statements whereas UNION ALL returns all rows from the query even if an identical row exists in more than one of the SELECT statements.
The UNION and UNION ALL operators are simple enough to use; just insert them between each SELECT statement that you want to combine:
SELECT expression1, expression2, ... expression_n
FROM tables
WHERE conditions
UNION (ALL)
SELECT expression1, expression2, ... expression_n
FROM tables
WHERE conditions
[UNION (ALL)
SELECT expression1, expression2, ... expression_n
FROM tables
WHERE conditions]
etc...;
Selecting from Unrelated Tables
Using SELECT DISTINCT or careful table joining is generally more efficient than UNIONs, but when tables are unrelated, that is to say that there are no Primary and Foreign Keys linking them, UNION can be employed to generate a master list that would otherwise require multiple queries.
For example, say you wanted to produce a list of all of the people associated with your business and had three separate tables for supplier, salespeople, and customer details. Rather than work with three individual result sets, you could combine them using UNION:
SELECT id, name, email
FROM suppliers
UNION
SELECT rep_id, name, email
FROM salespeople
UNION
SELECT id, name, email_address
FROM customers;
Here is what the first several results might look like. Notice the duplication of Tim Akeman, who exists in more than one table:
ID
|
name
|
email
|
13
|
Jack N
|
jack@acme.com
|
17
|
Tim Akeman
|
akeman_tim@acme.com
|
18
|
Reb Beech
|
thebeech@rogers.com
|
21
|
Nancy Willis
|
nwillis98@gmail.com
|
22
|
R Cook
|
cook.r@yahoo.com
|
17
|
Tim Akeman
|
akeman_tim@acme.com
|
26
|
N King
|
thekingisin@bell.ca
|
A useful bit of information to know about the UNION and UNION ALL operators is that if the field names do not match across each SELECT statement, the query will display the names from the first query in the results.
The above query doesn’t tell us anything about what type of contact each person is, but we can include that ourselves by hardcoding it in each query:
SELECT id, name, email, ‘supplier’ as contact_type
FROM suppliers
UNION
SELECT rep_id, name, email, ‘salesperson’ as contact_type
FROM salespeople
UNION
SELECT id, name, email_address, ‘customer’ as contact_type
FROM customers;
Now we can see that Tim Akeman is both a supplier and customer:
ID
|
post_name
|
post_title
|
contact_type
|
13
|
Jack N
|
jack@acme.com
|
supplier
|
17
|
Tim Akeman
|
akeman_tim@acme.com
|
supplier
|
18
|
Reb Beech
|
thebeech@rogers.com
|
salesperson
|
21
|
Nancy Willis
|
nwillis98@gmail.com
|
salesperson
|
22
|
R Cook
|
cook.r@yahoo.com
|
customer
|
17
|
Tim Akeman
|
akeman_tim@acme.com
|
customer
|
26
|
N King
|
thekingisin@bell.ca
|
customer
|
Using Group By and Order By Clauses
In working with data over time, you’ll find that creative solutions are often required. One such situation arose recently when I was asked to fetch a list of all the names contained in my client’s restaurants database. What first looked like a simple SELECT statement turned into a formidable task! The crux of the challenge was that several restaurants were actually chains, so they appeared numerous times in the main table. Using a SELECT DISTINCT was a no-go because the results contained the unique IDs.
The answer was a combination of the GROUP BY clause and UNION operator.
Restaurant chains could be identified by their 'restaurant_base_name' key in the postmeta table. The chain query included the GROUP BY clause so that only one of each was returned by the query. Omitting restaurants that did not possess a 'restaurant_base_name' from the other query yielded all of the stand-alone restaurants.
Finally, both queries were joined using UNION ALL and the full results sorted using the ORDER BY statement.
-- non-chain restaurants
SELECT ID,
post_name,
post_title
FROM posts
WHERE post_type='restaurant'
AND post_status='publish'
AND ID not in (SELECT post_id
FROM postmeta
WHERE meta_key='restaurant_base_name')
UNION ALL
-- chain restaurants
SELECT p.id,
p.post_name,
pm.meta_value as post_title
FROM posts as p,
postmeta as pm
WHERE p.id = pm.post_id
AND meta_key = 'restaurant_base_name'
AND post_status = 'publish'
GROUP BY post_title
ORDER BY post_title -- sorts all the results
In the following result set, you can see a few chains mixed in, including, Carluccio’s, McDonald’s, Nando’s, and Pret A Manger:
ID
|
post_name
|
post_title 
|
2525
|
aqua-kyoto
|
Aqua Kyoto
|
7479
|
babylon-restaurant
|
Babylon Restaurant
|
1747
|
hippodrome-circle-restaurant
|
Birmingham Hippodrome - Circle Restaurant
|
1415
|
carluccios-aberdeen
|
Carluccio's
|
6441
|
eds-easy-diner-soho
|
Ed's Easy Diner
|
3657
|
mcdonald
|
McDonald's
|
6570
|
nandos-ealingcommon
|
Nando's
|
2066
|
pret-bvictoriastreet
|
Pret A Manger
|
2547
|
rosa-lewis-bar-at-the-cavendish
|
Rosa Lewis Bar at the Cavendish
|
1704
|
the-akeman
|
The Akeman
|
1797
|
the-alford-arms
|
The Alford Arms
|
2594
|
2594-2
|
The Bay Fish & Chip Shop
|
1708
|
the-beech-house
|
The Beech House - Beaconsfield
|
1710
|
the-blue-boar
|
The Blue Boar
|
1714
|
the-crown-thistle
|
The Crown & Thistle
|
1716
|
the-kings-arms
|
The King's Arms
|
1718
|
the-old-post-office
|
The Old Post Office
|
Customizing the Overall Sorting
The overall sorting produced by the last ORDER BY statement is placing a lot of restaurants that should be near the top of the results at the bottom because of the leading “The” in their names.
We can change how the rows are sorted because the ORDER BY clause can accept expressions and function calls as well as field names. In the code below, the IF() function tests for a leading “The “ and removes it using SUBSTRING() if necessary.
The main result fields may be modified in the same way. Here, the CONCAT() function reinserts the “The” enclosed in parentheses to show that it is not included in the sorting. The expression is assigned the same name as the original field so that it may be referenced by the ORDER BY clause:
-- non-chain restaurants
SELECT ID,
post_name,
IF(SUBSTRING(post_title, 1, 4) = "The ",
CONCAT('(The) ', SUBSTRING(post_title, 5)),
post_title ) AS post_title
FROM posts
WHERE post_type='restaurant'
AND post_status='publish'
AND ID not in (SELECT post_id
FROM postmeta
WHERE meta_key='restaurant_base_name')
UNION ALL
SELECT p.id,
p.post_name,
pm.meta_value as post_title
FROM posts as p,
postmeta as pm
WHERE p.id = pm.post_id
AND meta_key = 'restaurant_base_name'
AND post_status = 'publish'
-- sorts all the results
ORDER BY IF(SUBSTRING(post_title, 1, 6) = "(The) ",
SUBSTRING(post_title, 7),
post_title)
Much better!
ID
|
post_name
|
post_title
|
1704
|
the-akeman
|
(The) Akeman
|
1797
|
the-alford-arms
|
(The) Alford Arms
|
2525
|
aqua-kyoto
|
Aqua Kyoto
|
7479
|
babylon-restaurant
|
Babylon Restaurant
|
2594
|
2594-2
|
(The) Bay Fish & Chip Shop
|
1708
|
the-beech-house
|
(The) Beech House - Beaconsfield
|
1747
|
hippodrome-circle-restaurant
|
Birmingham Hippodrome - Circle Restaurant
|
1710
|
the-blue-boar
|
(The) Blue Boar
|
1415
|
carluccios-aberdeen
|
Carluccio's
|
1714
|
the-crown-thistle
|
(The) Crown & Thistle
|
6441
|
eds-easy-diner-soho
|
Ed's Easy Diner
|
1716
|
the-kings-arms
|
(The) King's Arms
|
3657
|
mcdonald
|
McDonald's
|
6570
|
nandos-ealingcommon
|
Nando's
|
1718
|
the-old-post-office
|
(The) Old Post Office
|
2066
|
pret-bvictoriastreet
|
Pret A Manger
|
1720
|
the-red-lion
|
(The) Red Lion
|
2547
|
rosa-lewis-bar-at-the-cavendish
|
Rosa Lewis Bar at the Cavendish
|
Conclusion
When choosing between the UNION and UNION ALL operators, keep in mind that the distinct sorting operation that is performed by the UNION operator can be time consuming. For that reason, consider the UNION ALL operator your first choice unless you are getting duplicates back. In the previous example, we were able to engineer our query so that every row was unique, so that using UNION would be overkill.
See all articles by Rob Gravelle