MySQL Subqueries

MYSQL 4.1 has been
released as a production version, and with all the new features it is likely a
new generation of developers will soon be using MySQL. This month I revisit an
old area of contention – subqueries. Until now, MySQL has not supported
subqueries, and this lack caused many to write off MySQL as not being a serious
DBMS. While the lack was certainly a problem, many developers do not know that
subqueries can often be rewritten as a join, sometimes giving a performance
benefit in the process. This month I look at subqueries, and how they can be
rewritten in a more optimal way.

Rewriting subqueries as joins

First, let’s create some
sample tables and data:


Click for full CREATE TABLE code

Rewriting subqueries as Inner Joins

Let’s look at a simple
subquery to return a list of all animals that have been assigned food:

mysql> SELECT name 
  FROM animal 
  WHERE id 
  IN (SELECT animal_id FROM animal_food);
+---------------------+
| name                |
+---------------------+
| Chacma Baboon       |
| Small Grey Mongoose |
+---------------------+
2 rows in set (0.00 sec)

Now, the same query as an
inner join.

mysql> SELECT DISTINCT name 
  FROM animal,animal_food 
  WHERE animal.id=animal_id;
+---------------------+
| name                |
+---------------------+
| Chacma Baboon       |
| Small Grey Mongoose |
+---------------------+
2 rows in set (0.00 sec)

Note the DISTINCT
keyword. Without it, we will get duplicate records, one animal returned for
each record in the animal_food table:

mysql> SELECT name FROM animal,animal_food WHERE animal.id=animal_id;
+---------------------+
| name                |
+---------------------+
| Chacma Baboon       |
| Chacma Baboon       |
| Small Grey Mongoose |
| Small Grey Mongoose |
+---------------------+
4 rows in set (0.01 sec)

Which query is better?
Some may argue that the former query is more readable, but I think that to
anyone vaguely competent in SQL they should both be clear. However, one is
definitely more efficient than the other is. Using EXPLAIN, let’s see which.
First, the join:



Click for full EXPLAIN SELECT DISTINCT code

MySQL would have to
examine 12 rows to return the result set (4×3, as you multiply the result – for
more on EXPLAIN see Optimizing
MySQL: Queries and Indexes
. Now the second query:




Click for full EXPLAIN SELECT code

This time MySQL has to
examine 16 rows, a marginal difference in such a small example, but this
difference can add up in larger datasets.

Rewriting subqueries as Outer Joins

A common kind of query is
finding all records that do not have an associated record in another table–in
this case, finding all animals that do not have an associated food. Here is how
you would do it with a subquery:

 
mysql> SELECT name FROM animal WHERE id NOT IN (SELECT animal_id FROM animal_food);
+-------------+
| name        |
+-------------+
| Cape Gerbil |
| Porcupine   |
+-------------+
2 rows in set (0.00 sec)

Writing this sort of
query without a subquery often taxes novice MySQL developers, but it is
actually quite easy, simply requiring a LEFT JOIN and an IS NULL clause:

mysql> SELECT name FROM animal LEFT JOIN  animal_food ON animal.id=animal_id WHERE animal_id IS NULL;
+-------------+
| name        |
+-------------+
| Cape Gerbil |
| Porcupine   |
+-------------+
2 rows in set (0.00 sec)

This time the former
query is certainly easier to understand, but does the extra complexity of the
second query deliver a performance gain?




Click for full EXPLAIN SELECT (2) code

I can hear some sighs of
relief – the subquery in this case is as efficient as the outer join. Outer
joins can often deliver performance gains though, so benchmark your queries.
Note that an alternative way to write the same subquery is as follows (this is
as efficient in this case)

mysql> SELECT name FROM animal 
  WHERE NOT EXISTS (SELECT animal_id FROM animal_food WHERE animal.id=animal_id);
+-------------+
| name        |
+-------------+
| Cape Gerbil |
| Porcupine   |
+-------------+
Ian Gilfillan
Ian Gilfillan
Ian Gilfillan lives in Cape Town, South Africa. He is the author of the book 'Mastering MySQL 4', published by Sybex, and has been working with MySQL since 1997. These days he develops mainly in PHP and MySQL, although confesses to starting out with BASIC and COBOL, way back when, and still has a soft spot for Perl. He developed South Africa's first online grocery store, and has developed and taught internet development and other technical courses for various institutions. He has majors in Programming and Information Systems, as well as English and Philosophy. For 5 years he was Lead Developer and IT Manager for Independent Online, South Africa's premier news portal. However, he has now 'retired' from fulltime work, and is hoping that his next book will be more in the style of William Blake and Allen Ginsberg.

Latest Articles