SHARE

Use Derived Tables in Your MySQL Queries to Improve Database Performance

Written By

Dec 3, 2010

6 minute read

There has been much debate as to whether derived tables, views, or temporary tables are actually faster in terms of database performance. The fact is that there is no simple answer, as different queries require different optimizations. Rob Gravelle presents and evaluates some uses for derived tables that highlight their value in many applications.

In
version 4.1, MySQL added support for derived tables (also referred to as inline views). In very basic terms, a derived table
is a virtual table that’s returned from a SELECT statement. This concept is
similar to temporary tables, but using derived tables in your
SELECT statements is much simpler because they don’t require all the steps that
temporary tables do. There has been much debate as to whether derived
tables, views, or temporary tables are actually faster in terms of performance.
Supporters of derived tables cite that they allow to you to replace the code
required for managing a temporary table with a single command, which can
be faster than the series it replaces. The
fact is that there is no simple answer, as different queries require different
optimizations; each has its place in the database administrator’s arsenal.
Today, I’m going to present and evaluate some uses for derived tables that
highlight their value in many applications.

Some Terminology

A lot of people use the terms derived tables
and subqueries interchangeably. Not surprising, considering that a subquery is
defined as any
query that appears within another SQL command. These can be further broken down
into stand-alone sub-queries,
which are those that you can run independent of the command that contains it,
and correlated, which relate to fields from the containing command. The
most common use for subqueries is filtering data in the WHERE clause of a SQL command
or in the column list. Here’s a subquery that retrieves a list of all
customers who placed orders in the specified year. The main query uses that
list to find the reverse – those who didn’t order in that year:

SELECT CompanyName
SELECT CompanyName
FROM   Customers
WHERE  CustomerID NOT IN (SELECT CustomerID
                          FROM   Orders
                          WHERE  YEAR(OrderDate) = 2003);

This
statement fetches basic client information. A subquery performs a lookup on
another table to retrieve the birth_country description:

SELECT itt.family_nme,
       itt.given_nme,
       itt.birth_dt,
       (SELECT english_desc 
        FROM   vcode_value 
        WHERE  code_value_id = itt.birth_country_cd) AS 'birth_country'
FROM identity_tombstone_id AS itt;

The above statement can also be
said to contain a correlated subquery because it refers back to the
outer main query.

Compare the previous two queries to
the following one:

SELECT u.last_name, 
       a.state
FROM   User AS u,
       (SELECT user_id, state FROM Address) AS a
WHERE  u.id = a.user_id;

The above query
selects directly from the derived table created by the subquery. Of
course, in the simple example above, you could just as easily define the join
without the derived table, but as we’ll see in the next section, there are
other, more complex criteria, which call for the use of one or more derived
tables.

When to Use Derived Tables

From my
experience, the best candidates for derived tables are queries that
incorporate some sort of grouping. In other words, those that include a GROUP
BY clause. Consider the following tables:

employees
table:

id	dept_id	gender	name	salary	emp_id_number
1	2	m	Jon Simpson	4500	1234
2	4	f	Barbara Breitenmoser	(NULL)	9999
3	3	f	Kirsten Ruegg	5600	3462
4	1	m	Ralph Teller	5100	6543
5	2	m	Peter Jonson	5200	9747
6	2	m	Allan Smithie	4900	6853
7	4	f	Mary Parker	5600	2223
8	3	f	Kirsten Ruegg	4400	2765

bonuses
table:

bonus_id	emp_id	amount	award_date
1	3	500	2009-10-28
2	5	1000	2009-01-03
3	4	250	2009-04-22
4	2	900	2010-01-14
5	1	450	2009-01-03
6	5	1000	2009-10-28
7	8	300	2007-10-01
8	8	300.5	2009-08-12
9	2	250	2009-03-16
10	2	500	2009-07-30
11	6	350	2009-02-12
12	5	300	2009-01-15

Now we’ll write a
SELECT statement to generate a report that tallies the total number of bonuses
that each employee received last year:

SELECT e.name, 
       e.salary, 
       COUNT(b.bonus_id) AS TotalBonuses
FROM   employees e 
       LEFT OUTER JOIN bonuses b ON e.id = b.emp_id
WHERE  YEAR(b.award_date) = 2009
GROUP BY e.id;

Our 2009 bonus
report looks quite good, except that employees who did not receive any bonuses
aren’t being included (no Mary
Parker). That may be
problematic, because it would be useful to be able to identify employees who
may be underperforming.

name	salary	Total Bonuses
Jon Simpson	4500	1
Barbara Breitenmoser	(NULL)	2
Kirsten Ruegg	5600	1
Ralph Teller	5100	1
Peter Jonson	5200	3
Allan Smithie	4900	1
Kirsten Ruegg	4400	1

The reason that the
above query doesn’t return all of the employees has to do with the way that the
WHERE clause interacts with the GROUP BY clause. Using a derived table, we can
reduce the bonuses table to a snapshot of itself containing only data from the
year 2009.

SELECT e.name, 
       e.salary, 
       COUNT(b.bonus_id) AS 'Total Bonuses'
FROM employees e 
       LEFT OUTER JOIN 
(SELECT emp_id, bonus_id  FROM bonuses WHERE YEAR(award_date) = 2009) AS b
 ON e.id = b.emp_id
GROUP BY e.id;

The
elusive Mary Parker is now visible.

name	salary	Total Bonuses
Jon Simpson	4500	1
Barbara Breitenmoser	(NULL)	2
Kirsten Ruegg	5600	1
Ralph Teller	5100	1
Peter Jonson	5200	3
Allan Smithie	4900	1
Mary Parker	5600	0
Kirsten Ruegg	4400	1

Another use for derived
tables is to replace correlated subqueries with joins. The
following statement is supposed to return a more detailed bonus report for the
months of January, February, and March of last year.

select emp.name,
       emp.salary,
      (select sum(amount) 
       from   bonuses 
       where  emp_id = emp.id 
       and award_date between '2009-1-1' and '2009-1-31') jan_09_points,
      (select sum(amount) 
       from   bonuses 
       where  emp_id = emp.id     
       and award_date between '2009-2-1' and '2009-2-28') feb_09_points,
      (select sum(amount) 
       from   bonuses 
       where  emp_id = emp.id   
       and award_date between '2009-3-1' and '2009-3-31') mar_09_points 
from   employees emp
order by emp.name;

name	salary	jan_09_points	feb_09_points	mar_09_points
Allan Smithie	4900	(NULL)	350	(NULL)
Barbara Breitenmoser	(NULL)	(NULL)	(NULL)	250
Jon Simpson	4500	450	(NULL)	(NULL)
Kirsten Ruegg	5600	(NULL)	(NULL)	(NULL)
Kirsten Ruegg	4400	(NULL)	(NULL)	(NULL)
Mary Parker	5600	(NULL)	(NULL)	(NULL)
Peter Jonson	5200	1300	(NULL)	(NULL)
Ralph Teller	5100	(NULL)	(NULL)	(NULL)

Unfortunately, the writer of the query chose to use subqueries in the field
list. That isn’t a great idea because the subqueries will be executed for
every row in the employees table! For instance, if we have 1000 employees,
then this statement will execute 3001 queries: one query on the employees table
plus one thousand sum queries on each of the subqueries on the bonuses
table). A better way to write this query is to use derived tables.

select emp_id,
       emp_name,
       emp_salary,       
       jan_09_points,  
       feb_09_points,         
       mar_09_points
from ( 
 select emp.id as emp_id,
        emp.name as emp_name,
        emp.salary as emp_salary,       
        sum(CASE WHEN month(award_date) = 1 
                 THEN b.amount 
            END) AS 'jan_09_points',       
        sum(CASE WHEN month(award_date) = 2 
                 THEN b.amount 
            END) AS 'feb_09_points',       
        sum(CASE WHEN month(award_date) = 3 
                 THEN b.amount 
            END) AS 'mar_09_points'
from employees emp 
LEFT OUTER JOIN 
(SELECT * FROM bonuses WHERE YEAR(award_date) = 2009) AS b
 ON emp.id = b.emp_id
 group by emp.id) as all_employees
order by emp_name;

name	salary	jan_09_points	feb_09_points	mar_09_points
Allan Smithie	4900	(NULL)	350	(NULL)
Barbara Breitenmoser	(NULL)	(NULL)	(NULL)	250
Jon Simpson	4500	450	(NULL)	(NULL)
Kirsten Ruegg	5600	(NULL)	(NULL)	(NULL)
Kirsten Ruegg	4400	(NULL)	(NULL)	(NULL)
Mary Parker	5600	(NULL)	(NULL)	(NULL)
Peter Jonson	5200	1300	(NULL)	(NULL)
Ralph Teller	5100	(NULL)	(NULL)	(NULL)

Ultimately, the decision of
whether to use derived tables, views, or temporary tables in your queries and
stored procedures rests with you. The best way to be confident in your choice
is to experiment with several approaches and time them with different volumes.
Derived tables just might turn out to be your best choice in many cases.

»

See All Articles by Columnist
Rob Gravelle

Robert Gravelle

Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.

Use Derived Tables in Your MySQL Queries to Improve Database Performance

Some Terminology

When to Use Derived Tables

Robert Gravelle

Company

Categories