Use Derived Tables in Your MySQL Queries to Improve Database Performance


There has been much debate as to whether derived tables, views, or temporary tables are actually faster in terms of database performance. The fact is that there is no simple answer, as different queries require different optimizations. Rob Gravelle presents and evaluates some uses for derived tables that highlight their value in many applications.

In
version 4.1, MySQL added support for derived tables (also referred to as inline views). In very basic terms, a derived table
is a virtual table that’s returned from a SELECT statement. This concept is
similar to temporary tables, but using derived tables in your
SELECT statements is much simpler because they don’t require all the steps that
temporary tables do. There has been much debate as to whether derived
tables, views, or temporary tables are actually faster in terms of performance.
Supporters of derived tables cite that they allow to you to replace the code
required for managing a temporary table with a single command, which can
be faster than the series it replaces. The
fact is that there is no simple answer, as different queries require different
optimizations; each has its place in the database administrator’s arsenal.
Today, I’m going to present and evaluate some uses for derived tables that
highlight their value in many applications.

Some Terminology

A lot of people use the terms derived tables
and subqueries interchangeably. Not surprising, considering that a subquery is
defined as any
query that appears within another SQL command. These can be further broken down
into stand-alone sub-queries,
which are those that you can run independent of the command that contains it,
and correlated, which relate to fields from the containing command. The
most common use for subqueries is filtering data in the WHERE clause of a SQL command
or in the column list. Here’s a subquery that retrieves a list of all
customers who placed orders in the specified year. The main query uses that
list to find the reverse – those who didn’t order in that year:

SELECT CompanyName
SELECT CompanyName FROM Customers WHERE CustomerID NOT IN (SELECT CustomerID FROM Orders WHERE YEAR(OrderDate) = 2003);

This
statement fetches basic client information. A subquery performs a lookup on
another table to retrieve the birth_country description:

SELECT itt.family_nme,
       itt.given_nme,
       itt.birth_dt,
       (SELECT english_desc 
        FROM   vcode_value 
        WHERE  code_value_id = itt.birth_country_cd) AS 'birth_country'
FROM identity_tombstone_id AS itt;

The above statement can also be
said to contain a correlated subquery because it refers back to the
outer main query.

Compare the previous two queries to
the following one:

SELECT u.last_name, 
       a.state
FROM   User AS u,
       (SELECT user_id, state FROM Address) AS a
WHERE  u.id = a.user_id;

The above query
selects directly from the derived table created by the subquery. Of
course, in the simple example above, you could just as easily define the join
without the derived table, but as we’ll see in the next section, there are
other, more complex criteria, which call for the use of one or more derived
tables.

When to Use Derived Tables

From my
experience, the best candidates for derived tables are queries that
incorporate some sort of grouping. In other words, those that include a GROUP
BY clause. Consider the following tables:

employees
table:

id

dept_id

gender

name

salary

emp_id_number

1

2

m

Jon
Simpson

4500

1234

2

4

f

Barbara
Breitenmoser

(NULL)

9999

3

3

f

Kirsten
Ruegg

5600

3462

4

1

m

Ralph
Teller

5100

6543

5

2

m

Peter
Jonson

5200

9747

6

2

m

Allan
Smithie

4900

6853

7

4

f

Mary
Parker

5600

2223

8

3

f

Kirsten
Ruegg

4400

2765

bonuses
table:

bonus_id

emp_id

amount

award_date

1

3

500

2009-10-28

2

5

1000

2009-01-03

3

4

250

2009-04-22

4

2

900

2010-01-14

5

1

450

2009-01-03

6

5

1000

2009-10-28

7

8

300

2007-10-01

8

8

300.5

2009-08-12

9

2

250

2009-03-16

10

2

500

2009-07-30

11

6

350

2009-02-12

12

5

300

2009-01-15

Now we’ll write a
SELECT statement to generate a report that tallies the total number of bonuses
that each employee received last year:

SELECT e.name, 
       e.salary, 
       COUNT(b.bonus_id) AS TotalBonuses
FROM   employees e 
       LEFT OUTER JOIN bonuses b ON e.id = b.emp_id
WHERE  YEAR(b.award_date) = 2009
GROUP BY e.id;

Our 2009 bonus
report looks quite good, except that employees who did not receive any bonuses
aren’t being included (no Mary
Parker). That may be
problematic, because it would be useful to be able to identify employees who
may be underperforming.

name

salary

Total Bonuses

Jon
Simpson

4500

1

Barbara
Breitenmoser

(NULL)

2

Kirsten
Ruegg

5600

1

Ralph
Teller

5100

1

Peter
Jonson

5200

3

Allan
Smithie

4900

1

Kirsten
Ruegg

4400

1

The reason that the
above query doesn’t return all of the employees has to do with the way that the
WHERE clause interacts with the GROUP BY clause. Using a derived table, we can
reduce the bonuses table to a snapshot of itself containing only data from the
year 2009.

SELECT e.name, 
       e.salary, 
       COUNT(b.bonus_id) AS 'Total Bonuses'
FROM employees e 
       LEFT OUTER JOIN 
(SELECT emp_id, bonus_id  FROM bonuses WHERE YEAR(award_date) = 2009) AS b
 ON e.id = b.emp_id
GROUP BY e.id;

The
elusive Mary Parker is now visible.

name

salary

Total Bonuses

Jon
Simpson

4500

1

Barbara
Breitenmoser

(NULL)

2

Kirsten
Ruegg

5600

1

Ralph
Teller

5100

1

Peter
Jonson

5200

3

Allan
Smithie

4900

1

Mary Parker

5600

0

Kirsten
Ruegg

4400

1

Another use for derived
tables
is to replace correlated subqueries with joins. The
following statement is supposed to return a more detailed bonus report for the
months of January, February, and March of last year.

select emp.name,
       emp.salary,
      (select sum(amount) 
       from   bonuses 
       where  emp_id = emp.id 
       and award_date between '2009-1-1' and '2009-1-31') jan_09_points,
      (select sum(amount) 
       from   bonuses 
       where  emp_id = emp.id     
       and award_date between '2009-2-1' and '2009-2-28') feb_09_points,
      (select sum(amount) 
       from   bonuses 
       where  emp_id = emp.id   
       and award_date between '2009-3-1' and '2009-3-31') mar_09_points 
from   employees emp
order by emp.name;

name

salary

jan_09_points

feb_09_points

mar_09_points

Allan
Smithie

4900

(NULL)

350

(NULL)

Barbara
Breitenmoser

(NULL)

(NULL)

(NULL)

250

Jon
Simpson

4500

450

(NULL)

(NULL)

Kirsten
Ruegg

5600

(NULL)

(NULL)

(NULL)

Kirsten
Ruegg

4400

(NULL)

(NULL)

(NULL)

Mary
Parker

5600

(NULL)

(NULL)

(NULL)

Peter
Jonson

5200

1300

(NULL)

(NULL)

Ralph
Teller

5100

(NULL)

(NULL)

(NULL)

Unfortunately, the writer of the query chose to use subqueries in the field
list. That isn’t a great idea because the subqueries will be executed for
every row in the employees table! For instance, if we have 1000 employees,
then this statement will execute 3001 queries: one query on the employees table
plus one thousand sum queries on each of the subqueries on the bonuses
table). A better way to write this query is to use derived tables.

select emp_id,
       emp_name,
       emp_salary,       
       jan_09_points,  
       feb_09_points,         
       mar_09_points
from ( 
 select emp.id as emp_id,
        emp.name as emp_name,
        emp.salary as emp_salary,       
        sum(CASE WHEN month(award_date) = 1 
                 THEN b.amount 
            END) AS 'jan_09_points',       
        sum(CASE WHEN month(award_date) = 2 
                 THEN b.amount 
            END) AS 'feb_09_points',       
        sum(CASE WHEN month(award_date) = 3 
                 THEN b.amount 
            END) AS 'mar_09_points'
from employees emp 
LEFT OUTER JOIN 
(SELECT * FROM bonuses WHERE YEAR(award_date) = 2009) AS b
 ON emp.id = b.emp_id
 group by emp.id) as all_employees
order by emp_name;

name

salary

jan_09_points

feb_09_points

mar_09_points

Allan
Smithie

4900

(NULL)

350

(NULL)

Barbara
Breitenmoser

(NULL)

(NULL)

(NULL)

250

Jon
Simpson

4500

450

(NULL)

(NULL)

Kirsten
Ruegg

5600

(NULL)

(NULL)

(NULL)

Kirsten
Ruegg

4400

(NULL)

(NULL)

(NULL)

Mary
Parker

5600

(NULL)

(NULL)

(NULL)

Peter
Jonson

5200

1300

(NULL)

(NULL)

Ralph
Teller

5100

(NULL)

(NULL)

(NULL)

Ultimately, the decision of
whether to use derived tables, views, or temporary tables in your queries and
stored procedures rests with you. The best way to be confident in your choice
is to experiment with several approaches and time them with different volumes.
Derived tables just might turn out to be your best choice in many cases.

»


See All Articles by Columnist

Rob Gravelle

Robert Gravelle
Robert Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles