There has been much debate as to whether derived tables, views, or temporary tables are actually faster in terms of database performance. The fact is that there is no simple answer, as different queries require different optimizations. Rob Gravelle presents and evaluates some uses for derived tables that highlight their value in many applications.
In
version 4.1, MySQL added support for derived tables (also referred to as inline views). In very basic terms, a derived table
is a virtual table that’s returned from a SELECT statement. This concept is
similar to temporary tables, but using derived tables in your
SELECT statements is much simpler because they don’t require all the steps that
temporary tables do. There has been much debate as to whether derived
tables, views, or temporary tables are actually faster in terms of performance.
Supporters of derived tables cite that they allow to you to replace the code
required for managing a temporary table with a single command, which can
be faster than the series it replaces. The
fact is that there is no simple answer, as different queries require different
optimizations; each has its place in the database administrator’s arsenal.
Today, I’m going to present and evaluate some uses for derived tables that
highlight their value in many applications.
Some Terminology
A lot of people use the terms derived tables
and subqueries interchangeably. Not surprising, considering that a subquery is
defined as any
query that appears within another SQL command. These can be further broken down
into stand-alone sub-queries,
which are those that you can run independent of the command that contains it,
and correlated, which relate to fields from the containing command. The
most common use for subqueries is filtering data in the WHERE clause of a SQL command
or in the column list. Here’s a subquery that retrieves a list of all
customers who placed orders in the specified year. The main query uses that
list to find the reverse – those who didn’t order in that year:
SELECT CompanyName
SELECT CompanyName FROM Customers WHERE CustomerID NOT IN (SELECT CustomerID FROM Orders WHERE YEAR(OrderDate) = 2003);
This
statement fetches basic client information. A subquery performs a lookup on
another table to retrieve the birth_country description:
SELECT itt.family_nme, itt.given_nme, itt.birth_dt, (SELECT english_desc FROM vcode_value WHERE code_value_id = itt.birth_country_cd) AS 'birth_country' FROM identity_tombstone_id AS itt;
The above statement can also be
said to contain a correlated subquery because it refers back to the
outer main query.
Compare the previous two queries to
the following one:
SELECT u.last_name, a.state FROM User AS u, (SELECT user_id, state FROM Address) AS a WHERE u.id = a.user_id;
The above query
selects directly from the derived table created by the subquery. Of
course, in the simple example above, you could just as easily define the join
without the derived table, but as we’ll see in the next section, there are
other, more complex criteria, which call for the use of one or more derived
tables.
When to Use Derived Tables
From my
experience, the best candidates for derived tables are queries that
incorporate some sort of grouping. In other words, those that include a GROUP
BY clause. Consider the following tables:
employees
table:
id |
dept_id |
gender |
name |
salary |
emp_id_number |
1 |
2 |
m |
Jon |
4500 |
1234 |
2 |
4 |
f |
Barbara |
(NULL) |
9999 |
3 |
3 |
f |
Kirsten |
5600 |
3462 |
4 |
1 |
m |
Ralph |
5100 |
6543 |
5 |
2 |
m |
Peter |
5200 |
9747 |
6 |
2 |
m |
Allan |
4900 |
6853 |
7 |
4 |
f |
Mary |
5600 |
2223 |
8 |
3 |
f |
Kirsten |
4400 |
2765 |
bonuses
table:
bonus_id |
emp_id |
amount |
award_date |
1 |
3 |
500 |
2009-10-28 |
2 |
5 |
1000 |
2009-01-03 |
3 |
4 |
250 |
2009-04-22 |
4 |
2 |
900 |
2010-01-14 |
5 |
1 |
450 |
2009-01-03 |
6 |
5 |
1000 |
2009-10-28 |
7 |
8 |
300 |
2007-10-01 |
8 |
8 |
300.5 |
2009-08-12 |
9 |
2 |
250 |
2009-03-16 |
10 |
2 |
500 |
2009-07-30 |
11 |
6 |
350 |
2009-02-12 |
12 |
5 |
300 |
2009-01-15 |
Now we’ll write a
SELECT statement to generate a report that tallies the total number of bonuses
that each employee received last year:
SELECT e.name, e.salary, COUNT(b.bonus_id) AS TotalBonuses FROM employees e LEFT OUTER JOIN bonuses b ON e.id = b.emp_id WHERE YEAR(b.award_date) = 2009 GROUP BY e.id;
Our 2009 bonus
report looks quite good, except that employees who did not receive any bonuses
aren’t being included (no Mary
Parker). That may be
problematic, because it would be useful to be able to identify employees who
may be underperforming.
name |
salary |
Total Bonuses |
Jon |
4500 |
1 |
Barbara |
(NULL) |
2 |
Kirsten |
5600 |
1 |
Ralph |
5100 |
1 |
Peter |
5200 |
3 |
Allan |
4900 |
1 |
Kirsten |
4400 |
1 |
The reason that the
above query doesn’t return all of the employees has to do with the way that the
WHERE clause interacts with the GROUP BY clause. Using a derived table, we can
reduce the bonuses table to a snapshot of itself containing only data from the
year 2009.
SELECT e.name, e.salary, COUNT(b.bonus_id) AS 'Total Bonuses' FROM employees e LEFT OUTER JOIN (SELECT emp_id, bonus_id FROM bonuses WHERE YEAR(award_date) = 2009) AS b ON e.id = b.emp_id GROUP BY e.id;
The
elusive Mary Parker is now visible.
name |
salary |
Total Bonuses |
Jon |
4500 |
1 |
Barbara |
(NULL) |
2 |
Kirsten |
5600 |
1 |
Ralph |
5100 |
1 |
Peter |
5200 |
3 |
Allan |
4900 |
1 |
Mary Parker |
5600 |
0 |
Kirsten |
4400 |
1 |
Another use for derived
tables is to replace correlated subqueries with joins. The
following statement is supposed to return a more detailed bonus report for the
months of January, February, and March of last year.
select emp.name, emp.salary, (select sum(amount) from bonuses where emp_id = emp.id and award_date between '2009-1-1' and '2009-1-31') jan_09_points, (select sum(amount) from bonuses where emp_id = emp.id and award_date between '2009-2-1' and '2009-2-28') feb_09_points, (select sum(amount) from bonuses where emp_id = emp.id and award_date between '2009-3-1' and '2009-3-31') mar_09_points from employees emp order by emp.name;
name |
salary |
jan_09_points |
feb_09_points |
mar_09_points |
Allan |
4900 |
(NULL) |
350 |
(NULL) |
Barbara |
(NULL) |
(NULL) |
(NULL) |
250 |
Jon |
4500 |
450 |
(NULL) |
(NULL) |
Kirsten |
5600 |
(NULL) |
(NULL) |
(NULL) |
Kirsten |
4400 |
(NULL) |
(NULL) |
(NULL) |
Mary |
5600 |
(NULL) |
(NULL) |
(NULL) |
Peter |
5200 |
1300 |
(NULL) |
(NULL) |
Ralph |
5100 |
(NULL) |
(NULL) |
(NULL) |
Unfortunately, the writer of the query chose to use subqueries in the field
list. That isn’t a great idea because the subqueries will be executed for
every row in the employees table! For instance, if we have 1000 employees,
then this statement will execute 3001 queries: one query on the employees table
plus one thousand sum queries on each of the subqueries on the bonuses
table). A better way to write this query is to use derived tables.
select emp_id, emp_name, emp_salary, jan_09_points, feb_09_points, mar_09_points from ( select emp.id as emp_id, emp.name as emp_name, emp.salary as emp_salary, sum(CASE WHEN month(award_date) = 1 THEN b.amount END) AS 'jan_09_points', sum(CASE WHEN month(award_date) = 2 THEN b.amount END) AS 'feb_09_points', sum(CASE WHEN month(award_date) = 3 THEN b.amount END) AS 'mar_09_points' from employees emp LEFT OUTER JOIN (SELECT * FROM bonuses WHERE YEAR(award_date) = 2009) AS b ON emp.id = b.emp_id group by emp.id) as all_employees order by emp_name;
name |
salary |
jan_09_points |
feb_09_points |
mar_09_points |
Allan |
4900 |
(NULL) |
350 |
(NULL) |
Barbara |
(NULL) |
(NULL) |
(NULL) |
250 |
Jon |
4500 |
450 |
(NULL) |
(NULL) |
Kirsten |
5600 |
(NULL) |
(NULL) |
(NULL) |
Kirsten |
4400 |
(NULL) |
(NULL) |
(NULL) |
Mary |
5600 |
(NULL) |
(NULL) |
(NULL) |
Peter |
5200 |
1300 |
(NULL) |
(NULL) |
Ralph |
5100 |
(NULL) |
(NULL) |
(NULL) |
Ultimately, the decision of
whether to use derived tables, views, or temporary tables in your queries and
stored procedures rests with you. The best way to be confident in your choice
is to experiment with several approaches and time them with different volumes.
Derived tables just might turn out to be your best choice in many cases.