Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MySQL

Posted Dec 3, 2010

Use Derived Tables in Your MySQL Queries to Improve Database Performance

By Rob Gravelle

There has been much debate as to whether derived tables, views, or temporary tables are actually faster in terms of database performance. The fact is that there is no simple answer, as different queries require different optimizations. Rob Gravelle presents and evaluates some uses for derived tables that highlight their value in many applications.

In version 4.1, MySQL added support for derived tables (also referred to as inline views). In very basic terms, a derived table is a virtual table that’s returned from a SELECT statement. This concept is similar to temporary tables, but using derived tables in your SELECT statements is much simpler because they don’t require all the steps that temporary tables do. There has been much debate as to whether derived tables, views, or temporary tables are actually faster in terms of performance. Supporters of derived tables cite that they allow to you to replace the code required for managing a temporary table with a single command, which can be faster than the series it replaces. The fact is that there is no simple answer, as different queries require different optimizations; each has its place in the database administrator’s arsenal. Today, I’m going to present and evaluate some uses for derived tables that highlight their value in many applications.

Some Terminology

A lot of people use the terms derived tables and subqueries interchangeably. Not surprising, considering that a subquery is defined as any query that appears within another SQL command. These can be further broken down into stand-alone sub-queries, which are those that you can run independent of the command that contains it, and correlated, which relate to fields from the containing command. The most common use for subqueries is filtering data in the WHERE clause of a SQL command or in the column list. Here’s a subquery that retrieves a list of all customers who placed orders in the specified year. The main query uses that list to find the reverse - those who didn't order in that year:

SELECT CompanyName
SELECT CompanyName FROM Customers WHERE CustomerID NOT IN (SELECT CustomerID FROM Orders WHERE YEAR(OrderDate) = 2003);

This statement fetches basic client information. A subquery performs a lookup on another table to retrieve the birth_country description:

SELECT itt.family_nme,
       itt.given_nme,
       itt.birth_dt,
       (SELECT english_desc 
        FROM   vcode_value 
        WHERE  code_value_id = itt.birth_country_cd) AS 'birth_country'
FROM identity_tombstone_id AS itt;

The above statement can also be said to contain a correlated subquery because it refers back to the outer main query.

Compare the previous two queries to the following one:

SELECT u.last_name, 
       a.state
FROM   User AS u,
       (SELECT user_id, state FROM Address) AS a
WHERE  u.id = a.user_id;

The above query selects directly from the derived table created by the subquery. Of course, in the simple example above, you could just as easily define the join without the derived table, but as we’ll see in the next section, there are other, more complex criteria, which call for the use of one or more derived tables.

When to Use Derived Tables

From my experience, the best candidates for derived tables are queries that incorporate some sort of grouping. In other words, those that include a GROUP BY clause. Consider the following tables:

employees table:

id

dept_id

gender

name

salary

emp_id_number

1

2

m

Jon Simpson

4500

1234

2

4

f

Barbara Breitenmoser

(NULL)

9999

3

3

f

Kirsten Ruegg

5600

3462

4

1

m

Ralph Teller

5100

6543

5

2

m

Peter Jonson

5200

9747

6

2

m

Allan Smithie

4900

6853

7

4

f

Mary Parker

5600

2223

8

3

f

Kirsten Ruegg

4400

2765

bonuses table:

bonus_id

emp_id

amount

award_date

1

3

500

2009-10-28

2

5

1000

2009-01-03

3

4

250

2009-04-22

4

2

900

2010-01-14

5

1

450

2009-01-03

6

5

1000

2009-10-28

7

8

300

2007-10-01

8

8

300.5

2009-08-12

9

2

250

2009-03-16

10

2

500

2009-07-30

11

6

350

2009-02-12

12

5

300

2009-01-15

Now we’ll write a SELECT statement to generate a report that tallies the total number of bonuses that each employee received last year:

SELECT e.name, 
       e.salary, 
       COUNT(b.bonus_id) AS TotalBonuses
FROM   employees e 
       LEFT OUTER JOIN bonuses b ON e.id = b.emp_id
WHERE  YEAR(b.award_date) = 2009
GROUP BY e.id;

Our 2009 bonus report looks quite good, except that employees who did not receive any bonuses aren't being included (no Mary Parker). That may be problematic, because it would be useful to be able to identify employees who may be underperforming.

name

salary

Total Bonuses

Jon Simpson

4500

1

Barbara Breitenmoser

(NULL)

2

Kirsten Ruegg

5600

1

Ralph Teller

5100

1

Peter Jonson

5200

3

Allan Smithie

4900

1

Kirsten Ruegg

4400

1

The reason that the above query doesn’t return all of the employees has to do with the way that the WHERE clause interacts with the GROUP BY clause. Using a derived table, we can reduce the bonuses table to a snapshot of itself containing only data from the year 2009.

SELECT e.name, 
       e.salary, 
       COUNT(b.bonus_id) AS 'Total Bonuses'
FROM employees e 
       LEFT OUTER JOIN 
(SELECT emp_id, bonus_id  FROM bonuses WHERE YEAR(award_date) = 2009) AS b
 ON e.id = b.emp_id
GROUP BY e.id;

The elusive Mary Parker is now visible.

name

salary

Total Bonuses

Jon Simpson

4500

1

Barbara Breitenmoser

(NULL)

2

Kirsten Ruegg

5600

1

Ralph Teller

5100

1

Peter Jonson

5200

3

Allan Smithie

4900

1

Mary Parker

5600

0

Kirsten Ruegg

4400

1

Another use for derived tables is to replace correlated subqueries with joins. The following statement is supposed to return a more detailed bonus report for the months of January, February, and March of last year.

select emp.name,
       emp.salary,
      (select sum(amount) 
       from   bonuses 
       where  emp_id = emp.id 
       and award_date between '2009-1-1' and '2009-1-31') jan_09_points,
      (select sum(amount) 
       from   bonuses 
       where  emp_id = emp.id     
       and award_date between '2009-2-1' and '2009-2-28') feb_09_points,
      (select sum(amount) 
       from   bonuses 
       where  emp_id = emp.id   
       and award_date between '2009-3-1' and '2009-3-31') mar_09_points 
from   employees emp
order by emp.name;

name

salary

jan_09_points

feb_09_points

mar_09_points

Allan Smithie

4900

(NULL)

350

(NULL)

Barbara Breitenmoser

(NULL)

(NULL)

(NULL)

250

Jon Simpson

4500

450

(NULL)

(NULL)

Kirsten Ruegg

5600

(NULL)

(NULL)

(NULL)

Kirsten Ruegg

4400

(NULL)

(NULL)

(NULL)

Mary Parker

5600

(NULL)

(NULL)

(NULL)

Peter Jonson

5200

1300

(NULL)

(NULL)

Ralph Teller

5100

(NULL)

(NULL)

(NULL)

Unfortunately, the writer of the query chose to use subqueries in the field list. That isn’t a great idea because the subqueries will be executed for every row in the employees table! For instance, if we have 1000 employees, then this statement will execute 3001 queries: one query on the employees table plus one thousand sum queries on each of the subqueries on the bonuses table). A better way to write this query is to use derived tables.

select emp_id,
       emp_name,
       emp_salary,       
       jan_09_points,  
       feb_09_points,         
       mar_09_points
from ( 
 select emp.id as emp_id,
        emp.name as emp_name,
        emp.salary as emp_salary,       
        sum(CASE WHEN month(award_date) = 1 
                 THEN b.amount 
            END) AS 'jan_09_points',       
        sum(CASE WHEN month(award_date) = 2 
                 THEN b.amount 
            END) AS 'feb_09_points',       
        sum(CASE WHEN month(award_date) = 3 
                 THEN b.amount 
            END) AS 'mar_09_points'
from employees emp 
LEFT OUTER JOIN 
(SELECT * FROM bonuses WHERE YEAR(award_date) = 2009) AS b
 ON emp.id = b.emp_id
 group by emp.id) as all_employees
order by emp_name;

name

salary

jan_09_points

feb_09_points

mar_09_points

Allan Smithie

4900

(NULL)

350

(NULL)

Barbara Breitenmoser

(NULL)

(NULL)

(NULL)

250

Jon Simpson

4500

450

(NULL)

(NULL)

Kirsten Ruegg

5600

(NULL)

(NULL)

(NULL)

Kirsten Ruegg

4400

(NULL)

(NULL)

(NULL)

Mary Parker

5600

(NULL)

(NULL)

(NULL)

Peter Jonson

5200

1300

(NULL)

(NULL)

Ralph Teller

5100

(NULL)

(NULL)

(NULL)

Ultimately, the decision of whether to use derived tables, views, or temporary tables in your queries and stored procedures rests with you. The best way to be confident in your choice is to experiment with several approaches and time them with different volumes. Derived tables just might turn out to be your best choice in many cases.

» See All Articles by Columnist Rob Gravelle



MySQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM


















Thanks for your registration, follow us on our social networks to keep up-to-date