MySQL’s Explain Extended
Explained
One nice feature
added to the EXPLAIN statement in MySQL 4.1 is the EXTENDED keyword which
provides you with some helpful additional information on query optimization. It
should be used together with
WARNINGS to get information about how query looks after transformation as well
as what other notes the optimizer may wish to tell us. This article will demonstrate how to use
Explain Extended with Show Warnings to fine tune your Select statements as well
as how to rev up performance even more using optimizer hints and
What the Extended Keyword Adds
While
it may look like a regular Explain statement, behind the scenes, MySQL engineers
the SQL statement into its optimized form.
Using Show Warnings afterwards prints out the optimized Select
statement. Examining it can give us a
better idea of how MySQL is interpretting our SQL. Here’s an example:
The
following query could be used to fin
many employees show up in the bonuses table. That is to say, how many employees
have received bonuses:
SELECT COUNT(*)
FROM employees
WHERE id IN (SELECT
emp_id FROM bonuses);
Adding the EXPLAIN EXTENDED prefix to the above statement
will execute the statement behind the scenes so that the compiler optimizations
can be analyzed:
EXPLAIN EXTENDED
SELECT COUNT(*)
FROM employees
WHERE id IN (SELECT
emp_id FROM bonuses);
We can study the following table to glean valuable insight
into query execution:
id |
select_type |
table |
type |
possible_keys |
key |
key_len |
ref |
rows |
filtered |
Extra |
1 |
PRIMARY |
emp |
index |
(NULL) |
PRIMARY |
4 |
(NULL) |
8 |
100 |
Using where; Using index |
2 |
DEPENDENT SUBQUERY |
bonuses |
index_subquery |
emp_id |
emp_id |
4 |
func |
1 |
100 |
Using index |
This
table is very much like the one produced by the regular EXPLAIN except for the
added filtered column in the second
last position. The
filtered column, added
in MySQL 5.1.12, indicates an estimated percentage of table rows that will be
filtered by the table condition. Hence, the rows column shows the estimated
number of rows examined and rows × filtered / 100 calculates the number
of rows that will be joined with previous tables.
Applying
EXPLAIN EXTENDED to our query gives us the opportunity
to run the Show
Warnings statement afterwards to see final optimized query:
select count(0) AS `COUNT(*)`
from `operations`.`employees` `emp`
where
<in_optimizer>(`operations`.`emp`.`id`,
<exists>(<index_lookup>(<cache>(`operations`.`emp`.`id`)
in bonuses on emp_id)))
It shows that the query engine is using <in_optimizer>
and converting the Where clause into an <exists> check using an
<index_lookup >. There is some caching being done as well.
The EXISTS clause
provides a simple way to find intersection between tables. However, the way that EXISTS executes tends
to take more time than other methods because each
row from the outer query result is compared against the inner query, meaning
that the inner query is executed as many times as there are rows in the outer
query result.
A
reasonable goal then to improve performance would be to get rid of the
EXISTS. While the EXPLAIN EXTENDED does
not tell us how to write our query, a good place to start is to remove the nested
subquery in the WHERE clause:
EXPLAIN EXTENDED
SELECT COUNT(DISTINCT emp.id)
FROM employees emp
INNER JOIN bonuses b on emp.id = b.emp_id;
After
executing the revised statement,
WARNINGS produces a very different and simpler output:
select count(distinct `operations`.`emp`.`id`)
AS `COUNT(DISTINCT emp.id)`
from `operations`.`employees` `emp`
join `operations`.`bonuses` `b`
where (`operations`.`b`.`emp_id` = `operations`.`emp`.`id`)
Using Optimizer Hints
Optimizer hints are instructions that you can embed in your
SQL that instruct or suggest to the optimizer how
you would like the SQL to be executed.
They should be used sparingly and with caution! Hints limit the choices
the optimizer has available, so if data in the tables change or new indexes are
created, MySQL may be unable to adapt because of your hints. The best time to
use them is in situations where you discover that the optimizer has made a poor
choice.
The following able lists
the commonly used optimizer hints:
Measuring Query Execution with SHOW
STATUS
Whenever we execute
a SQL statement, the elapsed time taken to execute the statement is reported:
mysql>SELECT * FROM ORDERS;
Query OK, 0 rows affected (8.25 sec)
While the elapsed
time is a good general measurement of SQL performance, it can be unreliable.
For instance, it can vary depending on how many other jobs are running on the server.
Cached data on the operating system can also affect execution time due to its
effect on the number of required physical I/O reads. For these reasons, it is a good idea to
obtain performance metrics using the
STATUS statement.
The level of detail
will vary depending on the storage engine. InnoDB currently offers the most
comprehensive selection of statistics, with over 130 variables.
Displaying all of them would create a list which scrolls for many pages, so
it’s best to filter the list by adding a LIKE clause:
mysql>
STATUS LIKE ‘threads%’;
Variable_name |
Value |
Threads_cached |
0 |
Threads_connected |
1 |
Threads_created |
1 |
Threads_running |
1 |
Each of the
statistics has a special meaning in MySQL. Here are some of the more important ones:
- Innodb_buffer_pool_read_requests: Number of
requests from the InnoDB buffer pool - Innodb_data_reads: Number of blocks from disk that InnoDB had
to read to execute the query. - Innodb_rows_read: Number of rows read by InnoDB to satisfy
the query. - Last_query_cost: Optimizer’s "cost" estimate for
the last SQL executed. - Sort_rows: Number of rows that had to be sorted.
- Sort_merge_passes: Number of disk sort "merge runs"
that had to be performed.
Innodb_buffer_pool_read_requests is probably the most useful of the lot for
determining the outcome of our SQL tuning efforts. When combined with Explain Extended, Show Warnings, and optimizer
hints, you should be able to see dramatic performance improvements in many
Select queries.