Query Performance Optimization in MySQL

Experts know how to write performance efficient queries. Although experience ripens wisdom, there are certain things that one must understand at least to begin with. For example, you must understand key considerations of query design; how a query performs internally, where it fails, optimization patterns, etc. In this article, I’ll provide a few optimization points to ponder upon while designing a query in MySQL.

Why Are Some Queries Slow?

A common problem with SQL queries is that more data is being retrieved than is actually needed. Of course, there are queries that sift through a lot of data and we cannot do much about them, but they are not common. In most cases it is bad query design that leads to poor query performance. After every query design you must introspect on couple of aspects like what could happen after the query is fired:

Will the SQL query access too many columns or rows?
Will the MySQL server analyze too many rows to retrieve the desired result?

There are queries that make MySQL server analyze on too much data but throws them as it sifts. This is an extra work for the server in terms of many aspects such as network overhead, too much memory consumption or too much CPU resource usage on the server. The consequence is slow performance.

There are situation where you might not be able to help much during its design, but there is a situation where if you are careful and estimate the consequence and introspect, then a bad query can at least be made good if not better.

Typical Mistakes and Their Solutions

There are quite a few common mistakes often made while writing a query. Here are a few of them. You can find a few more thinking on the same line. Here’s reasons for slow query performance with possible solutions.

Too many rows

The mistake is often made of writing a query that retrieves data and assume that MySQL will provide result on demand while overlooking the amount of processing required to return the full result set. Suppose, a SELECT statement is fired to fetch 100 products details for an ecommerce site when only 10 of them actually need to be shown first. You might think that MySQL fetches 10 rows only and stop executing the query. But, no. What MySQL does is generate the complete result set and feeds the client. The client library receives the complete set and discards most of it and retains only 10 of which it seeks. This clearly wastes a lot of resource.

However, in such a situation you can provide a solution by using the LIMIT clause with the query.

SELECT
      col1, col2,...
FROM
      table_name
LIMIT
      [offset,] count;

The LIMIT clause accepts one or two parameters. The first one specifies the offset, and the second one specifies count. If only one parameter is specified it denotes number of rows from the beginning of the result set.

For example, to select 10 rows from the table, you can write:

SELECT
      e.emp_name, e.phone, e.email
FROM 
      employee e
LIMIT 10;

And for selecting the next 10 rows, starting from 11^th record, you can write:

SELECT
      e.emp_name, e.phone, e.email
FROM
      employee e
LIMIT 10, 10;

Too many Columns

Always look at the query: SELECT * with suspicion. This query returns all columns and you probably need only some of them. The greatest disadvantage of retrieving all columns is that it prevents optimization by hindering the use of indexes, demands too much I/O, memory and CPU resources from the server.

Understand that such a universal query retrieving all columns can be wasteful. Some say they are useful because it lets developer use the same bit of code in more than one place. That’s fine if the cost involved is limited within consideration. Sometime caching retrieved data helps in this context. But be cautious though, leveraging performance is a sleek job and such luxury may not have a place for performance.

The rule of thumb is to avoid such universal queries or keep a number of columns fetched to minimal as possible.

Too much data analysis

Queries return desired result that’s fine but sometimes these queries are written in such a way that while processing it requires examining too much data prior to generating results. Therefore, in MySQL you must measure according to the following cost metrics:

Execution time
Rows examined
Columns examined

You can get a rough estimation of query cost from these metrices. These reflects the quantity of data access by MySQL internally to process the query and how fast the query runs. Since theses metrices are logged in the slow query log, it is a good idea to investigate and find queries that analyze too much data to return the result. MySQL database registers all queries that exceeds a given amount of execution time in slow query log. This an ideal place to look for slow queries and find out how often they are slow.

A slow query log is typically located at /var/log/mysql/mysql-slow.log

Note that, one might have to set and enable logging slow queries in mysqld.cnf configuration file as follows.

#slow_query_log = 1
#slow_query_log_file = /var/log/mysql/mysql-slow.log
#long_query_time = 2

Prior to and with MySQL 5 there were serious limitation especially lacking support for fine-grained logging. Only respite was using patches that enabled logging. However, the feature has been part of the MySQL 5.1 and later servers as part of its core feature.

Queries that take too much time in execution does not necessarily mean they are bad queries. The slow query log simply provides the opportunity to examine the query performance and improve it as possible.

Restructuring queries

As you have opportunity to restructure problematic queries your primary aim should be to find an alternative solution to achieve the effect we want. You can transform the query into its equivalent form keeping in mind of the internal effect in the MySQL server while processing.

One decision in query design is to whether we should favor one complex query in place of several simple ones or vice versa. The conventional approach of database design is to do as many works as possible with fewer queries. The reason is that one big/complex query is more cost effective in terms of establishing database connection. The advantage of cost reduction in favor of complex query is network usage, query processing/optimization and resource utilization. But this traditional approach does not sit well with MySQL. MySQL is designed to handle database connection and disconnection quickly. Therefore, establishing connection, firing many simpler queries and closing connection seems more efficient. Retrieving data through more than one simple query in place of one big complex one is more effective. Note that same idea may not be applied with other databases.

Conclusion

These are a few quick tips to query optimization. Understand that, knowing SQL syntaxes, able to create a query that retrieves the desired result is not enough if one aims for query performance. Understanding the happening underneath the seemingly simple looking queries is vital in writing one that not only retrieve what is desired but imbue the art of optimization right from where it all starts. The behind the scene happening of query processing gives an important clue to understand query performance and this knowledge is a must before one foray into the realm of query optimization.