Five Query Optimizations in MySQL

Introduction

Query
optimization is the often overlooked part of applications.  Development
schedules being what they are, getting the right results and getting the
application working are the main priorities.  So thoroughly testing, and
benchmarking queries is often left as an afterthought.   

With our
short introduction to query optimization in MySQL, we hope to encourage at
least some attention to these issues up front.  We’ll also help you
identify some of the more common optimizations you may run across.

1. Optimizing Data Types

There
are a lot of different data types to choose from in MySQL, and sometimes it can
be overwhelming.  If you’re a DBA you’re looking at data types strictly in
terms of optimizations, and that is your criteria for choosing.  But
developers may be looking at them in terms of elegant solutions, speed of
coding, or simply the most obvious choice.  So as a DBA you are likely to
be faced with non-optimal choices here.  When you can, try to influence
these decisions using some general guidelines.

a. Try to
avoid NULLs

Nulls
are a special case in most databases.  MySQL is no exception to this
rule.  They require more coding to handle internally, more checking,
special index logic, and so on.  Some developers simply aren’t aware, and
when NULL is the default, that is the way tables are created.  However
it’s better to use NOT NULL  in most cases, and even use a special value,
such as 0 or -1 for your default value.  

b. Use
smaller fields where possible

When
MySQL reads data from disk, it stores it in memory, and uses cpu cycles and
disk I/O to read it.  That means smaller data types, which take up less
space will be read from disk and packed into memory more efficiently. 
That said; don’t go so crazy with setting these data types small that you don’t
have room for unexpected changes in the application later.  Alterations of
the table will require restructuring, which is overhead, and potentially code
changes, which is also a headache down the line.  So strike a balance, but
don’t make fields larger than they need to be either.

2. Beware of Character Set Conversions

The
character set that your client or application is using may be different from
that of the table itself.  This will require MySQL to implicitly convert
it on the fly.  In addition, certain character sets such as UTF8 support
multi-byte characters, so they make storage requirements greater.  

3. Optimizing COUNT (my_col) and COUNT (*)

If
you’re using MyISAM tables, count(*) with no where clause is very fast because
the statistics on rowcounts is exact.  So MySQL doesn’t have to look at
the table at all to get the count.  The same can be said for count(my_col) 
if that column is NOT NULL.  

If you
are doing count() with a where clause, there’s not much you can do to optimize
it further, beyond the obvious of indexing the column in the where
clause.  It may be possible that a covering index will help you in this
case, for a more complex where clause.

Short of
the above suggestions, you might go with summary tables.  These can allow
you to keep up to date information about the contents of your table.  You
can use triggers, or application logic to keep the summary table always up to
date, or you can run a batch job periodically to fill it with up to date
information.  If you do the latter, your information will be close, but
not exact, depending on how often the batch job runs.  Weigh your
application’s need for precise information against the overhead of keeping this
data up to date, and strike a balance.

4. Optimizing Subqueries

MySQL’s
query optimization engine isn’t always the most efficient when it comes to subqueries. 
That’s why it is often a good idea to convert a subquery to a join.  Joins
have already been handled properly by the optimizer.  Of course, be sure
the column you’re joining on in the second table is indexed.  On the first
table MySQL usually does a full table scan on against the subset of rows from
the second table.  This is part of the nested loops algorithm, which MySQL
often engages to perform join operations.

5. Optimizing UNION

UNION has an interesting optimization
that exists across a few different databases.  It’s obvious when you think
about how it works.  UNION gives you the rows from two
tables that don’t exist in the other.  So implicitly, you are removing
duplicates.  To do this the MySQL database must return distinct rows, and
thus must sort the data.  Sorting, as we know is expensive, especially for
large tables.  

UNION
ALL can very well be a big speedup for you.  What if you already know that
your data does not contain duplicates in either row, or what if you don’t care
about duplicates?  In either case, UNION ALL is for you.  Further,
there may be other ways you can avoid the duplicates in your rows using some
application logic, so you know that UNION ALL will provide the results you
want, without the heavy overhead of sorting the data.

Conclusion

This
list of query optimizations is by no means exhaustive, but should set you on
the right track to optimizing the SQL in your application.  Whenever
possible, try a few different solutions, look at the query plans, test on large
datasets, benchmark those results, and see what works in the real world.  Also,
be proactive by keeping an eye on your slow query log, and identify further
queries that may need tuning as early as possible.

»
See All Articles by Columnist Sean Hull

Sean Hull
Sean Hull
Sean Hull is a business and technology consultant, author and speaker. He is the founder and senior consultant at Heavyweight Internet Group in New York and has worked as a database consultant and expert for ten years. He is the author of "Oracle and Open Source" on O'Reilly & Associates and his articles have appeared in Oracle's own OTN website, DBA Zine, ChangeThis.com, and many others. Using techniques, technologies, and perspectives discussed in the book, articles, and seminars, he has taken his career into exciting areas. He has served as Technology Consultant, Project Manager, CEO, and mentor with a background in programming, systems integration & administration, project development and management. He can be reached at [email protected] or visit http://www.iheavy.com for more info about consulting services.

Latest Articles