MySQL Tuning for Large Datasets

Did you know that large, well-known companies like Google, Yahoo, NASA, and Walmart all use MySQL? Well, it’s true. Some of these companies’ tables contain billions of rows, yet they all maintain enviable performance. While there is no silver bullet for keeping your MySQL database running briskly, there are a few performance tweaks you can make to keep things chugging along when the data starts to pile up. In today’s article, we’ll take a look at a few of these.

Start at the Beginning

Even before designing your database, it helps to know how your tables will be utilized. For instance, if you know that you will be updating certain pieces of data often, it is probably best to house those in their own table, since update operations tend to be faster on smaller tables. On the other hand, in-depth analysis of complex data is usually best served by large tables, because table joins can be time-consuming operations. I know that you were likely told (many times) that normalizing makes the data smaller by eliminating redundancy. However, the dark side of normalization is that it can dramatically increase the number of index lookups. Considering that many of these could require random-access reads on standard hard drives, it becomes apparent that you might sometimes benefit from some denormalization:

While freshly loaded databases tend to have good sequential behavior, this behavior becomes increasingly difficult to maintain as a database grows, resulting in more random I/O and performance challenges. – Wikipedia

Although the whole idea of denormalization might be blasphemous to you, with the emergence of so-called “megadata,” many of the major players such as Google, eBay, and Amazon have denormalized their databases in order to improve both performance and scalability. This paradigm shift has even prompted many a database designer to declare that maybe normalization is for sissies!
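As a simple illustration, here’s what that kind of denormalization might look like in practice. This is only a sketch using a hypothetical customers/orders schema; the redundant customer_name column is the denormalization:

    -- Normalized design: reading an order's customer name requires a join.
    CREATE TABLE customers (
      customer_id INT NOT NULL PRIMARY KEY,
      name        VARCHAR(100) NOT NULL
    ) ENGINE=InnoDB;

    CREATE TABLE orders (
      order_id      INT NOT NULL PRIMARY KEY,
      customer_id   INT NOT NULL,
      total         DECIMAL(10,2) NOT NULL,
      -- Denormalized copy of customers.name: redundant, but it lets hot
      -- read queries skip the join (at the cost of keeping it in sync).
      customer_name VARCHAR(100) NOT NULL,
      INDEX (customer_id)
    ) ENGINE=InnoDB;

    -- With the redundant column, a common lookup touches a single table:
    SELECT order_id, total, customer_name FROM orders WHERE order_id = 42;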

Use InnoDB Rather than MyISAM

InnoDB offers change buffering (called the insert buffer prior to version 5.5), a feature that reduces the disk I/O required to maintain secondary indexes. The feature was greatly improved along with the name change; it can boost insert performance considerably and is enabled by default.
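If you want to verify or adjust this behavior, the innodb_change_buffering server variable controls it (values range from ‘none’ to ‘all’, with ‘all’ being the default as of 5.5):

    -- Check the current setting:
    SHOW VARIABLES LIKE 'innodb_change_buffering';

    -- Restrict buffering to inserts only, or turn it off entirely, should
    -- you suspect it is hurting a particular workload:
    SET GLOBAL innodb_change_buffering = 'inserts';
    SET GLOBAL innodb_change_buffering = 'none';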

MyISAM can be faster at inserts to the end of a table, but it employs table-level locking and a single lock to protect the key buffer when loading data to/from disk, resulting in contention. It also lacks the change buffering feature described above.
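If you’re sitting on existing MyISAM tables, converting one is a single statement. Note that orders below is just a hypothetical table name, and that the conversion rewrites the entire table, so plan accordingly for large tables:

    -- Confirm which engine the table currently uses:
    SHOW TABLE STATUS LIKE 'orders';

    -- Convert it to InnoDB (this rebuilds the whole table):
    ALTER TABLE orders ENGINE=InnoDB;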

Enable InnoDB Table Compression

Another benefit of InnoDB is its support for table compression. Compression can help to improve both raw performance and scalability. The effect of compression is twofold: a reduction in data transfer between disk and memory, and more compact storage on disk and in memory. Moreover, these benefits are amplified for tables with secondary indexes, because index data is compressed as well. Compression is especially advantageous for SSD (flash drive) storage devices, as they tend to have a lower capacity than traditional HDD (spinning hard drive) devices.
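Enabling compression is a per-table option. In the MySQL 5.5/5.6 era it requires the Barracuda file format and per-table tablespaces; here is a minimal sketch, with logs as a hypothetical table:

    -- Prerequisites for compressed InnoDB tables (MySQL 5.5/5.6):
    SET GLOBAL innodb_file_per_table = ON;
    SET GLOBAL innodb_file_format = 'Barracuda';

    -- Create the table with compressed 8KB pages:
    CREATE TABLE logs (
      log_id  BIGINT NOT NULL PRIMARY KEY,
      message TEXT
    ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

    -- Or compress an existing table (rebuilds it):
    ALTER TABLE logs ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;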

Bulk Load Data in PRIMARY KEY Order Whenever Possible

When performing bulk inserts, it’s faster to insert rows in PRIMARY KEY order. InnoDB physically orders rows by the primary key, which can lead to page splits when rows are inserted between other rows. Page splits will especially degrade performance on tables that are not in memory (due to the extra disk reads).
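In practice, that means sorting your input by the primary key before loading it. Here’s a hedged sketch, assuming a hypothetical events table and a CSV file that has been pre-sorted by id (LOAD DATA INFILE also requires the FILE privilege and an appropriate secure_file_priv setting):

    CREATE TABLE events (
      id      BIGINT NOT NULL PRIMARY KEY,
      payload VARCHAR(255)
    ) ENGINE=InnoDB;

    -- Wrap the load in one transaction. Because rows arrive in PRIMARY KEY
    -- order, InnoDB appends to pages rather than splitting them.
    SET autocommit = 0;
    LOAD DATA INFILE '/tmp/events_sorted_by_id.csv'
      INTO TABLE events
      FIELDS TERMINATED BY ',';
    COMMIT;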

Hardware Optimization

You can tweak MySQL settings until the cows come home, but if you’re working on sub-optimal hardware, it may not amount to a hill of beans.

Memory

Sixteen to thirty-two gigabytes of RAM should give you the optimal bang for your buck at the time of writing.

Processing Power

MySQL (as of v5.5) is fully multi-threaded and can therefore make use of multiple CPUs, provided that the operating system supports them. Two dual-core CPUs will do very well, though many DBAs opt for even more processors for scalability reasons.

Operating System

The particular OS used is not so important, as long as it supports 64-bit processing. Most DBAs opt for 64-bit Linux with Logical Volume Manager (LVM) to make backups more efficient.

Storage

Directly attached storage with several spindles and RAID (Redundant Array of Independent Disks) is the standard approach to storage. Newer 2.5″ Serial Attached SCSI (SAS) hard drives are tiny but often faster than traditional larger drives.

As mentioned previously, using an SSD is a viable option in some cases. If you perform a lot of reads, an SSD can make a dramatic difference: on a 7200 RPM drive you can expect no more than ~100 I/O operations per second (IOPS), while even the cheapest SSDs are at least 5x faster, and better ones can reach 20,000 IOPS or more. Having said that, if you have a good deal of memory and tend to touch only, say, 20% of your data, then MySQL can keep that working set in memory.
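The knob that governs how much of your data InnoDB can cache is innodb_buffer_pool_size; a common rule of thumb on a dedicated server is to give it the majority of available RAM. You can gauge how well the current size is working with a couple of quick queries (note that prior to MySQL 5.7 the variable can only be changed in the configuration file followed by a restart):

    -- How big is the buffer pool right now (in bytes)?
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

    -- Compare logical read requests against reads that had to hit disk:
    SHOW STATUS LIKE 'Innodb_buffer_pool_read%';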

 

General Tips & Tricks

Here are a few things that you can do from time to time to make sure that your database is performing efficiently.

  1. Analyze your indexes on all tables, starting with the high-volume insert/read tables.
    Remove any unnecessary indexes, paying particular attention to UNIQUE indexes, as these disable change buffering. Don’t use a UNIQUE index unless you need it; employ a regular INDEX instead (see the sketch following this list).
  2. Take a look at your slow query log every week or two, as shown below. Pick the three slowest queries and optimize those.
  3. Consider setting up a replica (slave) server and off-loading some of your high-volume read queries to it. The replica also provides additional redundancy.
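As a starting point for items 1 and 2, the following sketch shows how to list a table’s indexes and switch on the slow query log (orders is again a hypothetical table name):

    -- Item 1: inspect the indexes on a high-volume table, looking for
    -- duplicates and for UNIQUE indexes that aren't strictly required:
    SHOW INDEX FROM orders;

    -- Item 2: enable the slow query log and capture anything over 2 seconds:
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 2;
    SHOW VARIABLES LIKE 'slow_query_log_file';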

Conclusion

When it comes to database optimization, remember that any tip is really nothing more than a generic suggestion. As such, you can’t be certain that it will work for your particular setup and usage patterns without further evaluation. Furthermore, there are many other ways to optimize your MySQL server that weren’t touched upon here today. For instance, MySQL has a number of server variables that can be set to further improve performance. We’ll get to those shortly!

