The Wonders of Maatkit for MySQL
May 18, 2010
MySQL is a great database for web-facing applications, however, it tends to be a bit rough around the edges. Enter Maatkit, a great toolkit with a bewildering array of command line tools that fill the gap where MySQL's native tools leave off. From data replication to query profiling and optimizing, Maatkit has tools to make you smarter, and help you get your job done.
MySQL is a great database for web-facing applications, and it's evolved to
provide a surprisingly large set of sophisticated features well beyond what the
original designers probably envisaged.
However, it tends to be a bit rough around the edges, and it takes some
shortcuts, and fudges a bit all in the name of providing you the most bang for
your virtual buck.
Enter Maatkit, a great toolkit with a bewildering array of command line tools
that fill the gap where MySQL's native tools leave off. From Replication to
query profiling and optimizing, Maatkit has tools to make you smarter, and help
you get your job done.
Front and center is the table checksum tool. Why would you need to create
checksums on your tables, you ask. Turns out MySQL's replication technology is
deceptively easy to set up, and it may *seem* to be running properly, all the
while your slave does not match your master. That's right, fire up this tool, and
you'll will be surprised or even horrified to find your slave is not only
different, but if you were performing backups off the slave, your backups are
invalid as well!
The tool works by creating checksums of table contents at a point in time, then
propagating those checksum tables through replication to the slave, where you
can then run mk-table-checksum with different options to check for consistency.
This tool is like mk-table-checksum's assistant, allowing you to filter and
sift through the output for the relevant information you want.
When you have found differences in tables between your master and slave
database, you'll need a method for bringing them back in line. That's where
mk-table-sync comes in. It allows you to bring over the missing rows or repair
and update the changes, without having to completely rebuild your slave
MySQL's replication technology typically keeps up pretty well with the master,
all things considered, but what if you want your slave to be behind the master?
Perhaps you want to protect yourself from operator error, an accidental drop
table, or drop database? With mk-slave-delay set to perhaps one hour, you can
do this kind of thing. It basically disables the slave SQL thread for the
specified period of time, and applies past transactions from the relay log
based on the embedded timestamps there.
The nature of MySQL's statement based replication is that it serialized, which
means all of the sessions that were operating concurrently on production, stand
in line on the slave, in the order they were committed. In addition, the slave
is only running inserts, updates, deletes, drops, creates, and so on. So it's
cache may not always be as *warm* as the production cache. A warm cache means
the blocks of data you're looking for don't have to be fetched from disk,
because they're already in your cache, resulting in a logical I/O instead of a
physical I/O. What that means to you and me is they're much much faster!
mk-slave-prefetch provides a way to *warm up* your cache before the slave
actually tries to run the relevant queries. This works well in parallel as it's
only attempting to do the disk I/O before you actually need it, helping the
slave perform and stay in closer sync with the master.
When you first learn about replication in MySQL you issue the "show slave
status" command. You are probably already familiar with the
"Seconds_Behind_Master:" field. Turns out the information here is not
reliable. That's where mk-heartbeat comes in. It creates a heartbeat record on
the master, and then that is propagated through replication. The check on the
slave then looks at that record to see the actual slave lag. Nifty!
The tools mentioned previously talk about keeping a slave in sync when it might
fail silently or otherwise get out of sync. This tool tries to keep the slave
running after an actual error, by skipping the error. In general, it's a better
idea to monitor why your slave is out of sync or getting errors, and fix them
properly. You wouldn't want to build up a false sense of security when this
tool is really just buying time. Nevertheless, in some cases it might be handy.
If you have a simple master-slave setup, you probably don't need a way of
mapping your replication configuration visually. However, if you have a more
complex setup, this tool might be useful. For instance master-master with
slaves, or circular with active and passive master, or three masters in a
circular configuration, or even a distribution master to take load off the real
master, and support lots of slaves. All of these configurations could make use
of this tool to give you insight on your setup.
If you've ever wanted to move a replication slave from one point in your
topology to another, you'll find it is a very manual process, with plenty of
potential for mistakes. This tool automates that process, if you need to do it.
This tool has a lot of different interesting use cases, and if you're doing
server performance tuning, it will definitely help you. First, you can use it
to monitor the processlist in MySQL dynamically, capturing data for a period of
time. By probing the database in this way, it's able to do sub-second slow
query capturing, which earlier versions of MySQL couldn't do (without a patch).
When you finish capturing data, it will dump the results to standard out,
providing you with a handy text-based graph of the activities of your database,
and the queries it ran during that interval.
The tool can also be used in conjunction with tcpdump. This might be useful if
you didn't have access to login to the MySQL server itself, or if you wanted to
be very gentle on the server you're monitoring.
The tool can also be used to analyze a slow query log itself, so all-in-all a
The explain facility in MySQL is great for giving you insight on how the
optimizer will get your data from the respective storage engines. This tool
will format the output of explain as a tree diagram, which can be more
You can hand this tool a file with some SQL statements in it, and it will run
them in batch, and provide information about what resources those queries used
in aggregate. Did they open temporary tables, or do full table scans? This tool
will tell you.
This works in conjunction with mk-query-profile to format the output in a more
readable way, allowing you to view the results of multiple profiles
Maatkit is a tremendous toolkit for a MySQL DBA. It directly addresses
day-to-day needs of DBAs, providing features you never knew you wanted, and
some you sorely missed. In next month's article, we'll cover the rest of the
Maatkit suite of command line tools - yep there's more, a lot more!
Maatkit Project Hosting on Google Code
See All Articles by Columnist