MySQL has an interesting architecture that sets it apart from some other enterprise database systems. It allows you to plug in different modules to handle storage. What that means to end users is that it is quite flexible, offering an interesting array of different storage engines with different features, strengths, and tradeoffs.
Introduction
MySQL
has an interesting architecture that sets it apart from some other enterprise
database systems. It allows you to plug in different modules to handle
storage. What that means to end users is that it is quite flexible,
offering an interesting array of different storage engines with different
features, strengths, and tradeoffs.
MySQL’s Storage Engine Architecture
The
various storage engines as the name implies manage how data is stored and
retrieved on disk, and the features available therein. One important
thing to keep in mind however is that MySQL doesn’t manage transactions at the
server level, but at the storage engine level. This is great if you have
a mix of InnoDB tables in a query, or a mix of MyISAM ones, but if you have
both in the same transaction, not only might performance be impacted, but also
results may not be reliable. Although applications can use a mix of
different types of tables using different underlying engines, single queries
and transactions should not in general do this.
Do
you want to see what storage engines you have available? Just query your
MySQL variables as follows:
mysql&g; show variables like 'have%'; +-----------------------+----------+ | Variable_name | Value | +-----------------------+----------+ | have_archive | NO | | have_bdb | YES | | have_blackhole_engine | NO | | have_compress | YES | | have_crypt | YES | | have_csv | NO | | have_dynamic_loading | YES | | have_example_engine | NO | | have_federated_engine | NO | | have_geometry | YES | | have_innodb | YES | | have_isam | NO | | have_merge_engine | YES | | have_ndbcluster | NO | | have_openssl | DISABLED | | have_ssl | DISABLED | | have_query_cache | YES | | have_raid | NO | | have_rtree_keys | YES | | have_symlink | YES | +-----------------------+----------+
Note
these are features denoted by "have_*" so some of these listed are
not storage engines.
MyISAM
This
storage engine was one of the first. Locking is done at the table level
so that can be a bottleneck very quickly for highly concurrent
applications. It does not support transactions; however, it does support
fulltext indexes, which can certainly come in handy for some
applications. With MyISAM, each table has three files, one for the table data,
one for index data, and one for the definition or frm file. MySQL can
allocate a key cache for MyISAM index data, however caching of the table data
is left to the filesystem. So be sure to leave enough memory to the OS if
you are using large MyISAM tables. MyISAM supports some table repair
after a crash, but does suffer from corruption more often than other table
types, such as InnoDB.
Archive
This
storage engine is optimized for high speed inserting, and compresses data as it
is inserted. As of 5.1 indexes can be added on these tables, so they
might make sense on a replicated slave where you want to do reporting, as
slaves do not have to use the same storage engine as the same table on the
master. It does not support transactions.
InnoDB
Perhaps
the best-known storage engine, InnoDB, supports row-level locking and
multi-version concurrency control. This means in high concurrency
applications such as web-facing ones, InnoDB is ideal. InnoDB stores data
in tablespaces although the underlying storage is managed internally. It
reads and writes data in chunks called blocks, and caches in a configurable
buffer cache both data and index blocks. InnoDB also supports crash
recovery. If you pull the plug on the box, and then reboot, when MySQL
starts up, it will rollback all standing and open transactions that were
interrupted by the power failure.
Merge
The
Merge storage engine operates on underlying MyISAM tables similar to how a VIEW
would. In other words, it’s not storing data itself per se, but providing
a container of sorts for them. As it turns out this code is very closely
related to how partitions are handled and created. Merge tables help you
manage large volumes of data more easily. They can be useful for
logging in applications as you can easily remove old data by dropping the
underlying tables. You can think of them as UNION ALL Views. One
thing to keep in mind about MERGE tables is that the create statement does not
check the underlying tables for compatibility so you may not find out they’re
incompatible until you use them.
Memory
These
tables as you might expect are stored in-memory. The structure will
persist past a database restart, but the data will not. They’re good for
lookup tables, or caching results during aggregation. Although access is
fast, concurrency may be limited because they do table-level locking.
CSV
This
storage engine is a little less known and kind of an interesting one. It
allows you to treat filesystem files in csv format as tables! Want to
select certain rows, do some manipulations on those, and insert them into a
table? This is a great way to do it. This is very similar to what
Oracle calls "External Tables".
NDB Cluster
Another
well-known storage engine. It allows one to cluster tables. That
means you have multiple masters, which all can do inserts, updates and deletes
on the same table. NDB has row-level locking, but not full multi-version
concurrency control. Due to some architectural limitations of how the
MySQL server executes joins, they perform rather poorly with NDB storage
engine. For that reason, it is best used for single table primary key
lookups. So, you would not likely port your entire web-facing database to
NDB Cluster for example.
Conclusions
The
MySQL architecture offers a rich selection of storage engines with different
characteristics, strengths, and weaknesses. We’ve covered some of the
most common ones here. Next month we’ll look at some of the newer and
also some more exotic ones such as Blackhole, Maria, PBXT, and more.