There are still people using MySQL 4.0 (and I even came across a MySQL 3.23
installation recently), but MySQL 5.1 is now the latest of the MySQL releases–with
5.1.7, the latest as I write; it has just moved from alpha to beta status.
Recently I have been writing about what the future holds for MySQL what with
all the Oracle moves. This month’s article changes focus and I examine some of
the new features in MySQL 5.1.
New types of replication
Previously, replication was simply repeating the statements run on one
server on another server (see the Database
Journal article on replication if this is new to you). This form of
replication was called statement-based replication. It is quite possible for
one server to have different data to the other server, if the two datasets
began in different states, or if the UPDATE was non-deterministic. As of MySQL
5.1.5, row-based replication is available, and as of 5.1.8, mixed replication
(both types depending on the context) will be available.
Statement-based replication has some problems. The main disadvantage is that
non-deterministic UPDATE’s cannot be replicated properly on the slave server/s.
In particular, that applies to the following functions: LOAD_FILE(), UUID(),
USER() and FOUND_ROWS(), as well as any non deterministic
stored functions that may exist on the server. Other statements that you think
this may apply to, such as RAND(), replicate just fine with a bit of
behind-the-scenes trickery. With row-based replication, the binary log stores
the full effects of a statement on the individual rows. This can make the
binary log a lot larger for statements that affect many rows, but can also make
updating quicker for updates where only a few rows were affected, but that took
a long time to execute.
To activate row-based replication, use one of:
mysql> SET GLOBAL binlog_format = 'ROW';
or
mysql> SET GLOBAL binlog_format = 2;
You can also set the variable on a per session basis (by
using the SESSION keyword instead of GLOBAL. To set it back:
mysql> SET GLOBAL binlog_format = 'STATEMENT';
or
mysql> SET GLOBAL binlog_format = 1;
Once MySQL 5.1.8 is out (it probably will be by the time
this is published), mixed replication will be also be available, and can be
activated as follows:
mysql> SET GLOBAL binlog_format = 'MIXED';
or
mysql> SET GLOBAL binlog_format = 3;
Logging changes
Until MySQL 5.1, the query and the slow query logs were written to log
files. As of MySQL 5.1.6, there is another option – log tables inside the
database. This is a simple change, but one I find particularly useful. Being
able to query the log records with SQL statements allows for all sorts of subtleties
to be easily explored. The log tables (mysql.general_log and mysql.slow_log
are in the comma-delimited CSV format, making them easy to import into
spreadsheets and other applications.
By default after MySQL 5.1.6, logging is to table (so be aware of upgrade
issues when upgrading from earlier versions). You can specify which kind of
logging to use, or whether to use both, with the –log-output option
in your configuration file. Options are:
–log-output=TABLE
–log-output=FILE
–log-output=NONE
–log-output=TABLE,FILE
The –log[=file_name] and –log-slow-queries[=file_name]
options, which specify the files to log to, are ignored unless the log output
is set to FILE.
Partitioning
Partitioning is an innovation that allows more control over how the tables
are physically stored. To some, this sort of thing is a messy blending of the
logical and physical levels, but it is a reality of today’s databases,
especially when trying to squeeze out performance benefits, that DBA’s are
looking at both. Partitioning allows tables to be distributed across a filesystem,
into multiple partitions. There were major changes in 5.1.6, including
the addition of partition pruning, which allows you to target a query
to a specific partition. By planning this carefully, it is possible to get a
reasonable performance increase in some circumstances. I plan to cover this in
more detail in a future article. There are different ways to partition a table.
You can do so by range, whereby values in a certain range appear in a
certain partition. Here’s an example with three partitions, one containing
records where the id is less than 1000, another less than 10000, and a third
those records with an id less than 100000:
CREATE TABLE range_partitioned (
id INT NOT NULL,
)
PARTITION BY RANGE (id) (
PARTITION p1 VALUES LESS THAN (1000),
PARTITION p2 VALUES LESS THAN (10000),
PARTITION p3 VALUES LESS THAN (100000)
);
Alternatively, partitioning by list creates
partitions according to specified values, for example:
CREATE TABLE list_partitioned(
id INT NOT NULL,
)
PARTITION BY LIST (id) (
PARTITION p1 VALUES IN (1,3,5,7,9),
PARTITION p2 VALUES IN (2,4,6,8,10),
PARTITION p3 VALUES IN (0,11,12)
);
Partitioning by hash allows MySQL to take care of
the partitioning rather than the creator having to define a list or range to
use. It requires an integer, or a function returning an integer, although it
requires some thought to make the hash as efficient as possible. Hash
partitioning does not lend itself to partition pruning, as determining which
partition the data sits in is not easy.
CREATE TABLE hash_partitioned(
id INT NOT NULL,
date_started DATE NOT NULL DEFAULT ‘1970-01-01’,
)
PARTITION BY HASH (TO_DAYS(date_started))
PARTITIONS 4;
There’s also a variation called linear hash partitioning,
which makes use of a linear powers-of-two algorithm as opposed to the modulus
of the hashing function’s value.
CREATE TABLE linear_hash_partitioned(
id INT NOT NULL,
date_started DATE NOT NULL DEFAULT ‘1970-01-01’,
)
PARTITION BY LINEAR HASH (TO_DAYS(date_started))
PARTITIONS 4;
Finally, there is key partitioning, which is
similar to hash partitioning except that it makes use of the primary key, and
an internal hashing function.
CREATE TABLE key_partitioned(
id INT NOT NULL,
date_started DATE NOT NULL DEFAULT ‘1970-01-01’,
)
PARTITION BY KEY
PARTITIONS 2;
Greater integration of MySQL Clusters
A boon for many, MySQL 5.1.6 saw the introduction of replication between
MySQL Clusters, as well as replication between a MySQL Cluster and another
MySQL database.
Plugin API
MySQL 5.1 introduces a flexible plugin API. It enables components to be
loaded and unloaded while the server is running, using the INSTALL PLUGIN
and UNINSTALL PLUGIN statements. In time, it will replace the old
user-defined function interface. It is still very new, and currently supports
the creation of full-text parser plugins.
Testing load
MySQL 5.1.4 introduced mysqlslap, a load emulator. It runs like any
of the other mysql binaries, and is designed to emulate load on a server, and
report back on the timings for each stage.
Backup changes
Not quite the live, impact-free backup everyone is hoping
MySQL will pull out of a hat, but MySQL 5.1 adds a new option to mysqldump,
which allows the dumping of tablespaces.
Event scheduler
Another of the more interesting additions to MySQL 5.1 is the event
scheduler. An event is a set of SQL statements that run at a particular time,
or over set intervals. They are very similar to triggers, except that the
trigger for the event’s execution is temporal. Here is an example of a simple
event that increments a field every day:
CREATE EVENT passing_days
EVERY 1 DAY
DO
UPDATE databasename.tablename SET days_passed = days_passed + 1;
The scheduling is quite flexible. An event could be
scheduled to occur once, over recurring intervals indefinitely, or recurring
for a set period. I will also be looking at this in more detail in a future
article.
More metadata
MySQL 5.1’s metadata database contains a number of new tables. These include
ENGINES, FILES, EVENTS, PARTITIONS, PLUGINS and PROCESSLIST. These provide
useful information, available to query, for each of the features.
XML functions
With XML being the flavor of the day, MySQL 5.1 now comes with a number of
XML functions. The new ExtractValue() function returns some XML according to an
XPath expression, while UpdateXML() performs a find and replace on some XML,
returning the result. Expect to see a lot more XML functionality in MySQL.
Easier upgrading
It is not only the permissions that need upgrading these days. With new
features, such as logging to tables, the old mysql_fix_privilege_tables
is no longer so aptly named. It has been replaced with the mysql_upgrade
script in MySQL 5.1.7, which should be run for each upgrade, and attempts to
make sure everything is in place for it to run smoothly with the latest server
version. As before, this still only works on Unix-like servers.
Instance Manager
The MySQL Instance Manager is one of the lesser-known features. It can be
used to start and monitor one or more servers, instead of mysqld_safe or mysqld_multi.
With MySQL 5.1, it now has a few new features. It can list all log files,
return part of a log file, and supply configuration options.
Conclusion
MySQL 5.1, as one would expect with a 0.1 version change, is not
revolutionary. However, it continues to add to MySQL’s feature set, and the
events, partitioning, logging changes and MySQL Cluster integration are perhaps
the most welcome additions. It is still in beta as I write this, but should be
released as stable before the end of the year. It is worth considering using
MySQL 5.1 right now for any major development work. Having moved to beta, there
shouldn’t be many new features being added, as all the attention turns to
ironing out bugs.