MySQL 5.1 - the next generation
March 21, 2006
There are still people using MySQL 4.0 (and I even came across a MySQL 3.23 installation recently), but MySQL 5.1 is now the latest of the MySQL releases--with 5.1.7, the latest as I write; it has just moved from alpha to beta status. Recently I have been writing about what the future holds for MySQL what with all the Oracle moves. This month's article changes focus and I examine some of the new features in MySQL 5.1.
New types of replication
Previously, replication was simply repeating the statements run on one server on another server (see the Database Journal article on replication if this is new to you). This form of replication was called statement-based replication. It is quite possible for one server to have different data to the other server, if the two datasets began in different states, or if the UPDATE was non-deterministic. As of MySQL 5.1.5, row-based replication is available, and as of 5.1.8, mixed replication (both types depending on the context) will be available.
Statement-based replication has some problems. The main disadvantage is that non-deterministic UPDATE's cannot be replicated properly on the slave server/s. In particular, that applies to the following functions: LOAD_FILE(), UUID(), USER() and FOUND_ROWS(), as well as any non deterministic stored functions that may exist on the server. Other statements that you think this may apply to, such as RAND(), replicate just fine with a bit of behind-the-scenes trickery. With row-based replication, the binary log stores the full effects of a statement on the individual rows. This can make the binary log a lot larger for statements that affect many rows, but can also make updating quicker for updates where only a few rows were affected, but that took a long time to execute.
To activate row-based replication, use one of:
mysql> SET GLOBAL binlog_format = 'ROW';
mysql> SET GLOBAL binlog_format = 2;
You can also set the variable on a per session basis (by using the SESSION keyword instead of GLOBAL. To set it back:
mysql> SET GLOBAL binlog_format = 'STATEMENT';
mysql> SET GLOBAL binlog_format = 1;
Once MySQL 5.1.8 is out (it probably will be by the time this is published), mixed replication will be also be available, and can be activated as follows:
mysql> SET GLOBAL binlog_format = 'MIXED';
mysql> SET GLOBAL binlog_format = 3;
Until MySQL 5.1, the query and the slow query logs were written to log files. As of MySQL 5.1.6, there is another option - log tables inside the database. This is a simple change, but one I find particularly useful. Being able to query the log records with SQL statements allows for all sorts of subtleties to be easily explored. The log tables (mysql.general_log and mysql.slow_log are in the comma-delimited CSV format, making them easy to import into spreadsheets and other applications.
By default after MySQL 5.1.6, logging is to table (so be aware of upgrade issues when upgrading from earlier versions). You can specify which kind of logging to use, or whether to use both, with the --log-output option in your configuration file. Options are:
--log-output=TABLE --log-output=FILE --log-output=NONE --log-output=TABLE,FILE
The --log[=file_name] and --log-slow-queries[=file_name] options, which specify the files to log to, are ignored unless the log output is set to FILE.
Partitioning is an innovation that allows more control over how the tables are physically stored. To some, this sort of thing is a messy blending of the logical and physical levels, but it is a reality of today's databases, especially when trying to squeeze out performance benefits, that DBA's are looking at both. Partitioning allows tables to be distributed across a filesystem, into multiple partitions. There were major changes in 5.1.6, including the addition of partition pruning, which allows you to target a query to a specific partition. By planning this carefully, it is possible to get a reasonable performance increase in some circumstances. I plan to cover this in more detail in a future article. There are different ways to partition a table. You can do so by range, whereby values in a certain range appear in a certain partition. Here's an example with three partitions, one containing records where the id is less than 1000, another less than 10000, and a third those records with an id less than 100000:
CREATE TABLE range_partitioned ( id INT NOT NULL, ) PARTITION BY RANGE (id) ( PARTITION p1 VALUES LESS THAN (1000), PARTITION p2 VALUES LESS THAN (10000), PARTITION p3 VALUES LESS THAN (100000) );
Alternatively, partitioning by list creates partitions according to specified values, for example:
CREATE TABLE list_partitioned( id INT NOT NULL, ) PARTITION BY LIST (id) ( PARTITION p1 VALUES IN (1,3,5,7,9), PARTITION p2 VALUES IN (2,4,6,8,10), PARTITION p3 VALUES IN (0,11,12) );
Partitioning by hash allows MySQL to take care of the partitioning rather than the creator having to define a list or range to use. It requires an integer, or a function returning an integer, although it requires some thought to make the hash as efficient as possible. Hash partitioning does not lend itself to partition pruning, as determining which partition the data sits in is not easy.
CREATE TABLE hash_partitioned( id INT NOT NULL, date_started DATE NOT NULL DEFAULT '1970-01-01', ) PARTITION BY HASH (TO_DAYS(date_started)) PARTITIONS 4;
There's also a variation called linear hash partitioning, which makes use of a linear powers-of-two algorithm as opposed to the modulus of the hashing function's value.
CREATE TABLE linear_hash_partitioned( id INT NOT NULL, date_started DATE NOT NULL DEFAULT '1970-01-01', ) PARTITION BY LINEAR HASH (TO_DAYS(date_started)) PARTITIONS 4;
Finally, there is key partitioning, which is similar to hash partitioning except that it makes use of the primary key, and an internal hashing function.
CREATE TABLE key_partitioned( id INT NOT NULL, date_started DATE NOT NULL DEFAULT '1970-01-01', ) PARTITION BY KEY PARTITIONS 2;
Greater integration of MySQL Clusters
A boon for many, MySQL 5.1.6 saw the introduction of replication between MySQL Clusters, as well as replication between a MySQL Cluster and another MySQL database.
MySQL 5.1 introduces a flexible plugin API. It enables components to be loaded and unloaded while the server is running, using the INSTALL PLUGIN and UNINSTALL PLUGIN statements. In time, it will replace the old user-defined function interface. It is still very new, and currently supports the creation of full-text parser plugins.
MySQL 5.1.4 introduced mysqlslap, a load emulator. It runs like any of the other mysql binaries, and is designed to emulate load on a server, and report back on the timings for each stage.
Not quite the live, impact-free backup everyone is hoping MySQL will pull out of a hat, but MySQL 5.1 adds a new option to mysqldump, which allows the dumping of tablespaces.
Another of the more interesting additions to MySQL 5.1 is the event scheduler. An event is a set of SQL statements that run at a particular time, or over set intervals. They are very similar to triggers, except that the trigger for the event's execution is temporal. Here is an example of a simple event that increments a field every day:
CREATE EVENT passing_days EVERY 1 DAY DO UPDATE databasename.tablename SET days_passed = days_passed + 1;
The scheduling is quite flexible. An event could be scheduled to occur once, over recurring intervals indefinitely, or recurring for a set period. I will also be looking at this in more detail in a future article.
MySQL 5.1's metadata database contains a number of new tables. These include ENGINES, FILES, EVENTS, PARTITIONS, PLUGINS and PROCESSLIST. These provide useful information, available to query, for each of the features.
With XML being the flavor of the day, MySQL 5.1 now comes with a number of XML functions. The new ExtractValue() function returns some XML according to an XPath expression, while UpdateXML() performs a find and replace on some XML, returning the result. Expect to see a lot more XML functionality in MySQL.
It is not only the permissions that need upgrading these days. With new features, such as logging to tables, the old mysql_fix_privilege_tables is no longer so aptly named. It has been replaced with the mysql_upgrade script in MySQL 5.1.7, which should be run for each upgrade, and attempts to make sure everything is in place for it to run smoothly with the latest server version. As before, this still only works on Unix-like servers.
The MySQL Instance Manager is one of the lesser-known features. It can be used to start and monitor one or more servers, instead of mysqld_safe or mysqld_multi. With MySQL 5.1, it now has a few new features. It can list all log files, return part of a log file, and supply configuration options.
MySQL 5.1, as one would expect with a 0.1 version change, is not revolutionary. However, it continues to add to MySQL's feature set, and the events, partitioning, logging changes and MySQL Cluster integration are perhaps the most welcome additions. It is still in beta as I write this, but should be released as stable before the end of the year. It is worth considering using MySQL 5.1 right now for any major development work. Having moved to beta, there shouldn't be many new features being added, as all the attention turns to ironing out bugs.