There are many ways you can replicate MySQL data, whether it be a simple two-node cluster, chain or master/master. This article explains the two types of data replication (Statement Based Replication and Row Based Replication) available in MySQL.
There are many ways you can replicate MySQL data whether it be a simple
two-node cluster, chain or master/master. If you use MySQL you probably know
about replication and might have experimented with the replication layouts’
listed above. In this article I will explain not the layout of replication and
why or why not to use a specific layout but the types of replication you can
use. In MySQL, you can use two types of replication, Statement Based
Replication (SBR) and Row Based Replication (RBR).
In MySQL, replication is basically the slave server reading binary logs on
the master server then running the statements or applying blocks to the slave
server. Depending on the type of replication you are using, events are recorded
in different formats to the binary log. Below are the different formats
explained:
- At first, replication was based on propagation of SQL
statements from the master server to the slave server. This replication
format is called statement-based replication (SBR) and is default in older
versions of MySQL (<= 5.1.4). Just as a note that in later versions of
MySQL, especially with the Innodb Plugin,
you NEED to run your transaction_isolation
with REPEATABLE-READ. - The newer replication type is row-based replication (RBR),
which logs changes in individual table rows to the binary log. Basically,
logging the actual change and not the SQL statement itself. - MySQL also has the ability to change its binary logging
format in real time depending on the type of event using the mixed format
binary logging. When the mixed format is turned on, statement-based
replication is on by default but will change to row-based replication is
particular cases. For more information on Mixed Binary Logging please see Mixed
Binary Logging Format in the MySQL Documentation.
As you probably have gathered, all three types of logging have their own
unique advantages and disadvantages. Many users are still on statement-based
replication; however, the mixed replication format should be sufficient when it
comes to performance and data integrity. Below I will list the advantages and
disadvantages of the binary logging formats’ so you will be able to make a
better decision for your system.
Statement-Based Replication – Advantages
- It has been around from the beginning of MySQL version
3.23, although back then it was a lot more buggy! - The binary logs do not grow as fast with statement-based
replication as they do in row-based replication. This makes it faster to
recover from a backup. - The binary log files contain all write activity in SQL
format making it easier to read with mysqlbinlog
and audit your MySQL instance. - Table definitions must be (nearly) identical on master and
slave. This is a bit different than row-based replication where you have
to deal with ordinal positions and the same data type characteristics per
column.
Statement-Based Replication – Disadvantages
- Nondeterministic events that change data are a large issue
with statement-based replication. It is very important to note that, in
some cases the behavior of an event will not yield the same result on the
slave server as it did on the master server. Below are some examples:
- When using a user defined function (UDF) or a stored
routine that is nondeterministic, since the value returned by a UDF or
stored routine or the value depends on factors other than the parameters
supplied to it like the function USER(). - UPDATE and DELETE statements that use a LIMIT clause
without an ORDER BY clause are considered nondeterministic. For example:
- UPDATE sometable set id = 41 where user like
‘%someuser%’ limit 1;
- Note that there could be many rows matched in the where
clause but there is no ORDER BY clause.
o
Statements using any of the following functions cannot be
replicated properly using statement-based replication:
- LOAD_FILE()
- UUID(), UUID_SHORT()
- USER()
- FOUND_ROWS()
- SYSDATE() (unless both the master and the slave are
started with the –sysdate-is-now
option) - GET_LOCK()
- IS_FREE_LOCK()
- IS_USED_LOCK()
- MASTER_POS_WAIT()
- RELEASE_LOCK()
- SLEEP()
- VERSION()
NOTE: All other functions are replicated correctly using
statement-based replication, including RAND(), NOW(), etc. For more
information, see Replication
and System Functions.
- Statements that cannot be replicated correctly using
statement-based replication are logged with a warning like the one shown
here:
[Warning] Statement is not safe to log in statement format.
- You can view a similar warning on the client side by
running SHOW WARNINGS. - Both insert and select queries require a larger number of
row-level locks compared to row-based replication.
- A table scan caused by an update statement where the
update is not using an index in the where clause will lock a larger number
of rows. - For InnoDB: An INSERT statement that uses AUTO_INCREMENT
blocks other non-conflicting INSERT statements. - All statement must be evaluated and executed on the slave
before the rows are updated or inserted. This can have considerable
performance issues on the slave staying up to date. If there is an error
in evaluation on the slave, statement-based replication may have degraded
integrity over time. - Stored functions execute with the same NOW() value as the
calling statement. - Deterministic UDFs must be applied on the slaves.
Here are some important notes about SBR, RBR and the MySQL version you may
be running.
- For versions earlier than 5.1.14 DDL statements like
CREATE are replicated using SBR, while DML statements along with GRANT and
REVOKE are replicated using RBR. - In versions including 5.1.14 and higher the mysql database
is considered a node-specific database. - Any manipulation of the mysql database tables, such as
GRANT, REVOKE and trigger manipulation are all replicated using SBR.
Row-Based Replication – Advantages
- This is considered the safest way to replicate data in
MySQL because ALL changes can be replicated. - SBR is used for statements like CREATE TABLE while the row
insertions and updates use RBR. - RBR has been around for a while, that said, if you have
used a similar RDBMS with RBR you should be able to use MySQL RBR with
ease. - Fewer row locks are needed on the master, for the
following types of statements:
- INSERT/SELECT
- INSERT statements with AUTO_INCREMENT
- UPDATE of DELETE statements that don’t require a table
scan or change most of the examined rows
- Fewer row locks are needed on the slave for INSERT, UPDATE
and DELETE statements.
Row-Based Replication – Disadvantages
- The table schema must be the same from master to slave
when using row-based replication. - You cannot easily examine the binary log with RBR.
Instead, beginning with MySQL 5.1.29, you can see what data was changed
using mysqlbinlog with the options–base64-output=DECODE-ROWS
and–verbose. - If a statement changes many rows, let’s say with an
UPDATE, row-based replication writes more data to the binary log even for
statements that are rolled back. Running point-in-time snapshots can take
more time as well. Concurrency problems may come into play given the lock
times needed to write large chunks of data into the binary log. - Deterministic UDFs that generate large BLOB values take
longer to replicate with row-based replication because the actual value is
replicated, not the SQL statement. - You can get different results when running SELECT
statements on the slave and master servers while running bulk inserts.
Closing thoughts
As stated earlier in this article, the MIXED format should provide you with
the most data integrity and performance. However, if you do choose to set your
binlog_format to STATEMENT or ROW you should test the heck out of it before you
implement into production. I currently use both MIXED and STATEMENT based
replication in production depending on the application I’m working with.