Comparison of MySQL Statement Based Replication and Row Based Replication


There are many ways you can replicate MySQL data, whether it be a simple two-node cluster, chain or master/master. This article explains the two types of data replication (Statement Based Replication and Row Based Replication) available in MySQL.

There are many ways you can replicate MySQL data whether it be a simple
two-node cluster, chain or master/master. If you use MySQL you probably know
about replication and might have experimented with the replication layouts’
listed above. In this article I will explain not the layout of replication and
why or why not to use a specific layout but the types of replication you can
use. In MySQL, you can use two types of replication, Statement Based
Replication (SBR) and Row Based Replication (RBR).

In MySQL, replication is basically the slave server reading binary logs on
the master server then running the statements or applying blocks to the slave
server. Depending on the type of replication you are using, events are recorded
in different formats to the binary log. Below are the different formats
explained:

  • At first, replication was based on propagation of SQL
    statements from the master server to the slave server. This replication
    format is called statement-based replication (SBR) and is default in older
    versions of MySQL (<= 5.1.4). Just as a note that in later versions of
    MySQL, especially with the Innodb Plugin,
    you NEED to run your transaction_isolation
    with REPEATABLE-READ.
  • The newer replication type is row-based replication (RBR),
    which logs changes in individual table rows to the binary log. Basically,
    logging the actual change and not the SQL statement itself.
  • MySQL also has the ability to change its binary logging
    format in real time depending on the type of event using the mixed format
    binary logging. When the mixed format is turned on, statement-based
    replication is on by default but will change to row-based replication is
    particular cases. For more information on Mixed Binary Logging please see Mixed
    Binary Logging Format
    in the MySQL Documentation.

As you probably have gathered, all three types of logging have their own
unique advantages and disadvantages. Many users are still on statement-based
replication; however, the mixed replication format should be sufficient when it
comes to performance and data integrity. Below I will list the advantages and
disadvantages of the binary logging formats’ so you will be able to make a
better decision for your system.

Statement-Based Replication – Advantages

  • It has been around from the beginning of MySQL version
    3.23, although back then it was a lot more buggy!
  • The binary logs do not grow as fast with statement-based
    replication as they do in row-based replication. This makes it faster to
    recover from a backup.
  • The binary log files contain all write activity in SQL
    format making it easier to read with mysqlbinlog
    and audit your MySQL instance.
  • Table definitions must be (nearly) identical on master and
    slave. This is a bit different than row-based replication where you have
    to deal with ordinal positions and the same data type characteristics per
    column.

Statement-Based Replication – Disadvantages

  • Nondeterministic events that change data are a large issue
    with statement-based replication. It is very important to note that, in
    some cases the behavior of an event will not yield the same result on the
    slave server as it did on the master server. Below are some examples:
    • When using a user defined function (UDF) or a stored
      routine that is nondeterministic, since the value returned by a UDF or
      stored routine or the value depends on factors other than the parameters
      supplied to it like the function USER().
    • UPDATE and DELETE statements that use a LIMIT clause
      without an ORDER BY clause are considered nondeterministic. For example:
      • UPDATE sometable set id = 41 where user like
        ‘%someuser%’ limit 1;
        • Note that there could be many rows matched in the where
          clause but there is no ORDER BY clause.

o   
Statements using any of the following functions cannot be
replicated properly using statement-based replication:

      • LOAD_FILE()
      • UUID(), UUID_SHORT()
      • USER()
      • FOUND_ROWS()
      • SYSDATE() (unless both the master and the slave are
        started with the –sysdate-is-now
        option)
      • GET_LOCK()
      • IS_FREE_LOCK()
      • IS_USED_LOCK()
      • MASTER_POS_WAIT()
      • RELEASE_LOCK()
      • SLEEP()
      • VERSION()

NOTE: All other functions are replicated correctly using
statement-based replication, including RAND(), NOW(), etc. For more
information, see Replication
and System Functions
.

  • Statements that cannot be replicated correctly using
    statement-based replication are logged with a warning like the one shown
    here:

[Warning] Statement is not safe to log in statement format.

  • You can view a similar warning on the client side by
    running SHOW WARNINGS.
  • Both insert and select queries require a larger number of
    row-level locks compared to row-based replication.
  • A table scan caused by an update statement where the
    update is not using an index in the where clause will lock a larger number
    of rows.
  • For InnoDB: An INSERT statement that uses AUTO_INCREMENT
    blocks other non-conflicting INSERT statements.
  • All statement must be evaluated and executed on the slave
    before the rows are updated or inserted. This can have considerable
    performance issues on the slave staying up to date. If there is an error
    in evaluation on the slave, statement-based replication may have degraded
    integrity over time.
  • Stored functions execute with the same NOW() value as the
    calling statement.
  • Deterministic UDFs must be applied on the slaves.

Here are some important notes about SBR, RBR and the MySQL version you may
be running.

  • For versions earlier than 5.1.14 DDL statements like
    CREATE are replicated using SBR, while DML statements along with GRANT and
    REVOKE are replicated using RBR.
  • In versions including 5.1.14 and higher the mysql database
    is considered a node-specific database.
  • Any manipulation of the mysql database tables, such as
    GRANT, REVOKE and trigger manipulation are all replicated using SBR.

Row-Based Replication – Advantages

  • This is considered the safest way to replicate data in
    MySQL because ALL changes can be replicated.
  • SBR is used for statements like CREATE TABLE while the row
    insertions and updates use RBR.
  • RBR has been around for a while, that said, if you have
    used a similar RDBMS with RBR you should be able to use MySQL RBR with
    ease.
  • Fewer row locks are needed on the master, for the
    following types of statements:
    • INSERT/SELECT
    • INSERT statements with AUTO_INCREMENT
    • UPDATE of DELETE statements that don’t require a table
      scan or change most of the examined rows
  • Fewer row locks are needed on the slave for INSERT, UPDATE
    and DELETE statements.

Row-Based Replication – Disadvantages

  • The table schema must be the same from master to slave
    when using row-based replication.
  • You cannot easily examine the binary log with RBR.
    Instead, beginning with MySQL 5.1.29, you can see what data was changed
    using mysqlbinlog with the options–base64-output=DECODE-ROWS
    and–verbose.
  • If a statement changes many rows, let’s say with an
    UPDATE, row-based replication writes more data to the binary log even for
    statements that are rolled back. Running point-in-time snapshots can take
    more time as well. Concurrency problems may come into play given the lock
    times needed to write large chunks of data into the binary log.
  • Deterministic UDFs that generate large BLOB values take
    longer to replicate with row-based replication because the actual value is
    replicated, not the SQL statement.
  • You can get different results when running SELECT
    statements on the slave and master servers while running bulk inserts.

Closing thoughts

As stated earlier in this article, the MIXED format should provide you with
the most data integrity and performance. However, if you do choose to set your
binlog_format to STATEMENT or ROW you should test the heck out of it before you
implement into production. I currently use both MIXED and STATEMENT based
replication in production depending on the application I’m working with.

»


See All Articles by Columnist

Chris Schneider

Chris Schneider
Chris Schneider
Certified MySQL professional, actively involved in the MySQL community for the better end of a decade. I have designed, implemented and maintained small to large scale MySQL installations while training and mentoring teams of DBAs. This includes building architectures from the ground up and improving on those that are currently in place while emphasizing scalability, performance and ease of use. Most of the work I am involved with now is in the Web 2.0 arena. I also run a community site, EverythingMySQL.com, that promotes collaboration within the open source community, specifically, MySQL.

Latest Articles