Achieving Five Nines Availability Using Synchronous Cluster Replication for MySQL

“Let me tell you the difference between Facebook and everyone else, we don’t crash EVER! If our service is down for even a minute, our entire reputation is irreversibly destroyed!”
-The Social Network

If you believe the hype, it would seem that almost all data and database as a service (DaaS/DBaaS) cloud providers operate at a formidable 99.999 percent availability, more commonly known as the five nines.  Heck, Amazon’s S3 premium service promises ten nines – that’s 99.99999999 percent uptime!

While these claims often turn out to be unsubstantiated, you don’t need to go to the cloud to achieve it using your own infrastructure. One option that recently piqued my interest is a product called SchoonerSQL, a re-jigged MySQL database designed for applications that can experience only minimal downtime – mission-critical apps and such. Besides offering the coveted five nines, SchoonerSQL also guarantees no lost or stale data, instant automated failover over LAN and WAN, as well as unlimited read scaling, unlimited write scaling, and easy cluster administration. Being the curious type, I wanted to know more about how SchoonerSQL achieves all this. In the process, I learned quite a lot about data replication, which is what this article is about.

The Perils of Loosely-coupled Asynchronous and Semi-synchronous Replication

Schooner started with a standard MySQL build and modified its InnoDB storage engine in order to overcome some of its weaknesses. One of the big ones is how MySQL handles replication. MySQL has support for asynchronous (default) and semi-synchronous replications. MySQL 5.1 introduced asynchronous replication, while MySQL 5.5 provided semi-synchronous replication. Hence, those whose infrastructures require full synchronization would have to move to a different solution such as MySQL NDB Clustering, for example.

MySQL uses loosely-coupled MySQL asynchronous and semi-synchronous replication between the master database and slaves. The master node maintains a binary log file of all the changes performed in the database. The slave(s) periodically read(s) that file and perform the replication using one of the following replication types:

·       Statement Based Replication (SBR):
By default, the slave executes the exact same SQL statements that the master did in its database(s).

·       Row Based Replication (RBR):
Here the slave(s) change/replicate only the changed rows.

·       Mixed Based Replication (MBR):
Uses a combination of both the above replication types.

All these replication models have associated drawbacks.  A few significant ones include the following:

·       Reduced Service Availability:
When a Master fails, fail-over to a slave only occurs once all transactions in the relay logs have been committed and a new Master established, and the remaining slaves have been reconfigured.

·       Stale Data:
Due to the lag between the Master and slave(s) in reading and applying changes, slaves can return old (stale) data in response to read transactions.

·       High Administrative Complexity:
Database administrators (DBAs) often bear the brunt of the burden when things go wrong. That time of panic and frustration is not ideal for performing delicate manual processes! Even with a solid contingency plan in place, you can never be sure how things will go when it really counts. I personally know a DBA group that had to do an “all-nighter” after a crash just last week! Even when all is well, DBAs still have to perform tedious and error-prone common tasks such as slave migrations and additions as well as hardware and software upgrades.

Why Tightly-coupled Synchronous Cluster Replication is More Efficient

Unlike asynchronous and semi-synchronous replication described above, synchronous cluster replication exploits the capabilities of modern multi-core processors for parallel processing, or what we in the software programming biz refer to as multi-threading. Using multi-threaded processes allows the main database to concurrently communicate, replicate, and apply master update transactions to all the replicas. There is no concept of a “Slave” in a synchronous cluster. The Schooner people refer to all replicas as “Read Masters” rather than Slaves because, in synchronous replication, the Master and all Read Masters in a synchronous cluster are always consistent.

In the event of a master failure, a Read Master (replica) is automatically promoted to become the new Master, the Master’s Virtual IP addresses (VIPs) are instantly and automatically switched to the newly promoted Read Master, and updates continue to be processed without any service interruption. If a Read Master (replica) fails, its load is automatically switched and load balanced to surviving nodes. Fail-over is automatic and completes within a few seconds with no service interruption and no data loss. Moreover, using the synchronous cluster replication reduced downtime by 85% to 95% compared to the asynchronous or semi-synchronous replication in MySQL 5.5 and 5.6.

For more in-depth and technical information, you may want to check out these PDF format white papers on the Schooner site, entitled:

·       Top 10 Reasons why MySQL Experts Switch to SchoonerSQL

·       The Short Guide to MySQL HA Options

·       Mission-Critical MySQL and InnoDB

In case you’re wondering, Schooner is supported on all of the widely used x86 servers and blades from Dell, HP and IBM, running Red Hat Linux 5.4/5.5 or CentOS 5.4/5.5 along with any flash memory in SAS/SATA or PCIe formats, including drives qualified by Dell, HP and IBM or sold by Fusion-io, LSI, Unigen, Intel, OCZ and Pliant.

Website: www.schoonerinfotech.com/products/schooner_mysql

Licensing

Pricing is by annual subscription per server.  At $9500, it’s priced $500 less than a MySQL Cluster Carrier Grade Edition one year license and a whole lot less than the old $45,000 hardware appliances.  Schooner MySQL is available for download and 21 day free trial

Conclusion

While there can be little doubt that SchoonerSQL works and works well, at the end of the day, the question is whether or not to go with a DaaS or DBaaS cloud service as opposed to managing the infrastructure yourself. The key factor in answering this question to me is how open you are to the idea of someone else managing your private data. Here at Canada Border Services where I work, it’s an open and shut case. We will not hand over our data to anyone, period. That’s why I prefer a product like SchoonerSQL over the Cloud.

Rob Gravelle resides in Ottawa, Canada, and is the founder of GravelleConsulting.com. Rob has built systems for Intelligence-related organizations such as Canada Border Services, CSIS as well as for numerous commercial businesses. In his spare time, Rob has become an accomplished guitar player, and has released several CDs. His former band, Ivory Knight, was rated as one Canada’s top hard rock and metal groups by Brave Words magazine (issue #92).

See all articles by Rob Gravelle

 

Robert Gravelle
Robert Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles