Disaster Recovery and the Big Data Application

When I ask database administrators how they implemented disaster recovery in their big data environments, I get two typical responses:

  • Big data is for analytics, not mission-critical data, so DR plans are not necessary;
  • Big data is too … big! Backups will take up a lot of space and recovery will take far too long for such large data sets.

Despite this reasoning, a disaster recovery plan for your big data implementation may be essential for your company’s future.

Mission-Critical Data

It is true that most big data implementations are for data analytics and reporting, while business and customer transactions are handled by legacy systems. However, we must take into account that big data applications are relatively new. They have not yet had the time to grow in priority and importance.  Consider whether the following progression could happen in your organization:

1.    Your company implements a big data solution, most likely using a big data appliance.

2.    Business analysts and users begin querying data and analyzing the results. Queries run quickly, so analysts flood the application.

3.    Some queries provide actionable information, allowing your company to reduce costs, set competitive prices, reduce time-to-market, and so forth.

4.    As more queries produce actionable results, management sees the value of the big data application and approves additional big data implementations and analytics.

5.    Valuable queries that were initially run once are now run regularly: weekly, or even daily. These valuable daily reports are distributed widely.

6.    The number of valuable reports increases; management now designates their big data solutions as mission-critical.

Big data implementations that produce value drive lower costs and higher profits. Hence, at some point you must implement a disaster recovery plan. And, since this requirement may come with little warning, the database administrator and other support staff should take proactive steps during the first big data implementation.

Review storage needs, network capacity, hardware capabilities and software license requirements at the beginning of your implementation. Have this data published and available to management before it becomes critical. This allows your enterprise to budget and plan for its needs in advance.

Both application designers and database administrators sometimes take the simplistic view that regular backups of application data are sufficient for any recovery needs. The strategy of weekend backups can easily backfire!  Backup methods that meet the application’s and enterprise’s needs start with a sound recovery strategy.  Further, this strategy must be applied from the beginning, starting with the big data database and application design.

Two factors drive which recovery options are used for a big data application:

  • The recovery time objective (RTO) — During a recovery scenario, how long can the application data (or portions of the data) be unavailable?
  • The recovery point objective (RPO) — During a recovery scenario, to what point must data be recovered?  To a specific date/time? To the end of the most recently completed transaction?

For a big data implementation, the choice of recovery point is straightforward. The most common situation is a period of extract, transform, and load (ETL) of operational data from legacy systems into the big data store, followed by multiple analytical applications that query the data. The most commonly chosen recovery point is immediately after loading is complete.

Backup and recovery strategies are driven by this choice. For example, if the preferred method of backup is database image copies, these can be scheduled to begin at the time of the recovery point. These backups will not interfere with applications because analytics involves querying, not updating. Of course, the database administrator must ensure that all backups complete within a reasonable time; backups taking more than 24 hours will interfere with the next day’s processing.
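To make the 24-hour constraint concrete, here is a rough back-of-the-envelope sketch in Python. The data sizes and throughput figures are hypothetical, and the function names are invented for illustration; substitute measurements from your own environment.

```python
def backup_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy data_tb terabytes at a sustained throughput.

    Uses decimal units (1 TB = 1,000,000 MB). Both inputs are assumptions
    supplied by the caller, not measured values.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_mb = data_tb * 1_000_000
    seconds = total_mb / throughput_mb_per_s
    return seconds / 3600


def fits_window(data_tb: float, throughput_mb_per_s: float,
                window_hours: float = 24.0) -> bool:
    """True if the backup finishes before the next day's processing starts."""
    return backup_hours(data_tb, throughput_mb_per_s) <= window_hours


if __name__ == "__main__":
    # Hypothetical 50 TB store at two different sustained copy rates.
    for rate in (500, 1000):
        hours = backup_hours(50, rate)
        print(f"{rate} MB/s: {hours:.1f} h, fits 24 h window: {fits_window(50, rate)}")
```

With these made-up numbers, a 50 TB store copied at 500 MB/s needs roughly 27.8 hours and misses the window, while doubling the throughput brings it to about 13.9 hours.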

Recovery time requirements are also easily defined. The recovery process must have data available for analytics within about 24 hours. Any longer, and the recovery site may not be able to catch up with the additional daily operational data that must now be loaded.
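One way to see why a long recovery is a problem is to model the backlog of daily loads that builds up while the site is down. The sketch below assumes, purely for illustration, that each day's ETL load takes a fixed number of hours, that missed loads accumulate at a steady rate during the outage, and that the recovered site runs loads back to back until the backlog is gone. The model and its figures are assumptions, not measurements from a real system.

```python
import math


def catch_up_hours(recovery_hours: float, daily_load_hours: float) -> float:
    """Hours of continuous loading needed after recovery to clear the backlog.

    recovery_hours:   how long the data was unavailable (hypothetical input).
    daily_load_hours: how long one day's ETL load takes (hypothetical input).
    Returns math.inf when a day's load takes a full day or more, because the
    site then never gets ahead of the incoming data.
    """
    if daily_load_hours >= 24:
        return math.inf
    # Load work arrives at this many hours of work per elapsed hour.
    arrival_rate = daily_load_hours / 24
    # Work that piled up while the site was being recovered.
    backlog = recovery_hours * arrival_rate
    # While catching up, new work keeps arriving, so net progress is slower.
    return backlog / (1 - arrival_rate)


if __name__ == "__main__":
    for outage in (24, 72):
        print(f"{outage} h outage, 6 h daily load: "
              f"{catch_up_hours(outage, 6):.1f} h to catch up")
```

Under these assumptions a one-day outage with a six-hour daily load costs about eight hours of catch-up, while a three-day outage costs a full day, during which analytics users are competing with the load jobs for the same system.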

Database administrators should elicit basic recovery time and recovery point objectives for any big data implementation as early as possible. Then they should review backup and recovery options, choose methods and procedures that meet the objectives, and document the results for future reference. As applications mature and the enterprise big data store grows, the designation of your big data as mission-critical won’t catch you unprepared.
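Documenting the objectives can be as simple as recording them in a structured form and checking each candidate backup method against them. The following Python sketch shows one possible shape for such a record. The class names, method names, and hour estimates are all hypothetical and exist only to illustrate the review step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjectives:
    rto_hours: float        # longest acceptable outage
    rpo_description: str    # the point to which data must be recovered


@dataclass(frozen=True)
class BackupMethod:
    name: str
    estimated_recovery_hours: float   # assumed estimate, not a measurement
    restores_to_load_complete: bool   # can it recover to the chosen point?


def methods_meeting(objectives: RecoveryObjectives,
                    methods: list[BackupMethod]) -> list[str]:
    """Names of the methods that satisfy both the RTO and the RPO."""
    return [
        m.name
        for m in methods
        if m.estimated_recovery_hours <= objectives.rto_hours
        and m.restores_to_load_complete
    ]


if __name__ == "__main__":
    objectives = RecoveryObjectives(
        rto_hours=24, rpo_description="immediately after the nightly load")
    candidates = [
        BackupMethod("weekend image copy", 40, False),
        BackupMethod("post-load image copy", 20, True),
        BackupMethod("disk mirroring", 1, True),
    ]
    print(methods_meeting(objectives, candidates))
```

Keeping the objectives and the evaluated options together in one record makes it easier to revisit the decision when the data store grows.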

Big Data Means a Lot of Data

Another common objection to doing disaster recovery planning for big data is the sheer size of the data store. Infrastructure staff believe that such a huge volume of data will take forever to back up, forever to recover, and take up immense quantities of backup storage. Luckily, several recent technical advances in hardware can mitigate these worries.

Hardware Mirroring

Most modern disk storage equipment has an optional disk mirroring facility. Mirroring is a process where changes on one disk drive are made to a corresponding drive in another location. This allows the support staff to implement disk copying and backup, data replication, and publish-subscribe applications using available hardware features without the need to code applications.

For backup and recovery purposes, the storage administrator designates a set of disks or a disk array to be the mirror (or backup) of another. The primary disks can be those of the big data store, with an array of backup disks in a secure location used as the mirrors. When the primary disks are updated during the ETL process, the hardware automatically makes those updates on the mirrors. In case of a disaster, the storage administrator defines the mirror disks to the operating system as the primary ones. At that point the data is restored and the application is available.
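The mirroring and failover sequence can be illustrated with a toy model in Python. Real mirroring happens in the storage hardware, not in application code, so this is only a conceptual sketch; the class and its methods are invented for illustration.

```python
class MirroredVolume:
    """Toy model of a primary disk set with a hardware-maintained mirror."""

    def __init__(self) -> None:
        self.primary: dict[int, bytes] = {}
        self.mirror: dict[int, bytes] = {}
        self.active = "primary"   # which disk set the system currently uses

    def write(self, block: int, data: bytes) -> None:
        if self.active == "primary":
            self.primary[block] = data
            # The "hardware" repeats every primary write on the mirror.
            self.mirror[block] = data
        else:
            # After failover the mirror is the only copy being written.
            self.mirror[block] = data

    def read(self, block: int) -> bytes:
        disks = self.primary if self.active == "primary" else self.mirror
        return disks[block]

    def fail_over(self) -> None:
        """Define the mirror disks as the primary ones."""
        self.active = "mirror"


if __name__ == "__main__":
    volume = MirroredVolume()
    volume.write(0, b"rows loaded by ETL")
    volume.primary.clear()        # simulate losing the primary site
    volume.fail_over()
    print(volume.read(0))         # the data is still readable from the mirror
```

The point of the model is that no restore step appears anywhere: because every write already landed on the mirror, recovery is a matter of redirecting reads and writes.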

Proprietary Hardware

One way to implement a big data solution is with special-purpose hardware, sometimes called an appliance. One example is the IBM DB2 Analytics Accelerator (IDAA), a combined hardware and software solution. The main hardware consists of a multi-terabyte disk array directly attached to the IBM mainframe processor. In addition to disk storage, the IDAA includes a large memory store, parallel processing hardware, and network hardware. Finally, it provides an enhancement to the IBM DB2 database management software that lets DB2 access the data store.

Your big data implementation consists of loading selected DB2 tables into the IDAA disk array. DB2 then determines at query execution time whether the data required is stored in the IDAA. If so, the query executes against the data in the appliance at a very high speed.
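The routing decision described above can be pictured with a deliberately simplified Python sketch. It assumes a single rule: a query goes to the appliance only when every table it references has been loaded there. The actual DB2 logic is not reproduced here and may weigh other factors, and the function and table names are hypothetical.

```python
def route_query(tables_referenced: list[str],
                accelerated_tables: set[str]) -> str:
    """Return "accelerator" if all referenced tables are loaded in the
    appliance, otherwise "db2". A simplified stand-in for the real decision.
    """
    needed = set(tables_referenced)
    if not needed:
        raise ValueError("query must reference at least one table")
    # Subset test: every needed table must be present in the appliance.
    return "accelerator" if needed <= accelerated_tables else "db2"


if __name__ == "__main__":
    loaded = {"SALES_HISTORY", "CUSTOMER"}   # hypothetical accelerated tables
    print(route_query(["SALES_HISTORY", "CUSTOMER"], loaded))
    print(route_query(["SALES_HISTORY", "ORDERS_TODAY"], loaded))
```

For disaster recovery planning, the practical consequence is that the set of tables loaded into the appliance defines what must also be present at the recovery site for the same queries to run there at the same speed.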

For disaster recovery purposes the database administrator can make an additional appliance available at the DR site. As tables are loaded with data at your primary site, copies of the data can be shipped to the DR site and loaded into the appliance there. If a disaster occurs, your data is already available at the disaster recovery site.


As we have seen, it is possible that in the near future your IT management may designate your big data implementation as critical to the organization, requiring it to be available should a disaster occur. Rather than wait until this happens, do your disaster recovery planning now. Being proactive will greatly reduce the time and resources you will need to design, test and implement a big data disaster recovery solution.




Lockwood Lyon
Lockwood Lyon is a systems and database performance specialist. He has more than 20 years of experience in IT as a database administrator, systems analyst, manager, and consultant. Most recently, he has spent time on DB2 subsystem installation and performance tuning. He is also the author of The MIS Manager's Guide to Performance Appraisal (McGraw-Hill, 1993).
