Disaster Recovery and Big Data

Big data applications tend to have massive data storage capacity coupled with a hybrid hardware appliance and analytical software package used for data analytics. These applications are not typically used to process operational data; rather, users query the data to analyze past product sales, forecast trends, and determine future customer buying patterns. Big data applications are not usually considered mission-critical: while they support sales and marketing decisions, they do not significantly affect core operations such as customer accounts, orders, inventory, and shipping.

Why, then, are major IT organizations moving quickly to incorporate big data in their disaster recovery plans? Isn’t this data too voluminous to back up? And even if you back it up, won’t it take days (or weeks, or more!) to recover all the data from those backups?

Too Big to Back Up

Disaster recovery best practices include the ability to recover important data to a consistent point in time within a defined time period. This time period, called the recovery time objective, or RTO, must be short (a few hours at most) for the operational data upon which your business depends. But what about big data? 
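A back-of-envelope calculation shows why RTO discipline gets hard at big data scale. The data sizes and channel throughput below are illustrative assumptions, not measurements:

```python
# Quick feasibility check: can a restore finish within the RTO?
# All figures here are illustrative assumptions, not measurements.

def restore_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Estimated wall-clock hours to restore data_tb terabytes
    at a sustained throughput of throughput_mb_s MB/s."""
    mb = data_tb * 1024 * 1024          # TB -> MB
    return mb / throughput_mb_s / 3600  # seconds -> hours

# 5 TB of operational data over a single 400 MB/s channel:
print(f"{restore_hours(5, 400):.1f} h")    # about 3.6 h -- inside a 4-hour RTO

# 500 TB of big data over the same channel:
print(f"{restore_hours(500, 400):.0f} h")  # hundreds of hours
```

The arithmetic is simple, but doing it before a disaster, with your own volumes and measured channel throughput, is what makes an RTO a commitment rather than a hope.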

Most companies contend that backup and recovery of big data is not important. Some of the reasons given are as follows.

Operational systems take priority. After a disaster, the highest priority is the recovery of data that supports operational systems. These systems, including accounting, order entry, payment acceptance, payroll, and so forth, are necessary to keep the company operating. After that data is recovered, the second priority is supporting the execution of those systems.

Big data is not mission-critical. Forecasting and trending analyses may be important to marketing, but these analyses, along with ad hoc queries and other user reports, are based on historical data, not real-time data.

Big data is big! The volume of data stored to support a big data application can be several orders of magnitude larger than all of your operational data combined. This is because big data applications act on historical snapshots of data: ten years of historical data corresponds to several thousand days of snapshots. On what media will it be backed up, how long will backups take, and how much backup storage will you need?
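The storage arithmetic is easy to sketch. Using an assumed 50 GB daily extract (a modest figure chosen purely for illustration):

```python
# Back-of-envelope sizing for accumulated historical snapshots.
# The 50 GB daily extract size is an illustrative assumption.

def total_snapshot_tb(daily_snapshot_gb: float, years: int) -> float:
    """Raw storage, in TB, for one snapshot per day over the given years."""
    days = years * 365
    return days * daily_snapshot_gb / 1024  # GB -> TB

# Ten years of 50 GB daily extracts:
print(f"{total_snapshot_tb(50, 10):.0f} TB")  # ~178 TB before compression
```

Even before compression, that single modest feed accumulates into territory where nightly full backups stop being realistic.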

Backup and restore processes require I/O channel capacity. A large capacity is required in order to physically move vast amounts of data in a short time. Backup and restore will virtually monopolize your I/O channels; the only viable alternative is to install enough excess channel capacity to handle these processes.
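Sizing that excess capacity is a matter of dividing the data volume by the backup window and per-channel throughput. A sketch, with assumed figures:

```python
import math

# How many I/O channels does a backup need to fit its window?
# Data volume, window, and per-channel throughput are assumptions.

def channels_needed(data_tb: float, window_h: float, chan_mb_s: float) -> int:
    """Minimum number of channels to move data_tb terabytes
    within window_h hours at chan_mb_s MB/s per channel."""
    mb = data_tb * 1024 * 1024                 # TB -> MB
    per_channel_mb = chan_mb_s * window_h * 3600
    return math.ceil(mb / per_channel_mb)

# 180 TB in an 8-hour window over 400 MB/s channels:
print(channels_needed(180, 8, 400))  # 17 channels
```

If those 17 channels are the same ones your operational workload depends on, backup and restore will indeed monopolize them; hence the case for dedicated capacity.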

Big Data Becomes Mission-Critical

The reasons outlined above may not apply to all companies. Some customer-facing systems use big data analytics, meaning that the big data application is part of operational processing. In other organizations, big data began as a simple query and reporting tool. Over time, some ad hoc queries were judged to be extremely useful and converted to regular reports. Such useful reports got noticed by management, who turned them into valuable actions. Eventually, management became dependent on these reports for making operational decisions. Thus, their big data application became mission-critical.

The evolution of your big data application into one that is mission-critical is inevitable. These applications are expensive, time-consuming to install and configure, and require highly skilled technicians. In addition, the business analysts who query the data rarely operate by themselves. They usually use a business analytics software package specially built to query and analyze big data. Such packages are expensive and require extensive training to use effectively.

Bottom line: A lot of money is invested in your big data. Companies are highly motivated to get something of value for their investment. The reports that result from analyzing your data can lead to better customer service, faster product turnaround, and higher profits. And profits are mission-critical.

Backup Options

If you intend to recover all or a portion of your big data application as part of disaster recovery planning, there are several backup choices.

The most important thing to keep in mind is that big data is mostly historical and static. Snapshots of operational data are extracted into a staging area, cleaned and transformed, then loaded into a combination of an enterprise data warehouse and your big data application. After that, the snapshots are not updated. This means that backup processes only need to run once on each snapshot.
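Because each snapshot is static after load, backup can be incremental at the snapshot level: track what has already been backed up and process only new arrivals. A minimal sketch, where `backup_snapshot` stands in for whatever vendor utility actually does the work:

```python
# Snapshots never change after load, so each one needs backing up exactly once.
# backup_snapshot below is a placeholder for a real vendor backup utility.

backed_up = set()

def backup_new_snapshots(snapshot_ids, backup_snapshot):
    """Back up only snapshots not seen before; return the ones processed."""
    done = []
    for sid in snapshot_ids:
        if sid not in backed_up:
            backup_snapshot(sid)
            backed_up.add(sid)
            done.append(sid)
    return done

log = []
backup_new_snapshots(["2013-01-01", "2013-01-02"], log.append)
second = backup_new_snapshots(["2013-01-01", "2013-01-02", "2013-01-03"], log.append)
print(second)  # only the new snapshot is backed up: ['2013-01-03']
```

The nightly backup workload is therefore proportional to one day's snapshot, not to the full historical store.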

The most common backup methods include the following.

Data replication. This is a common backup approach. As data are loaded into the data warehouse or big data application, they are simultaneously shipped to a backup process that loads a backup copy of the big data application. This commonly occurs at the disaster recovery site, which then has up-to-date data should a disaster occur.
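The essence of the approach is dual loading: every batch applied to the primary is forwarded to the DR copy at the same time. A minimal sketch with illustrative names:

```python
# Dual-load replication sketch: each batch loaded into the primary warehouse
# is simultaneously applied to the disaster recovery copy. Names illustrative.

primary, dr_copy = [], []

def load_batch(rows):
    primary.extend(rows)   # production load
    dr_copy.extend(rows)   # replicate to the disaster recovery site

load_batch([("cust1", 100), ("cust2", 250)])
print(dr_copy == primary)  # True: the DR site is current at all times
```

The payoff is that recovery time collapses toward zero; the cost is a second full copy of the storage plus the network bandwidth to feed it continuously.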

Virtual snapshot. This is a hardware solution that allows the storage media to create a virtual backup of an entire system. Database writes are disabled for a short period while the hardware managing the storage subsystem takes internal copies of all files. This copy process can be extremely fast, sometimes completing in seconds. After copying is complete, the database management system is allowed to resume write operations.

Snapshots provide extremely fast recovery times, assuming that recovery to the point in time at which the snapshot was created is acceptable. Recovery to any other point requires some method of applying recent database changes (captured in logs) on top of the snapshot. Another issue is storage capacity: a snapshot has the potential to require double the currently used storage. And if a disaster occurs and the snapshot is promoted to current data, yet another snapshot area must be allocated in case of a further disaster.
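Roll-forward recovery from a snapshot plus logs can be sketched in a few lines. The dict-as-database and the account names are purely illustrative:

```python
# Point-in-time recovery sketch: restore the snapshot, then replay logged
# changes up to the desired point. The dict "database" is illustrative.

snapshot = {"acct1": 100, "acct2": 200}   # database state at snapshot time
change_log = [("acct1", 150), ("acct3", 50)]  # changes made after the snapshot

def recover(snapshot, log, upto):
    """Rebuild the database as of the upto-th logged change."""
    db = dict(snapshot)              # restore the snapshot copy
    for key, value in log[:upto]:    # then roll forward through the log
        db[key] = value
    return db

print(recover(snapshot, change_log, upto=2))
# {'acct1': 150, 'acct2': 200, 'acct3': 50}
```

The replay step is where recovery time creeps back in: the further past snapshot creation you need to recover, the more log there is to apply.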

Local and remote copies. This is a classic method that consists of disk and tape backups of either physical disk drives or databases. Database administrators (DBAs) execute vendor utilities that access the data, which is usually stored in a compressed, proprietary format. Such backups execute quickly and, being in internal database format, can be loaded quickly.

Recovery Automation and Testing

Another important part of disaster planning is ensuring that recovery can be completed within the RTO. For big data, this usually means automating the recovery using standard processes or even vendor tools. The smart DBA automates as much as possible, in order to minimize relatively slow human intervention. This includes avoiding:

  • Manual processing of backup storage (e.g., tape movement and handling);
  • Typing commands;
  • Reviewing paper reports or documentation.
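One way to keep human hands out of the loop is to encode the recovery procedure as an executable runbook, with each step a function run in sequence. A minimal sketch; the step names and the 4-hour RTO are illustrative:

```python
import time

# Recovery runbook as code: each step is a function, executed in order
# with no manual intervention. Step names and the RTO are illustrative.

def run_recovery(steps):
    """Execute each (name, fn) step in order; return elapsed seconds."""
    start = time.monotonic()
    for name, step in steps:
        step()
        print(f"done: {name}")
    return time.monotonic() - start

steps = [
    ("mount backup storage", lambda: None),
    ("restore latest snapshot", lambda: None),
    ("apply database logs", lambda: None),
    ("verify row counts", lambda: None),
]

elapsed = run_recovery(steps)
print(elapsed < 4 * 3600)  # did we beat a 4-hour RTO? (trivially True here)
```

In practice each lambda would invoke a real vendor utility, and the elapsed time from each rehearsal becomes the evidence that your RTO is still achievable.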

With recovery automated, practice it, and practice it regularly. Remember, big data is always growing, and as data volumes increase so too do backup and recovery times.


Big data applications cost time, money, and resources to install and use. Companies are motivated to obtain a return on such a large investment. Queries and reports provide valuable insights that lead to actions, changes, and profits. Eventually your big data application becomes mission-critical. Before that happens, ensure that your IT infrastructure can back up and recover that data.




Lockwood Lyon
Lockwood Lyon is a systems and database performance specialist. He has more than 20 years of experience in IT as a database administrator, systems analyst, manager, and consultant. Most recently, he has spent time on DB2 subsystem installation and performance tuning. He is also the author of The MIS Manager's Guide to Performance Appraisal (McGraw-Hill, 1993).
