DB2 for z/OS Database Recovery - Get it Right
January 14, 2011
Database backup copies may not be enough to ensure data recoverability. The DBA must "bake" recoverability into the database design process; additionally, for current systems, the DBA needs to know whether current backup processes support recovery procedures that will meet application recovery objectives.
The Laws of Database Administration
In order of their importance, the laws include recoverability, availability, security, and performance. The easiest way to explain the laws is in the negative sense; that is, what you shouldn't do. In this form, the laws can be stated as "Thou shalt not cause thy …
The first three laws deal with data management, the fourth with system management. These encapsulate the four most important responsibilities of the DBA.
The Order of Importance
Some might quibble with the order of the laws. For example, one manager told me: "In our shop, performance is of highest importance. We have service level agreements that we must meet in order to meet our customer's needs. Performance is more important than backup and recovery."
Performance is certainly an urgent concern, but recoverability remains the most important. To see why, consider this scenario:
You are a DBA supporting IT systems for a provider of medical and surgical services. Your systems are used daily by doctors, nurses, and technicians to provide services to patients. In some cases, these services (e.g., diagnosis) may involve life-or-death decisions. Your supervisor asks you to give a presentation to upper management (including the CIO, vice president of IT, and major stockholders or owners) of new enhancements your department will make. Your presentation begins as follows:
"Ladies and gentlemen, the DBA team will be implementing some high-performance features in the near future. We have two implementation plans and we'd like your assistance in choosing between the two.
"Plan A will involve performance changes that will result in up to 95 percent of our online transactions finishing in their required service levels." You hear some grumbling from the audience, as they realize this means 5 percent of the online transactions will perform poorly. You continue.
"Plan B will involve performance changes resulting in 100 percent of our online transactions finishing in their required service levels." You hear sighs of relief from your audience, as they are clearly more comfortable with this plan.
"However," you continue, "if we have a major hardware outage, there's a good possibility that up to 5 percent of our data will be missing or invalid."
Your audience now sits in stunned silence. They realize there could be a power outage or other disaster that affects the IT systems. Should this occur, when the systems come back up, every user will know there's a 5 percent chance that test results are missing, that diagnoses are incorrect, or that some patient's records have completely disappeared.
Which plan will your audience adopt? Which was more important to them: recoverability or performance?
In the remainder of this article I will concentrate on the first law: data recoverability.
Law #1: Data Recoverability
Ensure data recoverability. If there's one thing to get right, this is it. While other things (such as performance or security) may seem more urgent, ensuring data recoverability is the database administrator's most important responsibility.
Consider a project to implement a new database to support a critical production application. If a disaster occurs, will the data be available within the agreed-upon Recovery Time Objective (RTO)? If not, in the case of medical and financial data, this may breach contracts with vendors or violate audit guidelines.
Data recoverability is another major consideration in some legislation. Here are some that affect financial institutions:
Most IT shops use regularly scheduled standard backup procedures (e.g., DB2 image copies), but few have actually tested the recovery time of these objects and analyzed whether their backup procedures are sufficient (or necessary) for their recovery requirements. One common mistake is forgetting that recovering a table space from an image copy must also include rebuilding its indexes, which takes additional time.
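To make the index-rebuild point concrete, here is a minimal sketch of a recovery-time estimate. The class name, the three time components, and all of the minute values are illustrative assumptions, not measurements; real numbers would come from timed recovery tests in your own shop.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """Hypothetical breakdown of one table space recovery, in minutes."""
    restore_minutes: float        # restoring the data from the image copy
    log_apply_minutes: float      # applying log records written since the copy
    index_rebuild_minutes: float  # rebuilding the indexes afterwards

    def total_minutes(self) -> float:
        # The index rebuild is part of the recovery, so it counts toward the total.
        return (self.restore_minutes
                + self.log_apply_minutes
                + self.index_rebuild_minutes)

    def meets_rto(self, rto_minutes: float) -> bool:
        return self.total_minutes() <= rto_minutes


# Made-up figures for illustration only.
estimate = RecoveryEstimate(restore_minutes=25,
                            log_apply_minutes=10,
                            index_rebuild_minutes=40)
print(estimate.total_minutes())   # total including the index rebuild
print(estimate.meets_rto(60))     # compare against a 60-minute RTO
```

With these invented figures, the restore and log apply alone would fit a 60-minute RTO, but the total including the index rebuild would not. That is exactly the gap a timed recovery test is meant to expose.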
The simplest way for the DBA to proceed is to ensure that recovery requirements are documented during the requirements phase of systems design. For existing systems, document the recovery requirements either immediately, during the next phase of maintenance or enhancements, or during the next audit. These documents can then be used to drive one or more internal projects to measure application data recoverability, compare against recovery objectives, and implement improvements.
Best practices for recoverability start with recovery, not with backups. The DBA should not simply implement daily or weekly backups (image copies) as a standard; instead, list possible recovery methods and options, their costs, and their speeds. This can then be used during design or enhancement projects to ensure recovery requirements are met.
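As a sketch of how such a list might be used, the following compares recovery options against a recovery time objective and picks the cheapest one that qualifies. The option names, times, and costs are hypothetical placeholders; substitute the methods and measurements that apply to your environment.

```python
from typing import NamedTuple, Optional, Sequence


class RecoveryOption(NamedTuple):
    name: str
    est_recovery_minutes: float  # measured or estimated time to recover
    monthly_cost: float          # any consistent cost unit


def cheapest_option_meeting_rto(options: Sequence[RecoveryOption],
                                rto_minutes: float) -> Optional[RecoveryOption]:
    """Return the lowest-cost option whose recovery time fits the RTO, or None."""
    candidates = [o for o in options if o.est_recovery_minutes <= rto_minutes]
    if not candidates:
        return None
    return min(candidates, key=lambda o: o.monthly_cost)


# Invented entries for illustration only.
options = [
    RecoveryOption("weekly image copies", 180, 100),
    RecoveryOption("daily image copies", 75, 400),
    RecoveryOption("hypothetical faster option", 40, 550),
]

choice = cheapest_option_meeting_rto(options, rto_minutes=120)
print(choice.name if choice else "no option meets the RTO")
```

Starting from the recovery objective and working back to the backup method is the point of the exercise: the backup schedule is an output of the analysis, not an input.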
Some of these methods include:
DBAs should ensure they have all of the following:
Automation of the Recovery Process
In general, the DBA should automate reactive or simple reporting processes, freeing time for higher-level work. Your first reaction might be, "Wait! I'll automate myself out of a job!" Far from it. Implementing automation makes the DBA more valuable. IT management wants its knowledge workers doing tasks that add value. These might include detailed systems performance tuning, quality control, cost/benefit reviews of potential new applications and projects, and more. Management understands that a DBA spending time on trivial tasks represents a net loss of productivity.
The advantage of automation isn't merely speed; automating tasks helps move the DBA away from reactive tasks such as reporting and analysis toward more proactive functions.
Here's a typical list of processes many DBAs still manually perform:
Each of these tasks can be replaced by an automated reporting or data-gathering process of some kind. With such processes in place, DBAs can schedule data gathering and report generation for later analysis, or guide requestors to the appropriate screens, reports, or jobs. This removes the DBA from the "reactive rut" and frees time for proactive tasks such as projects, architecture, planning, systems tuning, and recovery planning.
Along with choosing specific tasks to automate, you'll probably need to learn one or more automation tools or languages. REXX is an example of a popular language for online or batch access to DB2 data. There are many examples and ideas for automated processes in articles, presentations, and white papers.
Autonomics in the Recovery Process
As our IT organizations have matured, we've become smarter about our problems. We began to collect problem logs, and analyzed them looking for trends and patterns. We began to recognize frequent problems and devised strategies for automatically dealing with them or preventing them.
We've now reached the next logical step in this progression: engineering processes and process control to make systems and applications self-aware and self-healing. This is called autonomics. Autonomics, ranging from simple scripts to complicated processes, can be applied to applications, systems, or support software. For many DBAs, the idea of a self-healing database inspires visions of the database redesigning itself.
What exactly would a self-healing database heal? One DB2 z/OS example that comes to mind is real-time statistics. DB2 maintains these statistics dynamically as the data changes. One use is in deciding whether to run the reorg utility: the statistics can be queried and the results used to decide whether or not to execute the reorg.
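The reorg decision can be sketched as a simple threshold check. This is only an illustration: the function name, its inputs, and the 20 percent default are assumptions made for the example, and in practice the counts would be read from the real-time statistics rather than passed in by hand.

```python
def should_reorg(rows_changed_since_reorg: int,
                 total_rows: int,
                 threshold_pct: float = 20.0) -> bool:
    """Decide whether a reorg looks worthwhile.

    threshold_pct is an illustrative assumption, not a recommended value.
    """
    if total_rows <= 0:
        # Nothing to reorganize in an empty object.
        return False
    changed_pct = 100.0 * rows_changed_since_reorg / total_rows
    return changed_pct >= threshold_pct


# Made-up counts: 30,000 changed rows out of 100,000 is 30 percent.
print(should_reorg(30_000, 100_000))
```

A scheduled job could run a check like this for each object and submit the reorg only when it returns true, instead of reorganizing everything on a fixed calendar.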
These and other examples of DB2 autonomics make it possible to "program" a measure of self-tuning (or at least self-management) into the DBA's support infrastructure.
The next logical step is implementing autonomics into the recovery process. The DBA can use real-time statistics to decide when to take image copies, and what kind of copy to take (full or incremental).
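A minimal sketch of that decision follows. The function, its parameter names, and both percentage thresholds are hypothetical; the page counts are assumed to have been read from real-time statistics beforehand, and the cutoffs should be tuned to your own recovery objectives.

```python
def choose_copy_type(updated_pages: int,
                     active_pages: int,
                     incremental_pct: float = 1.0,
                     full_pct: float = 10.0) -> str:
    """Return "full", "incremental", or "none" for the next image copy.

    Both thresholds are illustrative assumptions.
    """
    if active_pages <= 0:
        return "none"
    changed_pct = 100.0 * updated_pages / active_pages
    if changed_pct >= full_pct:
        # Heavy change activity: a fresh full copy keeps recovery short.
        return "full"
    if changed_pct > incremental_pct:
        # Moderate change activity: copy only the changed pages.
        return "incremental"
    # Little or no change since the last copy: skip this cycle.
    return "none"


# Made-up counts: 50 updated pages out of 1,000 is 5 percent.
print(choose_copy_type(50, 1_000))
```

The same pattern extends to timing: a job that runs this check frequently and copies only when the answer is not "none" replaces a fixed daily or weekly schedule with one driven by actual change activity.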
In implementing best practices for database recovery, the DBA needs to move away from reactive tasks, initiate quality measures, and offload basic, repetitive tasks. I hope that you can use this material as you work to become more productive. Remember to quantify and document your results, and then advertise your value.