April 22, 2003

Becoming the Master of Disaster

By Jim Czuprynski

A few Saturdays ago, I performed a planned viability test of my Oracle 9iR2 hot standby database. I terminated the transmission of archived redo logs from the primary site, activated the standby database, and compared results between primary and standby sites. As expected, row counts, dollar totals, and a few other measures matched up perfectly. Satisfied, I kicked off the process to copy RMAN backups from the primary site in preparation for restoring the standby site to its standby role, and went home until Monday, leaving the newly activated database running over the weekend.

When I arrived early on Monday to run a few more tests of our applications against the standby site, I was surprised to discover the instance had crashed. After investigation, I found out that all of the drives had failed on one of the standby site's two disk drive arrays. Since that array held drives that contained datafiles for the system rollback segments, the rollback segment tablespace was corrupted almost immediately. Further investigation revealed that the disk array had failed because the array had only one power supply, even though a second redundant power supply module could have been installed.

Even though this was a rather unexpected and reasonably unlikely failure, it could not have come at a better time. It caused me to review our entire disaster recovery plan for both the secondary and primary servers. I found out that none of the production servers had been outfitted with redundant power supplies for the disk arrays. And some further reevaluation of my disaster recovery scenarios proved that the loss of one of the arrays would have caused the loss of UNDO segments on the production database - because it turns out those datafiles weren't mirrored properly either.

Both the primary and standby sites have since been repaired, of course, and everything is copasetic. However, my cautionary tale underlines how critical a robust disaster recovery plan can be in preventing, and surviving, a potential disaster.
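
The viability check described at the start of this tale, comparing row counts, dollar totals, and other measures between the primary and the activated standby, lends itself to automation. The following is only a minimal sketch in Python, assuming the measures have already been collected from each site into a dictionary; the measure names and figures are hypothetical and are not taken from my actual test.

```python
from decimal import Decimal


def compare_measures(primary, standby):
    """Return the measures whose values differ between the two sites.

    Each argument maps a measure name to the value gathered at that site.
    A measure missing from one site is reported with None for that site.
    """
    mismatches = {}
    for name in sorted(set(primary) | set(standby)):
        primary_value = primary.get(name)
        standby_value = standby.get(name)
        if primary_value != standby_value:
            mismatches[name] = (primary_value, standby_value)
    return mismatches


# Hypothetical measures; in practice these would come from queries
# run against each database after the standby has been activated.
primary_measures = {
    "orders.row_count": 120345,
    "orders.total_dollars": Decimal("9876543.21"),
    "customers.row_count": 8410,
}
standby_measures = dict(primary_measures)

differences = compare_measures(primary_measures, standby_measures)
print(differences if differences else "all measures match")
```

Using Decimal rather than floating point for dollar totals keeps the comparison exact, so a one-cent discrepancy is not hidden by rounding.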

Developing disaster recovery scenarios.
A good disaster recovery planner isn't afraid to "think about the unthinkable." This entails developing the common disaster recovery scenarios that could happen to your database and server.

Based on my experiences over the past several years as an Oracle DBA, the most serious of these is media failure. A typical example of preventable media failure involves under-utilization of RAID-0+1 or RAID-1 redundancy for critical data files, log groups, and control files. Moreover, as I described in my earlier tale of woe, it is a good idea to remember those pesky and often-overlooked UNDO or rollback segments - it may be impossible to restart the database when those tablespaces are damaged or corrupted due to media failure.
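
One way to look for this kind of exposure is to keep an inventory of the critical database files and check each one for redundancy. The sketch below is an illustration only: the file paths, array names, and the inventory itself are hypothetical, and the rule that "mirrored" means RAID-1 or RAID-0+1 is an assumption made for the example. It flags any file that is not on mirrored storage, or that sits on a disk array with fewer than two power supplies.

```python
from dataclasses import dataclass

# Assumption for this sketch: only these RAID levels count as mirrored.
MIRRORED_LEVELS = {"RAID-1", "RAID-0+1"}


@dataclass(frozen=True)
class DbFile:
    path: str        # location of the file (hypothetical paths below)
    kind: str        # "datafile", "undo", "redo", or "control"
    raid_level: str  # RAID level of the volume holding the file
    array: str       # name of the disk array holding the volume


def find_single_points_of_failure(files, power_supplies):
    """Return (path, reason) pairs for files lacking redundancy.

    power_supplies maps an array name to the number of power supply
    modules installed in it; an unknown array is treated as having none.
    """
    findings = []
    for f in files:
        if f.raid_level not in MIRRORED_LEVELS:
            findings.append((f.path, f"{f.kind} not mirrored ({f.raid_level})"))
        if power_supplies.get(f.array, 0) < 2:
            findings.append((f.path, f"array {f.array} has no redundant power supply"))
    return findings


# Hypothetical inventory for illustration.
inventory = [
    DbFile("/u01/oradata/prod/system01.dbf", "datafile", "RAID-1", "array_a"),
    DbFile("/u02/oradata/prod/undotbs01.dbf", "undo", "RAID-0", "array_b"),
    DbFile("/u01/oradata/prod/control01.ctl", "control", "RAID-0+1", "array_a"),
]
supplies = {"array_a": 2, "array_b": 1}

for path, reason in find_single_points_of_failure(inventory, supplies):
    print(f"{path}: {reason}")
```

The inventory has to be maintained by hand or gathered from the storage layer; the point of the sketch is simply that UNDO and rollback files belong on the list alongside data files, log groups, and control files.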

Another set of disaster recovery scenarios with serious implications involves the partial or complete loss of the database server itself. This might include damage to the software needed to run the Oracle instance - for example, the loss of critical operating system files - as well as physical damage, such as a failed power supply, memory, or CPU module.

Hardware disasters can be more difficult to predict and even harder to test, since realistically simulating some of the failures might require a "test to destruction" of the hardware. These scenarios should not be ignored, however: even with the robust service agreements available from major hardware suppliers, it could be hours or even days before the damaged server is repaired and ready to take the load of a production database again.

Once you've uncovered potential single points of failure and have painted some grim pictures as to what might happen if those failures occurred, it's time to turn attention to the methods, practices, and hardware configurations that help prevent a disaster.

Alternate production server.
If you are using Oracle's Data Guard facilities to create and maintain either a logical or physical "hot standby" server site, then you've already got this angle covered. However, if you do not have an alternate server to which you could quickly restore your production database, your ability to recover from a serious hardware disaster will be much more in doubt.

One less robust alternative to a standby site is a quality-assurance (QA) database server. Ideally, this server's hardware should closely match that of the production site, so that the next set of application or database changes can be evaluated before release to production. On one occasion, before we had our hot standby server in working order, I was forced to transfer our production database to our QA site because we had noticed some "flaky" performance on the production server. As it turned out, we had guessed right: the production server's motherboard was about to fail, and it did so shortly after the transfer. Though the QA server had only half the memory and CPU power of the production server, having it in my "back pocket" saved the day.


