Performing Database Failover with Oracle 11g Data Guard
September 24, 2009
Synopsis. Data Guards capability to recover a mission-critical database upon the loss of the entire production site is at the heart of Oracle 11gs disaster recovery features. This article the sixth in this ongoing series explores how to manually fail over a production database to its corresponding physical standby database, as well as reinstate a failed primary database to a physical standby.
The prior article in this series demonstrated:
This article will show how to:
Pondering the Unthinkable: Failover
I dont think Ive ever met an Oracle DBA whos been thrilled to discuss the possible loss of her complete production database site, even if she had simulated that loss and practiced the required steps to full database recovery. But lets face it: Even if a DBA follows Oracles recommended best practices precisely to guard against data loss, theres always a miniscule chance that the production database site could become untenable through no fault of our own. Thankfully, once an Oracle 11g Data Guard standby environment has been enabled, Oracle 11g offers excellent survivability of a production database while simultaneously insuring that any data loss is limited depending on the data protection mode selected.
It Cant Hurt To Ask... Though it might seem to be inappropriate at the time, the first question that Ill typically ask in an apparent failover situation is Do we really need to perform a failover in this situation? There may be a way around the apparent loss of the primary database, or its possible that the apparent disaster is really not that disastrous. For example, if the production database is unavailable for only a few moments, or maybe even an hour, during off-peak business hours, is it really a problem, or just a matter of perception and extra paperwork? The reason Im not afraid to challenge the necessity of failover is that there are some potentially serious consequences of completing the failover operation:
Detecting a Failover Situation. Once Ive established that its truly necessary to continue the failover operation, Ill use Data Guard Broker (DGB) to assist in the transition. To illustrate how DGB detects a failover situation, Ill start with the ORCL_PRIMARY and ORCL_STDBY1 databases both fulfilling their initial intended respective roles of primary database and counterpart physical standby database. The DGMGRL utility reflects this as well, and it shows that everything is copasetic:
[oracle@11gStdby ~]$ dgmgrl DGMGRL for Linux: Version 18.104.22.168.0 - Production Copyright (c) 2000, 2005, Oracle. All rights reserved. Welcome to DGMGRL, type "help" for information. DGMGRL> connect sys/oracle Connected. DGMGRL> show configuration Configuration Name: MAA_orcl Enabled: YES Protection Mode: MaxPerformance Databases: orcl_primary - Primary database orcl_stdby1 - Physical standby database Fast-Start Failover: DISABLED Current status for "MAA_orcl": SUCCESS
To simulate the necessity for a failover to the physical standby database, Ill simply issue the kill -9 <pid> command against the Server Monitor (SMON) background process for the primary database. The primary databases alert log detects this almost immediately, of course:
. . . Tue Aug 25 18:54:10 2009 Errors in file /u01/app/oracle/diag/rdbms/orcl_primary/orcl_primary/trace/orcl_primary_pmon_6166.trc: ORA-00474: SMON process terminated with error PMON (ospid: 6166): terminating the instance due to error 474 Instance terminated by PMON, pid = 6166 . . .
The corresponding physical standby databases Remote File Server (RFS) background process also detects the loss of connectivity to the primary database, as reflected in that databases alert log:
. . . Tue Aug 25 18:54:49 2009 RFS: Possible network disconnect with primary database Tue Aug 25 18:54:49 2009 RFS: Possible network disconnect with primary database Tue Aug 25 18:55:49 2009 . . .
To ascertain whats wrong here, Ill immediately verify the Data Guard configuration by connecting via DGMGRL and run the SHOW CONFIGURATION and SHOW DATABASE commands:
DGMGRL> show configuration verbose; Configuration Name: MAA_orcl Enabled: YES Protection Mode: MaxPerformance Databases: orcl_primary - Primary database orcl_stdby1 - Physical standby database Fast-Start Failover: DISABLED Current status for "MAA_orcl": Error: ORA-16625: cannot reach the database
Failing Over to the Physical Standby. Its obvious that something is seriously wrong here! Once Ive asked my usual probing questions and determined that the failover is indeed necessary, Ill start the failover process with one simple DGB command, FAILOVER TO <standby database>:
DGMGRL> failover to orcl_stdby1; Performing failover NOW, please wait... Failover succeeded, new primary is "orcl_stdby1" DGMGRL> show configuration Configuration Name: MAA_orcl Enabled: YES Protection Mode: MaxPerformance Databases: orcl_stdby1 - Primary database orcl_primary - Physical standby database (disabled) Fast-Start Failover: DISABLED Current status for "MAA_orcl": SUCCESS DGMGRL> show database orcl_stdby1 Database Name: orcl_stdby1 Role: PRIMARY Enabled: YES Intended State: TRANSPORT-ON Instance(s): orcl_stdby1 Current status for "orcl_stdby1": SUCCESS
During the failover to the physical standby database, the Oracle 11g DGB performs the following steps:
The corresponding alert log entries from the original standby database (now the primary database) are displayed in Listing 6.1.
Have Any Data Been Lost? As with many questions were asked as DBAs, the answer to this one is It depends. In this case, its going to depend on which data protection mode either Maximum Performance, Maximum Availability, or Maximum Protection - that Ive chosen for the Data Guard configuration. (See my prior article on this topic for a comparison on these data protection options and their corresponding effect on data loss.)
However, theres one other factor to consider in a failover situation. The Data Guard FAILOVER TO <standby database> command also offers an optional directive, IMMEDIATE, which instructs DGB to an immediate failover instead of a complete failover, and here are the implications:
Failing Over To a Logical Standby Database: Considerations. Though its definitely possible, Id like to avoid using a logical standby database as a failover target:
Ill explore these limitations (but also highlight the features) of logical standby databases in an upcoming article in this ongoing series.
Reinstatement: Post-Failover Stress Relief
Prior to Oracle Database Release 10g, once the primary database had failed, it was impossible to bring the original primary database back into the Data Guard configuration as a new physical standby database without having to rebuild it completely. This was usually accomplished by using the new primary database as the target database and the original primary as the auxiliary database within the bounds of the DUPLICATE DATABASE Recovery Manager (RMAN) command set.
As long as certain prerequisites are in place, however, Oracle 10g offered the ability to reinstate the original primary database as a physical standby database:
Reinstating a Failed Primary Database. To reinstate my failed ORCL_PRIMARY database as a new physical standby database, the original primary database has to be shut down and then brought back to MOUNT state. Otherwise, the initial attempt at reinstatement will fail as shown below in this DGMGRL session:
DGMGRL> reinstate database orcl_primary Reinstating database "orcl_primary", please wait... Error: ORA-16653: failed to reinstate database Failed. Reinstatement of database "orcl_primary" failed
After connecting to the ORCL_PRIMARY database via SQL*Plus and issuing the STARTUP MOUNT command, Ill again attempt to reinstate the database:
DGMGRL> reinstate database orcl_primary Reinstating database "orcl_primary", please wait... Operation requires shutdown of instance "orcl_primary" on database "orcl_primary" Shutting down instance "orcl_primary"... ORA-01109: database not open Database dismounted. ORACLE instance shut down. Operation requires startup of instance "orcl_primary" on database "orcl_primary" Starting instance "orcl_primary"... Reinstating database "orcl_primary", please wait... Reinstatement of database "orcl_primary" succeeded
The successful reinstatement of ORCL_PRIMARY as a physical standby database is reflected in the Data Guard configuration as well:
DGMGRL> show configuration Configuration Name: MAA_orcl Enabled: YES Protection Mode: MaxPerformance Databases: orcl_stdby1 - Primary database orcl_primary - Physical standby database Fast-Start Failover: DISABLED Current status for "MAA_orcl": SUCCESS DGMGRL> show database orcl_primary Database Name: orcl_primary Role: PHYSICAL STANDBY Enabled: YES Intended State: APPLY-ON Instance(s): orcl_primary Current status for "orcl_primary": SUCCESS
The corresponding alert log entries from the successful reinstatement process for the ORCL_PRIMARY database are shown in Listing 6.2.
Reverting to the Original Data Guard Configuration. Now that Ive successfully reinstated my original primary database to a physical standby database, Ill simply perform a switchover operation via DGB to return the original primary and physical standby databases to their initial roles in the Data Guard configuration:
DGMGRL> switchover to orcl_primary; Performing switchover NOW, please wait... New primary database "orcl_primary" is opening... Operation requires shutdown of instance "orcl_stdby1" on database "orcl_stdby1" Shutting down instance "orcl_stdby1"... ORA-01109: database not open Database dismounted. ORACLE instance shut down. Operation requires startup of instance "orcl_stdby1" on database "orcl_stdby1" Starting instance "orcl_stdby1"... ORACLE instance started. Database mounted. Switchover succeeded, new primary is "orcl_primary" DGMGRL>
In the next article in this series, Ill explore
References and Additional Reading
While Im hopeful that Ive given you a thorough grounding in the technical aspects of the features Ive discussed in this article, Im also sure that there may be better documentation available since its been published. I therefore strongly suggest that you take a close look at the corresponding Oracle documentation on these features to obtain crystal-clear understanding before attempting to implement them in a production environment. Please note that Ive drawn upon the following Oracle Database 11g documentation for the deeper technical details of this article:
B28279-02 Oracle Database 11g New Features Guide
B28294-03 Oracle Database 11g Data Guard Concepts and Administration
B28295-03 Oracle Database 11g Data Guard Broker
B28320-01 Oracle Database 11g Reference Guide
B28419-02 Oracle Database 11g PL/SQL Packages and Types Reference