Protecting Oracle Instance with Local Clustering

March 13, 2003

Marin Komadina

For a long time, corporations have worked hard to keep their systems running under all conditions. For many e-commerce and business applications, an extended database outage means lost revenue. A wide range of solutions is in use (local disk mirroring, RAID, local clustering, remote disk mirroring, replication, and local clustering with Oracle Parallel Server or Real Application Clusters), so we need to choose the one that best fits our requirements. One of those solutions is local clustering with Sun Cluster software.

This article covers:

  • Local Clustering Definition
  • HA (High Availability) Oracle Agent
  • Cluster Configuration
  • Procedure for Adding New Instance in Cluster
  • Conclusion

Local Clustering Definition

A local cluster is defined as two or more physical machines (nodes) that share common disk storage and a logical IP address. The clustered nodes exchange cluster information over one or more heartbeat links. The cluster software collects this information and monitors the state of both nodes. When an error condition occurs, the software executes a predefined script and switches the clustered services over to a secondary machine. The Oracle instance, as one of the clustered services, is shut down together with its listener process and restarted on the secondary (surviving) node.
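The failover decision described above can be illustrated with a minimal sketch. It is not how Sun Cluster is implemented: the function name, the node names, and the 10-second timeout are all assumptions made for illustration. It only shows the idea that services move to the secondary node when the active node's heartbeat goes stale while the secondary is still alive.

```python
# Hypothetical sketch of a heartbeat-based failover decision.
# The timeout value and all names are illustrative assumptions.
HEARTBEAT_TIMEOUT = 10.0  # seconds without a heartbeat before a node counts as failed


def choose_active_node(active, standby, last_heartbeat, now,
                       timeout=HEARTBEAT_TIMEOUT):
    """Return the node that should host the clustered services.

    last_heartbeat maps a node name to the time (in seconds) at which
    its most recent heartbeat was received.
    """
    active_alive = now - last_heartbeat[active] <= timeout
    standby_alive = now - last_heartbeat[standby] <= timeout
    # Switch over only if the active node is silent and the standby is healthy.
    if not active_alive and standby_alive:
        return standby
    return active
```

For example, with `last_heartbeat = {"node1": 0.0, "node2": 95.0}` and `now=100.0`, the services would move from `node1` to `node2`. A real cluster would then run its predefined scripts to stop the Oracle instance and listener and restart them on the surviving node.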

HA Oracle Agent

The HA Oracle Agent software controls Oracle database activity on the Sun Cluster nodes. The agent performs fault checking using two processes on the local node and two processes on the remote node, which query the V$SYSSTAT view for active sessions. If the database has no active sessions, the HA Agent opens a test transaction: it connects and executes create, insert, update, and drop table commands in sequence. The error codes returned are checked against a special action file, which determines the action the agent takes:


# Action file for HA-DBMS Oracle fault monitor
# State DBMS_er proc_di log_msg timeout int_err new_sta action    message
co      *       *       *       *       1       *       stop      Internal HA-DBMS Oracle error connecting to db
on      28      *       *       *       *       di      none      Session killed by DBA, will reconnect
*       50      *       *       *       *       di      takeover  O/S error occurred while obtaining an enqueue
co      0       *       *       1       0       *       restart   A timeout has occured during connect

Takeover - the cluster software switches the database over to another node.

Stop - the cluster stops the DBMS.

None - no action is taken.

Restart - the database is restarted locally on the same node.
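To make the role of the action file more concrete, here is a rough Python sketch of how such a table could be evaluated. It is not the agent's actual code. It assumes that `*` matches any value, that the first matching row wins, and that every field (including log_msg) is compared as a literal string; the names `parse_rules` and `find_action` are hypothetical.

```python
from typing import NamedTuple, Optional

# The four sample rows from the action file shown above.
ACTION_FILE = """\
# State DBMS_er proc_di log_msg timeout int_err new_sta action  message
co  *   *  *  *  1  *   stop      Internal HA-DBMS Oracle error connecting to db
on  28  *  *  *  *  di  none      Session killed by DBA, will reconnect
*   50  *  *  *  *  di  takeover  O/S error occurred while obtaining an enqueue
co  0   *  *  1  0  *   restart   A timeout has occured during connect
"""


class Rule(NamedTuple):
    conditions: tuple  # state, DBMS error, proc_di, log_msg, timeout, int_err
    new_state: str
    action: str
    message: str


def parse_rules(text):
    """Turn the non-comment lines of an action file into Rule tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split into at most nine fields so the message keeps its spaces.
        fields = line.split(None, 8)
        rules.append(Rule(tuple(fields[:6]), fields[6], fields[7], fields[8]))
    return rules


def find_action(rules, observed) -> Optional[Rule]:
    """Return the first rule whose six conditions all match (assumed semantics)."""
    for rule in rules:
        if all(p == "*" or p == v for p, v in zip(rule.conditions, observed)):
            return rule
    return None
```

With these assumptions, `find_action(parse_rules(ACTION_FILE), ("on", "28", "0", "", "0", "0"))` selects the second row, whose action is `none`, while any state combined with DBMS error 50 selects the `takeover` row.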

The HA Oracle Agent requires the Oracle configuration files (listener.ora, oratab and tnsnames.ora) to reside in a single predefined location, /var/opt/oracle.
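A quick pre-flight check for this requirement might look like the sketch below. The helper name `missing_config_files` is hypothetical, and the script only verifies that the three files exist; it does not validate their contents.

```python
from pathlib import Path

# Files the article says the HA Oracle Agent expects to find.
REQUIRED_FILES = ("listener.ora", "oratab", "tnsnames.ora")


def missing_config_files(directory="/var/opt/oracle"):
    """Return the required configuration files that are absent from directory."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files()
    if missing:
        print("Missing in /var/opt/oracle:", ", ".join(missing))
    else:
        print("All required Oracle configuration files are present.")
```

Running the check on every cluster node before registering an instance would catch a node where one of the files was never copied.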