Protecting Oracle Instance with Local Clustering


Marin Komadina

Corporations have long worked hard to keep their systems running under
all conditions. For many e-commerce and business applications, extended
database unavailability leads directly to lost revenue. A wide range of
solutions is in use (local disk mirroring, RAID, local clustering, remote disk
mirroring, replication, and clustering with Oracle Parallel Server or Real
Application Clusters), and we need to choose the one that best fits our
requirements. One of those solutions is local clustering with Sun Cluster
software.

This article covers:

  • Local Clustering Definition

  • HA (High Availability) Oracle Agent

  • Cluster Configuration

  • Procedure for Adding New Instance in Cluster

  • Conclusion

Local Clustering Definition

A local cluster is defined as two or more physical machines (nodes) that
share common disk storage and a logical IP address. Clustered nodes exchange
cluster information over one or more heartbeat links. The cluster software
collects this information and monitors the state of the nodes. When an error
condition occurs, the software executes a predefined script and switches the
clustered services over to a secondary machine. The Oracle instance, as one of
the clustered services, is shut down together with the listener process and
restarted on the secondary (surviving) node.
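
The switchover logic described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Sun Cluster code: the `Node` class, the `fail_over` function and the 10-second heartbeat timeout are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a cluster node (not a Sun Cluster structure)."""
    name: str
    last_heartbeat: float                      # time of last heartbeat, in seconds
    services: list = field(default_factory=list)


def is_alive(node, now, timeout):
    """A node counts as alive if its last heartbeat is recent enough."""
    return (now - node.last_heartbeat) <= timeout


def fail_over(primary, secondary, now, timeout=10.0):
    """Move all clustered services to the secondary node if the primary
    has missed its heartbeats. Returns True when a switchover happened."""
    if is_alive(primary, now, timeout):
        return False                           # primary is healthy, nothing to do
    if not is_alive(secondary, now, timeout):
        raise RuntimeError("no surviving node available")
    # Services (e.g. listener and instance) stop on the failed node
    # and are restarted on the surviving node.
    secondary.services.extend(primary.services)
    primary.services.clear()
    return True
```

For example, with a primary node whose last heartbeat is 20 seconds old and a healthy secondary, `fail_over` moves the `listener` and `oracle` services to the secondary node and returns `True`.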

HA Oracle Agent

HA Oracle Agent software controls Oracle database activity on Sun Cluster
nodes. The agent performs fault checking using two processes on the local node
and two processes on the remote node, which query the V$SYSSTAT view for
active sessions. If the database has no active sessions, the HA Agent opens a
test transaction (it connects and then executes create, insert, update and
drop table commands in sequence). Error codes returned by the HA Agent are
checked against a special action file,
/etc/opt/SUNWscor/haoracle_config_V1:


# Action file for HA-DBMS Oracle fault monitor
# State DBMS_er proc_di log_msg timeout int_err new_sta action   message

  co    *       *       *       *       1       *       stop     Internal HA-DBMS Oracle error connecting to db
  on    28      *       *       *       *       di      none     Session killed by DBA, will reconnect
  *     50      *       *       *       *       di      takeover O/S error occurred while obtaining an enqueue
  co    0       *       *       1       0       *       restart  A timeout has occured during connect

Takeover: the cluster software switches the services to another node.

Stop: the cluster stops the DBMS.

None: no action is taken.

Restart: the database is restarted locally on the same node.
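
To make the action file easier to read, the following sketch shows one way such a table could be evaluated. It is not the agent's actual implementation. It assumes that the first six columns are match conditions, that `*` matches any value, that the last three columns (`new_sta`, `action`, `message`) are results, and that the first matching row wins. The function names `parse_rules` and `lookup` are hypothetical; the column names are kept exactly as they appear in the file header.

```python
# Sample rows copied from the action file listing in this article.
ACTION_FILE = """\
# State DBMS_er proc_di log_msg timeout int_err new_sta action message
co * * * * 1 * stop Internal HA-DBMS Oracle error connecting to db
on 28 * * * * di none Session killed by DBA, will reconnect
* 50 * * * * di takeover O/S error occurred while obtaining an enqueue
co 0 * * 1 0 * restart A timeout has occured during connect
"""

# Assumption: these six columns are conditions; the rest are results.
MATCH_COLUMNS = ("State", "DBMS_er", "proc_di", "log_msg", "timeout", "int_err")


def parse_rules(text):
    """Turn action-file text into a list of rule dictionaries."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        parts = line.split(None, 8)       # the 9th field is the free-text message
        if len(parts) < 9:
            continue                      # ignore malformed rows
        rules.append({
            "conditions": dict(zip(MATCH_COLUMNS, parts[:6])),
            "new_sta": parts[6],
            "action": parts[7],
            "message": parts[8],
        })
    return rules


def lookup(rules, observed):
    """Return the first rule whose conditions all match the observed values."""
    for rule in rules:
        if all(want == "*" or want == str(observed.get(col, ""))
               for col, want in rule["conditions"].items()):
            return rule
    return None
```

Under these assumptions, an observation with `State` set to `on` and `DBMS_er` set to `50` selects the third row, so the resulting action is `takeover`.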

The HA Oracle Agent requires the Oracle configuration files (listener.ora,
oratab and tnsnames.ora) to be in a single predefined location,
/var/opt/oracle.
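
Before adding an instance to the cluster, it can be useful to confirm that these files are in place. The helper below is a minimal sketch for that check. The function name `missing_config_files` is hypothetical, and the script is not part of the HA Oracle Agent.

```python
from pathlib import Path

# File names and default directory as stated in the article.
REQUIRED_FILES = ("listener.ora", "oratab", "tnsnames.ora")


def missing_config_files(config_dir="/var/opt/oracle"):
    """Return the required Oracle configuration files that are
    not present as regular files in config_dir."""
    base = Path(config_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files()
    if missing:
        print("Missing in /var/opt/oracle:", ", ".join(missing))
    else:
        print("All required configuration files are present.")
```

Running the check on each cluster node helps catch a file that was copied to only one node.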
