For a long time, corporations have tried hard to keep their systems running
under all conditions. For many e-commerce and business applications, an extended
database outage leads directly to revenue loss. With a wide range of solutions
in use (local disk mirroring, RAID, local clustering, remote disk mirroring,
replication, and local clustering with Oracle Parallel Server or Real
Application Clusters), we need to choose the most suitable one. One of those
solutions is local clustering with Sun Cluster software.
This article covers:
- Local Clustering Definition
- HA (High Availability) Oracle Agent
- Cluster Configuration
- Procedure for Adding New Instance in Cluster
- Conclusion
Local Clustering Definition
A local cluster is defined as two or more physical machines (nodes) that
share common disk storage and a logical IP address. Clustered nodes exchange
cluster information over one or more heartbeat links. The cluster software
collects this information and monitors the state of the nodes. When it detects
an error condition, the software executes a predefined script and switches the
clustered services over to a secondary machine. An Oracle instance, as one of
the clustered services, is shut down together with its listener process and
restarted on the secondary (surviving) node.
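
The switchover decision described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Sun Cluster is implemented: the `Node` class, the `check_and_failover` function, and the 10-second heartbeat timeout are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical model of a cluster node."""
    name: str
    last_heartbeat: float              # time the last heartbeat was seen
    services: List[str] = field(default_factory=list)


def check_and_failover(primary: Node, secondary: Node,
                       now: float, timeout: float = 10.0) -> Node:
    """Return the node that owns the clustered services after the check.

    If the primary's heartbeat is older than `timeout` (an assumed value),
    every service is moved to the secondary node.
    """
    if now - primary.last_heartbeat <= timeout:
        return primary                 # primary is healthy, nothing to do
    # Primary is considered failed: move the services to the survivor.
    secondary.services.extend(primary.services)
    primary.services.clear()
    return secondary


if __name__ == "__main__":
    node1 = Node("node1", 100.0, ["logical-ip", "listener", "oracle"])
    node2 = Node("node2", 100.0)
    owner = check_and_failover(node1, node2, now=120.0)
    print(owner.name, owner.services)
```

In the sketch the services are moved as a group, mirroring the point that the Oracle instance and its listener fail over together with the logical IP address.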
HA Oracle Agent
HA Oracle Agent software controls Oracle database activity on Sun Cluster
nodes. The agent performs fault checking using two processes on the local node
and two processes on the remote node, which query the V$SYSSTAT view for active
sessions. If the database has no active sessions, the HA Agent opens a test
transaction: it connects and executes create, insert, update, and drop table
commands in sequence.
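
The shape of such a test transaction can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for an Oracle connection so that it runs anywhere, and the `probe` function and the `ha_probe_test` table name are hypothetical, not part of the HA Agent.

```python
import sqlite3


def probe(connect=lambda: sqlite3.connect(":memory:"),
          table: str = "ha_probe_test") -> bool:
    """Run create/insert/update/drop in sequence; True means all succeeded.

    `connect` is any callable returning a DB-API connection. The default
    in-memory SQLite database is only a stand-in for the real database.
    """
    try:
        conn = connect()
    except sqlite3.Error:
        return False                   # could not connect at all
    try:
        cur = conn.cursor()
        cur.execute(f"CREATE TABLE {table} (id INTEGER, note TEXT)")
        cur.execute(f"INSERT INTO {table} VALUES (1, 'probe')")
        cur.execute(f"UPDATE {table} SET note = 'updated' WHERE id = 1")
        cur.execute(f"DROP TABLE {table}")
        conn.commit()
        return True
    except sqlite3.Error:
        return False                   # any failing step marks the probe failed
    finally:
        conn.close()


if __name__ == "__main__":
    print("database healthy:", probe())
```

A failure at any step is reported as a failed probe; a real monitor would instead pass the specific error code on for further evaluation.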
Error codes returned by the HA Agent are checked against a special action
file located at
/etc/opt/SUNWscor/haoracle_config_V1:
# Action file for HA-DBMS Oracle fault monitor
# State  DBMS_er  proc_di  log_msg  timeout  int_err  new_sta  action    message
  co     *        *        *        *        1        *        stop      Internal HA-DBMS Oracle error connecting to db
  on     28       *        *        *        *        di       none      Session killed by DBA, will reconnect
  *      50       *        *        *        *        di       takeover  O/S error occurred while obtaining an enqueue
  co     0        *        *        1        0        *        restart   A timeout has occured during connect
- Takeover: the cluster software switches to another node.
- Stop: the cluster stops the DBMS.
- None: no action is taken.
- Restart: the database is restarted locally on the same node.
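
To make the role of the action file more concrete, the sketch below parses the four rows from the excerpt and looks up the action for an observed condition. The matching rules are assumptions made for illustration (`*` matches anything, the first matching row wins); the real fault monitor may evaluate the file differently. `Rule`, `parse_rules`, and `find_rule` are hypothetical names, and the field names simply copy the truncated column headers as they appear in the excerpt.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    # Field names copy the (truncated) column headers of the excerpt as-is.
    state: str
    dbms_er: str
    proc_di: str
    log_msg: str
    timeout: str
    int_err: str
    new_sta: str
    action: str
    message: str


SAMPLE = """\
# Action file for HA-DBMS Oracle fault monitor
co * * * * 1 * stop Internal HA-DBMS Oracle error connecting to db
on 28 * * * * di none Session killed by DBA, will reconnect
* 50 * * * * di takeover O/S error occurred while obtaining an enqueue
co 0 * * 1 0 * restart A timeout has occured during connect
"""


def parse_rules(text: str) -> List[Rule]:
    """Parse whitespace-separated rows, skipping comments and blank lines."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 8)   # the message keeps its inner spaces
        if len(fields) == 9:
            rules.append(Rule(*fields))
    return rules


def find_rule(rules: List[Rule], state: str, dbms_er: str, proc_di: str = "",
              log_msg: str = "", timeout: str = "0",
              int_err: str = "0") -> Optional[Rule]:
    """Return the first rule whose six condition fields match (assumed semantics)."""
    observed = (state, dbms_er, proc_di, log_msg, timeout, int_err)
    for rule in rules:
        if all(p == "*" or p == v for p, v in zip(rule[:6], observed)):
            return rule
    return None


RULES = parse_rules(SAMPLE)

if __name__ == "__main__":
    hit = find_rule(RULES, state="on", dbms_er="50")
    print(hit.action if hit else "no matching rule")
```

Under these assumptions, a DBMS error 50 maps to the takeover row regardless of state, while a connect timeout with no DBMS error maps to a local restart.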
HA Oracle Agent requires the Oracle configuration files (listener.ora, oratab,
and tnsnames.ora) to be in a single predefined location, /var/opt/oracle.
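
A quick pre-flight check for this requirement could look like the sketch below. It is not part of the agent; `missing_config_files` is a hypothetical helper that only reports which of the three files named above are absent from the given directory.

```python
from pathlib import Path
from typing import List

# The three files named in the text, expected under /var/opt/oracle.
REQUIRED_FILES = ("listener.ora", "oratab", "tnsnames.ora")


def missing_config_files(directory: str = "/var/opt/oracle") -> List[str]:
    """Return the names of required files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all Oracle configuration files are in place")
```

Running the check on every cluster node before registering the instance helps confirm that each node sees the same set of files.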