Oracle RAC Administration – Part 4: Administering the Clusterware: Components

Brief intro

I have a Google alert running for Oracle RAC and, besides
pulling in the regular vendor offerings, it shows me how often Oracle
RAC is being adopted in the enterprise. I mentioned commoditization in my previous
article, but there are plenty of pressures, and several odd and painful reminders
that RAC needs a good administrator; even then, things can go wrong, and they
can go very badly wrong. RAC is no longer exclusive to huge
data centers; it is being deployed in SMB environments as well. Since
administering the database requires in-house expertise, it is becoming increasingly
important that we practice the installation and administration of Oracle
RAC on our VMware Server and/or ESX test bed.

So, let’s pick up where we left off in our previous article
on Clusterware administration.

Administering the OCR Using OCR Backup Files

We will take a quick look at two methods for
copying the Oracle Cluster Registry (OCR) and recovering it. Oracle
Clusterware automatically creates OCR backups every four hours and always
retains the last three backup copies. The CRSD process
that creates these backups also creates and retains an OCR backup for each full
day and, at the end of each week, a complete backup for the week. So
there is a robust backup taking place in the background. And you guessed it right:
you cannot alter the backup frequencies. What you, the DBA, can do to protect
yourself is copy these generated backup files at least once daily to a
different device from where the primary OCR resides. These files are located at
%CRS_home/cdata/my_cluster, where my_cluster is the name of your cluster.
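
A minimal sketch of that daily copy, assuming a POSIX shell. The function name copy_ocr_backups and both directory arguments are hypothetical; pass in your own backup directory and a destination that lives on a different device.

```shell
# Copy every file from the OCR backup directory into a dated folder on
# another device. Both paths are supplied by the caller (assumptions).
copy_ocr_backups() {
    src_dir=$1      # directory holding the generated OCR backups
    dest_root=$2    # mount point on a different device
    [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }
    dest_dir="$dest_root/ocr_$(date +%Y%m%d)"
    mkdir -p "$dest_dir" || return 1
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue
        # -p preserves timestamps so the copies can be told apart later
        cp -p "$f" "$dest_dir"/ || return 1
    done
    # Print where the copies went so a caller or cron mail can log it
    echo "$dest_dir"
}
```

Called once a day from cron, for example, this keeps a dated set of copies away from the primary OCR device.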

Restoring the OCR from generated OCR Backups

Given that most of us run our Oracle RAC on limited hardware,
on a VMware Server or ESX Server, it is no surprise to see applications
failing. Always try to restart the application first. To verify an OCR failure,
run ocrcheck. If it reports a failure, the next step is to fix the problem.
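
One way to script that check is to look for a success line in the ocrcheck output. The sketch below assumes the output contains the phrase "integrity check succeeded" when the registry is healthy; verify this against your own version before relying on it. The function name ocr_is_healthy is hypothetical.

```shell
# Reads ocrcheck output on stdin and succeeds only if the success phrase
# appears. The phrase matched here is an assumption about the output format.
ocr_is_healthy() {
    grep -qi "integrity check succeeded"
}
```

It could be used as: ocrcheck | ocr_is_healthy || echo "OCR needs attention"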

On Unix/Linux Systems

Let’s do the following to restore our OCR on Unix/Linux
systems.

  • To show the backups, type the command ocrconfig -showbackup

  • Check the contents by doing ocrdump -backupfile my_file

  • Go to the bin directory and stop the CRS on all nodes: crsctl stop crs

  • Perform the restore: ocrconfig -restore my_file

  • Restart the CRS on all nodes: crsctl start crs

  • We have seen the CVU (Cluster Verification Utility)
    play a crucial role during installation in our RAC on VMware series. Use it to
    check the OCR’s integrity. Get a verbose output for all of the nodes by doing
    this: cluvfy comp ocr -n all -verbose
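
The steps above can be strung together as a small script. This is a sketch under stated assumptions, not a tested recovery tool: the function names run_step and restore_ocr are hypothetical, the stop and start steps are written as crsctl stop crs and crsctl start crs (my assumption for the exact command), and the script only acts on the local node, so the stop and start still have to be repeated on every other node. By default it only prints what it would run.

```shell
# Print each step (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Walk through the restore in the order given in the list above.
restore_ocr() {
    backup_file=$1
    [ -n "$backup_file" ] || { echo "usage: restore_ocr <backup_file>" >&2; return 1; }
    run_step ocrconfig -showbackup               || return 1
    run_step ocrdump -backupfile "$backup_file"  || return 1
    run_step crsctl stop crs                     || return 1  # repeat on every node
    run_step ocrconfig -restore "$backup_file"   || return 1
    run_step crsctl start crs                    || return 1  # repeat on every node
    run_step cluvfy comp ocr -n all -verbose
}
```

Running restore_ocr my_file with DRY_RUN unset lists the six commands without touching the cluster, which is a safe way to rehearse the sequence on a test bed.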

On Windows Systems

  • Do the same as above. Check the OCR backups using the ocrconfig -showbackup command. Verify
    the contents of the backup using ocrdump -backupfile my_file, where my_file
    is your backup file.

  • Disable the OCR clients on all nodes by stopping the following
    services from the Service Control Panel: OracleClusterVolumeService,
    OracleCSService, OracleCRService, and the OracleEVMService.

  • Restore the OCR backup file with the command ocrconfig -restore my_file.
    Always check that the OCR devices exist before you run it!

  • Start all of the services. Restart all of the nodes to bring the
    cluster alive.

  • To check the integrity, do the following with the CVU: cluvfy comp ocr -n all -verbose

Overriding the OCR (Oracle Cluster Registry) Data Loss Protection Mechanism

Oracle Clusterware is robustly built and leaves little room for
error, because an accidental overwrite can throw RAC out of balance. If your OCR
cannot access its mirrored files and for some reason is not able to verify that
the available OCR holds the latest configuration (it could be anything: a
temporary bottleneck in your SAN virtual disks or in the local shared disks you
chose for your OCR, in any case some temporary glitch), then your OCR prevents
further modification to the available OCR. The data protection mechanism does
this by prohibiting the Clusterware from starting on the node where your OCR
resides; Oracle raises an alert in Enterprise Manager, in the Clusterware alert
log files, or both. If the problem persists on just one node (the details show
up in Enterprise Manager and the Clusterware log files as error messages like
CLSD-1009 or CLSD-1011), try to restart the node(s).
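
To confirm that a node is hitting these particular errors, you can search its Clusterware alert log for the two codes. A minimal sketch: the function name find_clsd_errors is hypothetical and the log file path is left for you to supply, since it differs per installation.

```shell
# Print the lines of a log file that mention CLSD-1009 or CLSD-1011.
# Returns non-zero when no such line exists or the file cannot be read.
find_clsd_errors() {
    log_file=$1
    [ -r "$log_file" ] || { echo "cannot read: $log_file" >&2; return 2; }
    grep -E 'CLSD-(1009|1011)' "$log_file"
}
```

An empty result with a non-zero exit status means the node's trouble probably lies elsewhere.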

If that does not work and you cannot repair the OCR, then you
are left with no option other than overriding the protection mechanism. Do
not reach for it in the first instance! Oracle CRS is robust enough to check and poll
the files appropriately. Be warned that data loss may occur: any OCR updates
made after the last known good state of the available OCR will be lost. So if
you force the configuration using the command ocrconfig -overwrite, whatever
was changed since that last known good configuration is gone.

How to Override:

  • Compare the failing node’s OCR configuration (the Windows
    registry, or ocr.loc on Unix/Linux) with that of a node where the OCR is
    running. If they don’t match, try to repair it using ocrconfig -repair.

  • Use the OCRDUMP command (we
    will look more into OCRDUMP later in our Administration series) to dump
    all information regarding the OCR configuration and check whether it holds
    the latest updates.

  • If you can’t resolve the error messages (CLSD), then run
    ocrconfig -overwrite
    to bring the node back to life.
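
The first of these steps, on Unix/Linux, comes down to comparing two copies of ocr.loc. A minimal sketch, assuming you have already fetched a copy of ocr.loc from a healthy node (with scp, for example); the function name compare_ocr_loc and both file arguments are hypothetical.

```shell
# Compare this node's ocr.loc with a copy taken from a healthy node.
# Prints "repair" when the files differ and "match" when they are identical.
compare_ocr_loc() {
    local_copy=$1
    reference_copy=$2
    if [ ! -r "$local_copy" ] || [ ! -r "$reference_copy" ]; then
        echo "both files must be readable" >&2
        return 2
    fi
    if cmp -s "$local_copy" "$reference_copy"; then
        echo "match"
    else
        echo "repair"
    fi
}
```

A result of repair points to ocrconfig -repair; a result of match means the configuration is not the culprit, so move on to the OCRDUMP check and keep ocrconfig -overwrite as the last resort.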

Conclusion:

We have taken
a quick look at the Clusterware’s administration. We also took a look at the
override possibilities for forcing the OCR back into use when its built-in
protection mechanism blocks it. Future
articles will go into more hands-on training. I have read the Oracle manual
several times and have quoted it often. I advise you to go through the manual
more than once. There is plenty of other documentation on RAC, even books, but
nothing comes close to the Oracle documentation, and Oracle has now made it
freely available! So go ahead: download the PDF books, get the free VMware
Server (or the trial of ESX 3.0, or Virtual Infrastructure 3 as it’s called),
ask your boss for an old server and go do some magic
with VMware!

Tarry Singh
I have been active in several industries since 1991. While working in the maritime industry I have worked for several Fortune 500 firms such as NYK, A.P. Møller-Mærsk Group. I made a career switch, emigrated, learned a new language and moved into the IT industry starting 2000. Since then I have been a Sr. DBA, (Technical) Project Manager, Sr. Consultant, Infrastructure Specialist (Clustering, Load Balancing, Networks, Databases) and (currently) Virtualization/Cloud Computing Expert and Global Sourcing in the IT industry. My deep understanding of multi-cultural issues (having worked across the globe) and international exposure has not only helped me successfully relaunch my career in a new industry but also helped me stay successful in what I do. I believe in "worknets" and "collective or swarm intelligence". As a trainer (technical as well as non-technical) I have trained staff both on national and international level. I am very devoted, perspicacious and hard working.
