Brief intro
In this article, we will continue our discussion of Oracle
RAC DBA essentials. We have already looked at Enterprise Linux and will
continue to build our “Oracle RAC Appliance” on the Oracle Enterprise Linux
layer.
There is nothing more gratifying than an email saying the
series has helped someone solve a problem. I’m really excited to see that
people all around the world are enjoying and benefiting from our RAC series, be
it the installation articles or the RAC administration ones.
RAC DBA refresher
In our last article, we looked at a few important
parameters. We will continue to do the same in this article and have a brief
look at oifcfg, as promised. So without further ado, let’s get started again.
RAC specific parameters continued:
Let’s check out a few more parameters that are present in
our RAC environment (there are, however, many more that we will cover
in future articles):
MAX_COMMIT_PROPAGATION_DELAY
The MAX_COMMIT_PROPAGATION_DELAY parameter is deprecated in
Release 10g R2, but for those who are on 10g R1, it is still crucial to know what
it means. The parameter defines the maximum amount of time that the SCN
(System Change Number) is held in the local instance’s SGA (System Global Area)
before being refreshed by the Log Writer process (LGWR). The value is expressed
in hundredths of a second and must be the same across all instances. The default
of 700 (7 seconds) is generally fine. However, in certain high-intensity OLTP
environments this value can be set much lower, even to “0”. A typical example
is a busy OLTP environment where many inserts, updates and deletes (that is, a
lot of DML) take place, all on one node. With a value of 0, each commit is
propagated to all of the other cluster nodes immediately, ensuring that the SCN
is the same on all nodes. An alert log entry like the following indicates that
the propagation is almost instantaneous:
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
Any value greater than 0, say 1 or 2, produces an alert log entry such as
the following, showing that a Lamport scheme is used to generate the SCNs:
This instance was first to open
Picked Lamport scheme to generate SCNs
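To check which scheme an instance picked without reading the alert log by hand, a small script can search for the message quoted above. The following is only a minimal sketch: the message wording is taken from the two entries shown in this article, and the function name and sample text are hypothetical.

```python
import re

# Pattern based on the alert log entries quoted in this article.
# The scheme name ("broadcast on commit" or "Lamport") is captured.
SCHEME_RE = re.compile(r"Picked (.+?) scheme to generate SCNs")

def find_scn_scheme(alert_log_text):
    """Return the scheme named in the last matching entry, or None."""
    matches = SCHEME_RE.findall(alert_log_text)
    return matches[-1] if matches else None

# Hypothetical excerpt standing in for a real alert log file.
sample = (
    "This instance was first to open\n"
    "Picked Lamport scheme to generate SCNs\n"
)
print(find_scn_scheme(sample))  # prints "Lamport"
```

Taking the last match rather than the first means that, after several restarts, the script reports the scheme chosen at the most recent startup.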
Obviously, the downside of a low value is that it may have an
impact on your performance; this might not seem like a nice idea, especially
with your VMware ESX ready RAC. I already get enough emails from users asking
for advice on countering all kinds of errors just to keep their RACs up all the
time. With a low value, the LGWR has to refresh the SCNs across the
RAC cluster nodes frequently to keep them all in sync. If you are a consultant,
or even a DBA, with limited resources in your test environment or at home, I’d advise
you to keep this parameter’s value on the higher side. It goes without saying that if
you’re using the 10g R2 version, you don’t have to worry. Remember that the
parameter’s value must be the same across all nodes.
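The two rules above (the unit of the value and the requirement that it match on every node) can be sketched in a few lines. This is an illustration only: it assumes the parameter is expressed in hundredths of a second, so that 700 corresponds to 7 seconds, and the instance names and helper functions are hypothetical. In practice the per-instance values would come from your own query of each instance.

```python
def delay_in_seconds(hundredths):
    """Convert a MAX_COMMIT_PROPAGATION_DELAY setting to seconds.

    Assumes the setting is given in hundredths of a second.
    """
    return hundredths / 100.0

def values_consistent(per_instance):
    """Return True when every instance reports the same setting."""
    return len(set(per_instance.values())) <= 1

# Hypothetical instance names and values for a two-node cluster.
settings = {"rac1": 700, "rac2": 700}

print(delay_in_seconds(settings["rac1"]))  # prints 7.0
print(values_consistent(settings))         # prints True
```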
BACKGROUND PROCESSES
Getting a list of all of the background processes is also
handy for any DBA. As a RAC DBA, you will come across some additional processes
that are only present in a RAC environment.
SQL> select name, description from v$bgprocess where PADDR <> '00';
PMON  process cleanup
DIAG  diagnosibility process
LMON  global enqueue service monitor
LMD0  global enqueue service daemon 0
LMS0  global cache service process 0
LMS1  global cache service process 1
MMAN  Memory Manager
DBW0  db writer process 0
LGWR  Redo etc.
LCK0  Lock Process 0
CKPT  checkpoint
SMON  System Monitor Process
RECO  distributed recovery
. . . . .
The additional RAC centric processes are DIAG, LCK,
LMON, LMDn, and LMSn processes. We will give a brief description of each and
discuss how they interact in a RAC environment next.
DIAG: This is a diagnostic daemon. It constantly monitors
the health of the instances across the RAC and possible failures on the RAC.
There is one per instance.
LCK: This lock process manages requests that are not
cache-fusion requests, such as row cache requests and library cache
requests. Only a single LCK process is allowed for each instance.
LMD: The Lock Manager Daemon. This is also sometimes referred to as the
GES (Global Enqueue Service) daemon, since its job is to manage global
enqueues and global resource access. It also detects deadlocks and monitors
lock conversion timeouts.
LMON: The Lock Monitor Process. It is the GES monitor. It reconfigures
the lock resources when nodes are added or removed, and generates a trace
file every time a node reconfiguration takes place. It also monitors the
cluster as a whole, detects a node’s demise and triggers a quick
reconfiguration.
LMS: This is the Lock Manager Server Process, sometimes also called the GCS
(Global Cache Services) process. Its primary job is to transport blocks across
the nodes for cache-fusion requests. If there is a consistent-read request, the
LMS process rolls back the block, makes a consistent-read image of the block
and then ships this image across the HSI (High Speed Interconnect) to the
requesting process on the remote node. LMS must also check constantly with the
LMD background process (our GES process) to get the lock requests placed by
the LMD process. Up to 10 such processes can be generated dynamically.
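As a quick way to pick the RAC-centric processes out of a v$bgprocess listing, the descriptions above can be turned into a lookup table. This is only a rough sketch: the role summaries paraphrase this article, the function name is hypothetical, and it assumes that numbered processes such as LMS0 or LCK0 differ from the base name only by trailing digits.

```python
import re

# Short role summaries paraphrased from the descriptions in this article.
RAC_PROCESS_ROLES = {
    "DIAG": "diagnostic daemon",
    "LCK": "lock process for non-cache-fusion requests",
    "LMD": "global enqueue service daemon",
    "LMON": "global enqueue service monitor",
    "LMS": "global cache service process",
}

def rac_role(process_name):
    """Return the RAC role for a process name, or None if it is not RAC specific."""
    # Strip a trailing instance number, e.g. LMS1 -> LMS, LCK0 -> LCK.
    base = re.sub(r"\d+$", "", process_name.upper())
    return RAC_PROCESS_ROLES.get(base)

# Process names as they appear in the listing earlier in this article.
names = ["PMON", "DIAG", "LMON", "LMD0", "LMS0", "LMS1", "DBW0", "LGWR", "LCK0"]
rac_only = [n for n in names if rac_role(n)]
print(rac_only)  # ['DIAG', 'LMON', 'LMD0', 'LMS0', 'LMS1', 'LCK0']
```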
Conclusion:
In the next article, we will continue our RAC DBA essentials. We were
supposed to talk about the conversion of your physical RAC to the development
and test RAC environments on top of your ESX server in this article, but will
take it up next time. It may not be easy (since the physical shared disks and
NICs will be sensitive to the VMware environment), but it sure is worth a try.