Oracle RAC Administration - Part 11: The RAC DBA continued
December 22, 2006
In this article, we will continue our discussion of Oracle RAC DBA essentials. We have already looked at Enterprise Linux and will continue to build our Oracle RAC Appliance on the Oracle Enterprise Linux layer.
There is nothing more gratifying than an email saying the series has helped someone solve a problem. I'm really excited to see that people all around the world are enjoying and benefiting from our RAC series, be it installation or RAC administration.
RAC DBA refresher
In our last article, we looked at a few important parameters. We will continue to do the same in this article and have a brief look at oifcfg, as promised. So without further ado, let's get started again.
RAC-specific parameters continued:
Let's check out a few more parameters that are present in our RAC environment (there are, however, many more that we will cover in future articles):
The MAX_COMMIT_PROPAGATION_DELAY parameter is deprecated in Release 10g R2, but for those who are on 10g R1, it is still crucial to know what it means. The MAX_COMMIT_PROPAGATION_DELAY parameter defines the maximum amount of time allowed before the SCN (System Change Number) held in the local instance's SGA (System Global Area) is refreshed by the log writer process (LGWR). This value must be the same across all instances. The value is expressed in hundredths of a second, so a setting of 700 (that is, 7 seconds) is generally fine. However, in certain highly intensive OLTP environments this value can be set much lower, even to 0. A typical example is a busy OLTP environment where many inserts, updates, and deletes (in other words, a lot of DML) all take place on one node. With a value of 0, each commit is propagated to all of the other cluster nodes immediately, ensuring that the SCN is the same on all the nodes. An alert log entry like the following shows that this broadcast scheme was picked, so propagation is almost instantaneous:
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
With any value greater than 0, say 1 or 2, a Lamport scheme is used to generate the SCNs instead, and the alert log shows an entry like this:
This instance was first to open
Picked Lamport scheme to generate SCNs
The obvious downside of a value of 0 is its impact on performance: LGWR has to refresh the SCNs across the RAC cluster nodes frequently to keep them all in sync. That might not be a nice idea, especially on a RAC running on VMware ESX. I already get enough emails from users asking for advice on countering all kinds of errors just to keep their RACs up all the time. If you are a consultant, or even a DBA, with limited resources in your test environment or at home, I'd advise you to keep this parameter's value on the higher side. It goes without saying that if you're using 10g R2, you don't have to worry about it. Remember that the parameter's value must be the same across all nodes.
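For those still on 10g R1, the statements below are a minimal sketch for checking that every instance uses the same MAX_COMMIT_PROPAGATION_DELAY value and for setting it cluster-wide. They assume you are connected through SQL*Plus with sufficient privileges and that all instances share a single SPFILE; the value 0 is only an illustration, so pick whatever suits your environment.

```sql
-- Compare the current setting on every instance
-- (GV$ views return one row per running instance, identified by INST_ID).
SELECT inst_id, name, value
  FROM gv$parameter
 WHERE name = 'max_commit_propagation_delay'
 ORDER BY inst_id;

-- Illustrative only: write the same value for all instances (SID = '*').
-- Assumption: a shared SPFILE is in use. The parameter is static, so the
-- change is recorded in the SPFILE and takes effect after the instances restart.
ALTER SYSTEM SET max_commit_propagation_delay = 0 SCOPE = SPFILE SID = '*';
```

After the restart, the alert log of the first instance to open should show which SCN generation scheme was picked.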
Getting a list of all of the background processes is handy for any DBA, and as a RAC DBA you will come across some additional processes that exist only in a RAC environment.
SQL> select name, description from v$bgprocess where PADDR <> '00';

NAME  DESCRIPTION
----- ---------------------------------
PMON  process cleanup
DIAG  diagnosibility process
LMON  global enqueue service monitor
LMD0  global enqueue service daemon 0
LMS0  global cache service process 0
LMS1  global cache service process 1
MMAN  Memory Manager
DBW0  db writer process 0
LGWR  Redo etc.
LCK0  Lock Process 0
CKPT  checkpoint
SMON  System Monitor Process
RECO  distributed recovery
...
The additional RAC-centric processes are the DIAG, LCK, LMON, LMDn, and LMSn processes. Next, we give a brief description of each and discuss how they interact in a RAC environment.
DIAG: This is a diagnostic daemon. It constantly monitors the health of the instances across the RAC and watches for possible failures. There is one per instance.
LCK: This lock process manages requests that are not cache-fusion requests, such as row cache requests and library cache requests. Only a single LCK process is allowed for each instance.
LMD: The Lock Manager Daemon. This is also sometimes referred to as the GES (Global Enqueue Service) daemon since its job is to manage the global enqueue and global resource access. It also detects deadlocks and monitors lock conversion timeouts.
LMON: The Lock Monitor Process. It is the GES monitor. It reconfigures the lock resources when nodes are added to or removed from the cluster, and it generates a trace file every time a node reconfiguration takes place. It also monitors the RAC cluster-wide, detects a node's demise, and triggers a quick reconfiguration.
LMS: The Lock Manager Server process, sometimes also called the GCS (Global Cache Services) process. Its primary job is to transport blocks across the nodes for cache-fusion requests. For a consistent-read request, the LMS process rolls back the block to create a Consistent-Read image of it and then ships that image across the HSI (High Speed Interconnect) to the requesting process on the remote node. LMS must also check constantly with the LMD background process (our GES process) to get the lock requests placed by the LMD process. Up to 10 such processes can be generated dynamically.
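To look at these processes on every node at once, the GV$ version of the background process view can be queried. The following is a sketch, assuming a 10g RAC and a user with access to the GV$ views; the name filter simply matches the RAC processes described above (LMON, LMDn, LMSn, DIAG, LCK0), and the last query shows where the alert log and background trace files, including LMON's reconfiguration traces, are written.

```sql
-- RAC-specific background processes running on each instance
SELECT inst_id, name, description
  FROM gv$bgprocess
 WHERE paddr <> '00'
   AND (name LIKE 'LM%' OR name IN ('DIAG', 'LCK0'))
 ORDER BY inst_id, name;

-- How many LMS (GCS) processes each instance is currently running
SELECT inst_id, COUNT(*) AS lms_processes
  FROM gv$bgprocess
 WHERE paddr <> '00'
   AND name LIKE 'LMS%'
 GROUP BY inst_id
 ORDER BY inst_id;

-- Directory holding the alert log and background process trace files (10g)
SELECT inst_id, value
  FROM gv$parameter
 WHERE name = 'background_dump_dest'
 ORDER BY inst_id;
```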
In the next article, we will continue our RAC DBA essentials. We were supposed to talk about the conversion of your physical RAC to development and test RAC environments on top of your ESX server in this article, but will take it up next time. It may not be easy, since the physical shared disks and NICs will be sensitive to the VMware environment, but it sure is worth a try.