Latches can be confusing. Sessions can wait for them and when those waits become excessive performance degrades. To many they appear to be the same as locks, but they aren’t, even though lock waits, when the times get excessive, can also degrade performance. So, what are the differences between latches and locks? Let’s examine both and see where they differ, what can cause latch waits and how latch waits can be investigated.
Latches are similar to locks but they operate on memory to protect code and internal data structures by preventing concurrent access. For example, the LRU latches are used when managing the buffer cache, an operation that is restricted to being run by a single process; other processes must wait for the current process to complete its actions on the buffer cache before the next one in line can proceed. The latch holds this structure for the current process to access; when the current process is done the latch is released and the next process in the queue can acquire it. Since only one process at a time can hold a latch, ‘popular’ actions can create a long line of processes waiting to obtain the associated latch. Normally latch waits are in milliseconds or fractions of milliseconds so waiting for a latch usually isn’t noticeable.
Let’s look at the differences between a lock and a latch:
Locks -- Protect the logical contents of the database object (table, index) from other transactions. -- Are held for the transaction duration. -- Provide rollback capability for the associated transaction. Latch -- Protect the critical sections of the associated internal data structures from other threads. -- Are held only until the operation completes and then are released. -- Prevent concurrent access to a memory structure.
Since memory operations aren’t transactions the ability to ‘rollback’ a change isn’t necessary; clearing a buffer so another process can use it is a fast, irreversible operation that isn’t designed to be undone. Contrast that to a lock, which controls access to change data in a table; such actions are transactional in nature and are designed with the ability to undo the changes should the need arise. Locks can also protect a data image while it’s being read, which gives rise to the different types of locks that are available — shared, row shared, row shared exclusive, exclusive — that govern who and how the data can be modified. Without exception, every lock in an Oracle database allows other sessions to read data in the locked table; the locks are there to control concurrent updates and protect against the possibility of ‘lost’ transactions.
Latches can indicate the existence of a locking problem, as the ‘cursor pins wait on x’ latch illustrates. This latch, when its wait times become long, usually indicates an issue with a ‘hot block’, a data or index block that is so popular everyone and their Aunt Gertrude are waiting for it. Long waits for this latch usually accompany concurrent sessions inserting data into a table using a sequence; the sequence values will be clustered together and will usually access the same data (for direct-path inserts) or index block and can cause sessions to wait until the current process completes. And since the latch allows one, and only one, process access to it at a time the rest of the interested transactions (and their processes) form a long queue, sort of like a really busy day at the amusement park. In such cases addressing the locking problem also addresses the latching problem.
Various tools are available to investigate latching issues; Tanel Poder has written two scripts, latchprof.sql and latchprofx.sql, that provide a wealth of information about latch wait history. They can be invaluable in investigating latch waits and contention. An example of the output that can be generated is shown below:
-- LatchProfX 2.00 by Tanel Poder ( http://www.tanelpoder.com )
SID NAME HMODE FUNC OBJECT SQLID Held Gets Held % Held ms Avg hold ms
----- ----------------------------------- ------------ ---------------------------------------- -------------- ------------- ---------- ---------- ------- ----------- -----------
144 transaction allocation exclusive ktcxbr 0 9brz3jsxmf4ts 788 57 78.80 267.920 4.700
144 DML lock allocation exclusive ktadmc 0 9brz3jsxmf4ts 609 59 60.90 207.060 3.509
1168 shared pool exclusive kghalo 0 0zmr8ccx3xpck 177 5 17.70 60.180 12.036
1562 cache buffers lru chain exclusive kcbzgws 0 8n2dsqpu94h4a 119 7 11.90 40.460 5.780
1562 cache buffers chains exclusive kcbgtcr: slow path excl 2403D825 8n2dsqpu94h4a 90 1 9.00 30.600 30.600
1562 cache buffers lru chain exclusive kcbzgws 0 gt8xcw52s32ha 40 16 4.00 13.600 .850
2794 redo copy exclusive kcrfw_redo_gen: nowait 0 4vs91dcv7u1p6 38 2 3.80 12.920 6.460
2328 KTILM latch shared ktilmht get 0 bnqcqzztx3a41 37 4 3.70 12.580 3.145
2794 sim partition latch exclusive kcbm_simulate 0 4vs91dcv7u1p6 37 1 3.70 12.580 12.580
1562 KTILM latch shared ktilmht get 0 8n2dsqpu94h4a 30 6 3.00 10.200 1.700
2328 cache buffers chains exclusive kcbgcur: fast path excl 87316B 9zg9qd9bm4spu 26 1 2.60 8.840 8.840
2412 row cache objects exclusive kqrso 0 24 15 2.40 8.160 .544
1413 cache buffers chains exclusive kcbgcur: fast path excl 873294 9zg9qd9bm4spu 22 1 2.20 7.480 7.480
2412 row cache objects exclusive kqrpre: find obj 0 21 11 2.10 7.140 .649
1168 row cache objects exclusive kqrpre: find obj 0 0zmr8ccx3xpck 20 10 2.00 6.800 .680
2328 row cache objects exclusive kqrso 0 20 13 2.00 6.800 .523
1168 cache buffers chains exclusive kcbgcur: fast path excl 873294 9zg9qd9bm4spu 19 1 1.90 6.460 6.460
2328 row cache objects exclusive kqrpre: find obj 0 5wwkp7spyq2fn 19 11 1.90 6.460 .587
2412 cache buffers chains exclusive kcbgcur: fast path excl 87260B 9zg9qd9bm4spu 18 1 1.80 6.120 6.120
1562 KTILM latch shared ktilmht get 0 gt8xcw52s32ha 18 5 1.80 6.120 1.224
1413 cache buffers lru chain exclusive kcbzgb 0 9zg9qd9bm4spu 18 1 1.80 6.120 6.120
2794 row cache objects exclusive kqrso 0 17 13 1.70 5.780 .445
1168 cache buffers lru chain exclusive kcbzgb 0 9zg9qd9bm4spu 17 1 1.70 5.780 5.780
2328 row cache objects exclusive kqrpre: find obj 0 17 10 1.70 5.780 .578
2328 shared pool exclusive kghalo 0 15 7 1.50 5.100 .729
2794 row cache objects exclusive kqrpre: find obj 0 14 12 1.40 4.760 .397
2412 shared pool exclusive kghalo 0 14 4 1.40 4.760 1.190
2794 row cache objects exclusive kqrpre: find obj 0 4k3y3hrxfzh6v 14 8 1.40 4.760 .595
2794 shared pool exclusive kghalo 0 4k3y3hrxfzh6v 13 6 1.30 4.420 .737
1562 cache buffers lru chain exclusive kcbzgws 0 149htpw5s5n0c 13 5 1.30 4.420 .884
2412 row cache objects exclusive kqreqd 0 13 9 1.30 4.420 .491
2794 row cache objects exclusive kqrso 0 4k3y3hrxfzh6v 11 9 1.10 3.740 .416
1562 KTILM latch shared ktilmht get 0 by143mnp65wgp 10 2 1.00 3.400 1.700
1562 cache buffers chains exclusive kcbzgb: scan from tail. nowait 0 gt8xcw52s32ha 10 8 1.00 3.400 .425
2794 cache buffers chains exclusive kcbgtcr: slow path excl 400276 4k3y3hrxfzh6v 9 1 .90 3.060 3.060
2794 cache buffers lru chain exclusive kcbzgws 0 4k3y3hrxfzh6v 9 1 .90 3.060 3.060
2412 row cache objects exclusive kqrpre: find obj 0 5wwkp7spyq2fn 9 7 .90 3.060 .437
2412 simulator lru latch exclusive kcbs_simulate: simulate set 34 5wwkp7spyq2fn 9 1 .90 3.060 3.060
2328 row cache objects exclusive kqreqd 0 9 6 .90 3.060 .510
1168 cache buffers chains exclusive kcbgtcr: slow path excl 402106 0zmr8ccx3xpck 9 1 .90 3.060 3.060
1562 object queue header operation exclusive kcbo_link_q 0 gt8xcw52s32ha 9 6 .90 3.060 .510
2328 KTILM latch shared ktilmht get 0 4twyxstjxggc1 9 2 .90 3.060 1.530
1168 cache buffers lru chain exclusive kcbzgws 0 0zmr8ccx3xpck 9 1 .90 3.060 3.060
144 simulator lru latch exclusive kcbs_simulate: simulate set 3C 9brz3jsxmf4ts 9 1 .90 3.060 3.060
1168 KTILM latch shared ktilmht get 0 24kkd5u3fdf0c 9 1 .90 3.060 3.060
2328 simulator lru latch exclusive kcbs_simulate: simulate set 3A 24kkd5u3fdf0c 9 1 .90 3.060 3.060
2412 simulator lru latch exclusive kcbs_simulate: simulate set 39 5wwkp7spyq2fn 8 1 .80 2.720 2.720
2412 cache buffers chains maybe-shared kcbgtcr: fast path exam 4003C1 5wwkp7spyq2fn 8 1 .80 2.720 2.720
2412 cache buffers lru chain exclusive kcbzgb 0 9zg9qd9bm4spu 8 1 .80 2.720 2.720
2328 cache buffers lru chain exclusive kcbzgb 0 9zg9qd9bm4spu 8 1 .80 2.720 2.720
In this example the latchprofx.sql script is being called; it reports the SID of the holding session, the name of the latch, the mode in which the latch is held (notice that almost all of the latches reported are exclusive), the associated kernel function, the object involved (for latches associated with transactional activity), the number of times the latch has been held, the number of times the latch has been gotten successfully, the percentage of gets to holds, the total hold time in milliseconds and the average hold time, again in milliseconds. The report is ordered by the HELD values to make it easier to see which latches are being held and for how long. Other values can be output depending upon the parameters passed to the script.
As latches control access to memory structures anything that affects how memory is allocated and used can affect them. It’s possible to run afoul of latch contention when memory ‘leaks’ occur (usually caused by code that doesn’t properly manage the memory structures it uses). Common memory ‘leaks’ involve memory areas that are no longer used but are no longer accessible because the code that created the memory structure didn’t free it properly. As available memory slowly decreases from these leaks it can become more difficult to obtain a latch, resulting in queues that are much longer than they would normally be. Another problem that can create the same situation is remote user sessions that ‘die’ at the client but can’t clean up properly, leaving orphan processes holding latches that can’t be released. How such situations are managed is dependent upon the specific conditions surrounding them and resolution in extreme cases could require drastic action. Since these are usually rare occurrences these won’t be discussed further.
AWR and ASH reports are a good place to start when investigating wait activity and knowing whether the waits are lock or latch related; of course, those reports require the Diagnostic and Tuning pack license and may not be available if your site isn’t licensed. Statspack reports are also good sources of information to start an investigation and list latch activity in a dedicated report section; a part of that section is reproduced below:
Latch Activity DB/Inst: FNORG/fnorg Snaps: 1-2
->"Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
willing-to-wait latch get requests
->"NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests
->"Pct Misses" for both should be very close to 0.0
Pct Avg Wait Pct
Get Get Slps Time NoWait NoWait
Latch Requests Miss /Miss (s) Requests Miss
------------------------ -------------- ------ ------ ------ ------------ ------
AQ Background: interrupt 1 0.0 0 0
AQ deq hash table latch 1 0.0 0 0
ASM db client latch 5 0.0 0 0
ASM map operation hash t 1 0.0 0 0
Change Notification Hash 3 0.0 0 0
Consistent RBA 40 0.0 0 0
DML lock allocation 706 0.0 0 0
Event Group Locks 1 0.0 0 0
File State Object Pool P 1 0.0 0 0
I/O Staticstics latch 1 0.0 0 0
ILM Stats Stripe Latch 1 0.0 0 0
ILM Stats main anchor la 155 0.0 0 0
ILM access tracking exte 1 0.0 0 0
ILM activity tracking la 1 0.0 0 0
IM area sb latch 1 0.0 0 0
IM area scb latch 1 0.0 0 0
IM emb latch 1 0.0 0 0
IM seg hdr latch 1 0.0 0 0
IPC stats buffer allocat 1 0.0 0 0
In memory undo latch 1 0.0 0 0
JS Sh mem access 1 0.0 0 0
JS queue access latch 1 0.0 0 0
JS queue state obj latch 84 0.0 0 0
JS slv state obj latch 1 0.0 0 0
KCNIBR - invalid block r 1 0.0 0 0
KFC FX Hash Latch 1 0.0 0 0
KFC Hash Latch 1 0.0 0 0
KFCL LE Freelist 1 0.0 0 0
KGNFS-NFS:SHM structure 1 0.0 0 0
KGNFS-NFS:SVR LIST 1 0.0 0 0
KJC message pool free li 1 0.0 0 0
KJC message pool pending 1 0.0 0 0
KJCT flow control latch 1 0.0 0 0
KMG MMAN ready and start 2 0.0 0 0
KSFS OFS ctx level paren 1 0.0 0 0
KSFS OFS req layer paren 1 0.0 0 0
KSFS OFS sess layer pare 1 0.0 0 0
KSFS id table parent lat 1 0.0 0 0
KSFS ksfs_node latch 1 0.0 0 0
KSFSD name cache parent 1 0.0 0 0
KSXR hang analysis phase 1 0.0 0 0
KSXR hang analysis phase 1 0.0 0 0
KTF sga latch 0 0 1 0.0
KTILM latch 1 0.0 0 0
KWQP Prop Status 2 0.0 0 0
Locator state objects po 1 0.0 0 0
Lsod array latch 1 0.0 0 0
Memory Management Latch 1 0.0 0 2 0.0
Memory Queue 1 0.0 0 0
Memory Queue Message Sub 1 0.0 0 0
...
Latch contention occurs with the ‘willing to wait’ latch calls; nowait calls won’t build a queue and will be retried at a later time.
Knowing the difference between a latch and a lock, and that latches can be affected by locks (but in most cases, they are not) can make troubleshooting application and database performance a bit easier. Since latches are memory control structures they can be affected by the amount of available memory as well as by the overall activity of the database. Keeping this in mind during such investigations can prevent the DBA from being led into trying to address a lock problem when it’s really a latch affecting performance. Also, having the right tools available can greatly assist in determining where the problem area or areas lie.
Yes, latches can be confusing. Hopefully this has provided guidance and insight into what latches are, how they can affect performance and what tools to use to find additional information.