What is the Latch Event and Why Does It Occur?

Posted in an Oracle forum some time back was a question on the `latch` event:

```
latch: parallel query alloc buffer
```

It was noted by the person posting the question that there’s not much, if any, information on the `latch` event. The question posed in that forum was, essentially, just WHAT is that latch event and why does it occur? To answer this it will be necessary to understand some of the ‘ins-and-outs’ of parallel processing according to Oracle.

Parallel query processing allocates slaves to do the bulk of the work; as a result of that parallel coordinators allocate message buffers to handle intermediate results. Depending on the system configuration and the load the parallel slaves must handle a surprisingly large number of message buffers can be allocated. There are two equations that can be used to compute the number of buffers that will be required; those equations use the parallel degree value that is either calculated by Oracle or specified in a hint; for simplicity that value will be represented by `pd`. For a non-RAC Oracle database the number of buffers required would be calculated as:

```
pd(pd+1)*3
```

RAC adds a bit of complexity and multiple instances and, as a result, the equation is modified to be:

```
pd(pd+1)*4
```

On non-RAC systems various degrees of parallelism will be used to calculate the estimated number of buffers Oracle could allocate. Consider a query executed with a parallel degree set to 4; the first equation produces the following results:

```
4(4+1) * 3
```

becomes:

```
4(5) * 3
```

and simplifying:

```
20 * 3 = 60
```

Four parallel slaves can allocate 60 buffers. Now consider setting a parallel degree of 16. It’s important to know that this buffer calculation is not proportional; a degree of 16 is four times the original setting, but the result is far from four times the original result:

```
16(16+1) * 3
```

Combining like terms produces:

```
16(17) * 3
```

which reduces to:

```
272 * 3 = 816
```

A parallel execution using 16 slaves requires a total of 816 message buffers, almost 14 times the number required by the execution at a parallel degree of 4. Given the computing power in modern servers, it is not a ridiculous thought to set a parallel degree of 128. Performing that calculation produces the following result:

```
126(128+1) * 3
```

Continuing as in previous calculations:

```
128(129) * 3
```

The result is:

:

```
16512 * 3 = 45936
```

This is a considerable increase over the buffers allocated for a parallel degree 4 execution and far larger than what would be expected if the result from the degree 4 execution is multiplied by 32. Having the “`latch: parallel query alloc buffer`” wait as one of the top waits isn’t a far-fetched idea after all. A determining factor affecting this wait is where the PX message pool (‘PX msg pool’) resides; both the large pool and the shared pool are locations where this pool is usually found. The query provided below can be executed to learn where it located in a given database configuration:

```
break on pool skip 1
column name format a32

select
pool,
name,
bytes
from
v\$sgastat
where
pool = 'large pool'
or
name like '%PX%'
or
name like '%para%'
order by
pool, name;
```

An example of the results of this query follows:

```
BING @ quanghoo > break on pool skip 1
BING @ quanghoo > column name format a32
BING @ quanghoo >
BING @ quanghoo > select
2         pool,
3         name,
4         bytes
5 from
6         v\$sgastat
7 where
8         pool = 'large pool'
9    or
10         name like '%PX%'
11    or
12         name like '%para%'
13 order by
14         pool, name;

POOL               NAME            BYTES
------------ -------------------------------- ----------
large pool  PX msg pool         491520
free memory         7897088

shared pool   PX QC deq stats         1696
PX QC msg stats         3080
PX list of chunk lists  1088
PX msg pool struct      56
PX scan rate stats      1080
PX server deq stats     1696
PX server msg stats     3080
PX subheap              156344

POOL         NAME                   BYTES
------------ -------------------------------- ----------
shared pool  PX subheap desc            256
parallel kcbibr               32
parallel kcbibr dbwr bitv     16
parallel_max_servers          14720
param hash values             15912
parameter blocks              31824
parameter handle              137584
parameter string values       291240
parameter table block         1050648
parameter text value          9400
parameter value memory        1416

21 rows selected.

BING @ quanghoo &gt
```

From the example above it can be seen that the ‘PX msg pool’, for this database, is in the large pool. Versions of Oracle older than 11.2 (there are still some running) could be using the shared pool; this could result in the “`latch: parallel query alloc buffer`” wait to experience exceptionally long wait times as the result of contention for the available resources.

For RAC configurations the equation changes slightly but that does not mean that the calculated results follow that trend. Using the same parallel degree settings from the first round of calculations the differences start small but soon increase dramatically:

```
4(4+1) * 4
```

reduces to:

```
4(5) * 4
```

and produces a final result of:

```
20 * 4 = 80
```

an increase of 33% simply because this is Real Application Clusters. Moving on to a parallel degree of 16 the equation generates:

```
16(16+1) * 4
```

resulting in:

```
16(17) * 4
```

giving a final value of:

```
272 * 4 = 1088
```

This is 272 more buffers as a result of a clustered database configuration.

Engineered systems, from Oracle and from other vendors, make it possible to have a parallel degree set to 128 or higher. Considering parallel execution at a degree of 128 the computed allocation becomes:

```
128(128+1) * 4
```

Combining like terms:

```
128(129) * 4
```

produces:

```
16512 * 4 = 66048
```

66,048 is a large number of message buffers. Even 10 years ago a parallel degree of 128 could have crippled a transactional database and possibly some smaller data warehouses or data marts with waits for the ‘parallel query alloc buffer’ latch. Depending on the overall resource utilization (disk, memory, swap) some smaller engineered systems could still fall victim to this wait.

If parallel execution plans are rare in a database configuration this latch wait may be of little or no concern. If there is a reasonable chance that parallel execution can occur, then being aware that parallel buffer allocation can be an issue can make DBA life a bit less surprising and may shorten problem investigation and resolution times. This can all lead to a bit less stress on the DBA.

# # #

See all articles by David Fitzjarrell

David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhilips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and montoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.