Oracle’s Bulk Collect and Limit Can Be A Problem

For several years the FORALL loop, in conjunction with the BULK COLLECT operation, has allowed PL/SQL developers to more efficiently process data by eliminating the row-by-row fetch cycle in favor of a bulk load into memory. Such a change can speed processing up considerably, but it can also throw a ‘monkey wrench’ into the works if the LIMIT doesn’t evenly divide the row count. What can be more frustrating is that the same code works sometimes and fails sometimes, all without modifying a single character. Let’s look at why this can happen and what can be done to fix it.

BULK COLLECT does two things, basically: it fetches ‘chunks’ of data into one or more tables of records and it keeps track of the remaining data, in a rudimentary way, by monitoring for ‘no data found’, signaling the end of the result set. Unfortunately, the fetch that generates that ‘no data found’ condition also sets %NOTFOUND for the cursor and when the row source volume doesn’t evenly divide by the LIMIT selected that last fetch is a partial fetch. It does populate the record variable or variables, so the data is there at the time it is fetched. The issue is when a FORALL loop, with a LIMIT set, exits with a cur%NOTFOUND the partial fetch of data is lost as the cursor is closed. Let’s diagram (in a way) that behavior:

Fetch 1, limit of 12, result set count of 37, loop still runs:

'A'
'B'
'C'
'D'
'E'
'F'
'G'
'H'
'I'
'J'
'K'
'L'

25 rows remain.  Fetch 2, same conditions, loop still runs:

'M'
'N'
'O'
'P'
'Q'
'R'
'S'
'T'
'U'
'V'
'W'
'X'

13 rows remain.  Fetch 3, same conditions, loop still runs:

'Y'
'Z'
'AA'
'AB'
'AC'
'AD'
'AE'
'AF'
'AG'
'AH'
'AI'
'AJ'

1 row remains.  Fetch 4, same conditions, loop looks for %NOTFOUND and terminates before last row can be processed:

'AK'
[NODATAFOUND]

If that’s still confusing the example below should clear things up. Let’s take the ubiquitous EMP table, with 14 rows of data, and try to copy that to a table named EMP_TEST, an exact copy of EMP in definition but completely empty:


SQL> 
SQL> --
SQL> -- Create empty table just like EMP
SQL> --
SQL> -- We'll populate this with a FORALL loop
SQL> -- using BULK COLLECT and LIMIT
SQL> --
SQL> -- Well, we'll try to but the first attempt will fail
SQL> -- to completely populate the table
SQL> --
SQL> create table emp_test as select * From emp where 0=1;

Table created.

SQL> 

Let’s now set up an example that will definitely fail to populate EMP_TEST; we’ll exit the loop with the %NOTFOUND condition and a LIMIT that will leave records behind:


SQL> 
SQL> --
SQL> -- Populate the table
SQL> --
SQL> -- Exit the loop in the usual way
SQL> --
SQL> -- Given the LIMIT value this won't get
SQL> -- all of the records inserted
SQL> --
SQL> declare
  2  	     type empcoltyp is table of emp%rowtype;
  3  	     emp_c empcoltyp;
  4  
  5  	     cursor get_emp_data is
  6  	     select * from emp;
  7  
  8  begin
  9  	     open get_emp_data;
 10  	     loop
 11  	     fetch get_emp_data bulk collect into emp_c limit 9;
 12  	     exit when get_emp_data%notfound;
 13  
 14  	     for i in 1..emp_c.count loop
 15  		     insert into emp_test (empno, ename, sal)
 16  		     values (emp_c(i).empno, emp_c(i).ename, emp_c(i).sal);
 17  	     end loop;
 18  
 19  	     end loop;
 20  
 21  	     commit;
 22  
 23  end;
 24  /

PL/SQL procedure successfully completed.

SQL> 
SQL> --
SQL> -- Verify that the data was inserted
SQL> --
SQL> -- The count from emp_test should match the count
SQL> -- from emp
SQL> --
SQL> -- It doesn't
SQL> --
SQL> select count(*) from emp;

  COUNT(*)
----------
        14

SQL> select count(*) from emp_test;

  COUNT(*)
----------
         9

SQL> 
SQL> --
SQL> -- Let's see what's missing
SQL> --
SQL> select * From emp where empno not in (select empno from emp_test);

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
      7876 ADAMS      CLERK           7788 12-JAN-83       1100                    20
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30

SQL> 
SQL> --
SQL> -- Seems that we are 5 rows short
SQL> --
SQL> -- Unfortunately NOTFOUND is set on the incomplete fetch
SQL> -- from emp (the fetch that returned less than LIMIT)
SQL> -- so the loop exited before processing the remaining data
SQL> --
SQL> -- Ooooops
SQL> --
SQL> 

Had there been 18 rows, or had the LIMIT been 7, the insert would have completed, but it would have been a false sense of security as the next FORALL/BULK COLLECT/LIMIT process could fail because the row source wasn’t an even multiple of the LIMIT. To fix this properly we need to use a property of the record variable itself, count and check for a value of 0, indicating that all of the fetched records have been processed. It requires one more pass through the loop but that doesn’t add any appreciable time to the execution. Re-writing the process to check the record count, rather than for %NOTFOUND, populates the EMP_TEST table completely:


SQL> 
SQL> --
SQL> -- Let's try this again
SQL> --
SQL> truncate table emp_test;

Table truncated.

SQL> 
SQL> --
SQL> -- Change the exit strategy to look for a 0 count
SQL> -- for the populated variable
SQL> --
SQL> -- This will get all of the rows from EMP into EMP_TEST
SQL> --
SQL> declare
  2  	     type empcoltyp is table of emp%rowtype;
  3  	     emp_c empcoltyp;
  4  
  5  	     cursor get_emp_data is
  6  	     select * from emp;
  7  
  8  begin
  9  	     open get_emp_data;
 10  	     loop
 11  	     fetch get_emp_data bulk collect into emp_c limit 9;
 12  	     exit when emp_c.count = 0;
 13  
 14  	     for i in 1..emp_c.count loop
 15  		     insert into emp_test (empno, ename, sal)
 16  		     values (emp_c(i).empno, emp_c(i).ename, emp_c(i).sal);
 17  	     end loop;
 18  
 19  	     end loop;
 20  
 21  	     commit;
 22  
 23  end;
 24  /

PL/SQL procedure successfully completed.

SQL> 
SQL> --
SQL> -- Check counts to ensure all data was inserted
SQL> --
SQL> select count(*) from emp;

  COUNT(*)
----------
        14

SQL> select count(*) from emp_test;

  COUNT(*)
----------
        14

SQL> 
SQL> --
SQL> -- Let's see what's missing
SQL> --
SQL> -- This time there won't be any rows returned
SQL> --
SQL> select * From emp where empno not in (select empno from emp_test);

no rows selected

SQL> 

It’s a problem that can be frustrating to work on when knowledge of how a FORALL sets the cursor status with a LIMIT-based fetch is missing. Not all loops behave in the same way; a straight FOR loop fetches records one at a time so that when %NOTFOUND is set it’s actually at the end of the data. A BULK COLLECT/LIMIT fetch sets %NOTFOUND when it can’t find any more data and that could be from a partial fetch (again where the fetch count is less than the set LIMIT value). Exiting with %NOTFOUND will close the cursor and ‘lose’ the remaining data that was fetched, resulting in incompletely processing the result set. And, since this behavior depends on both the LIMIT and the total number of rows in the result set it can work just as often as it fails.

It’s been said that knowledge can be a dangerous thing, but not in the database world since more knowledge means better understanding. Understanding how a BULK COLLECT fetch sets %NOTFOUND when a LIMIT is set can make a huge difference in how such a loop exit is written. And that knowledge can help ensure that records don’t go mysteriously ‘missing’.

See all articles by David Fitzjarrell

David Fitzjarrell
David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhilips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and montoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles