Oracle Bloom Filters

In Oracle releases 10.2.0.x and later join processing can be made more efficient by the use of Bloom filters, primarily to reduce traffic between parallel query slaves. What is a Bloom filter? Named after Burton Howard Bloom, who came up with the concept in the 1970s, it’s an efficient data structure used to quickly determine if an element has a high probability of being a member of a given set. It’s based on a bit array that allows for rapid searches and returns one of two results: either the element is probably in the set (which can produce false positives) or the element is definitely not in the set. The filter cannot produce false negatives, and the incidence of false positives is relatively rare. Another advantage to Bloom filters is their small size relative to other data structures used for similar purposes (self-balancing binary search trees, hash tables, or linked lists). The possibility of false positives necessitates the addition of another filter to eliminate them from the results, yet such a filter doesn’t add appreciably to the process time and, therefore, goes relatively unnoticed.

When Bloom filters are used for a query on an Exadata system, the filter predicate and the storage predicate will list the SYS_OP_Bloom_FILTER function as being called. This function includes the additional filter to eliminate any false positives that could be returned. It’s the storage predicate that provides the real power of the Bloom filter on Exadata. Using a Bloom filter to pre-join the tables at the storage server level reduces the volume of data the database servers need to process and can significantly reduce the execution time of the given query.

An example of Bloom filters in action follows; a script named Bloom_fltr_ex.sql was executed to generate this output on an Exadata system; the script can be emailed on request:


SQL> --
SQL> -- Create sample tables
SQL> --
SQL> -- Create them parallel, necessary 
SQL> -- to get a Smart Scan on these tables
SQL> --
SQL> create table emp(
  2  	empid   number,
  3  	empnmvarchar2(40),
  4  	empsal  number,
  5  	empssn  varchar2(12),
  6  	constraint emp_pk primary key (empid)
  7  ) parallel 4;

Table created.

SQL>
SQL> create table emp_dept(
  2  	empid   number,
  3  	empdept number,
  4  	emploc  varchar2(60),
  5    constraint emp_dept_pk primary key(empid)
  6  ) parallel 4;

Table created.

SQL>
SQL> create table dept_info(
  2  	deptnum number,
  3  	deptnm  varchar2(25),
  4    constraint dept_info_pk primary key(deptnum)
  5  ) parallel 4;

Table created.

SQL>
SQL> --
SQL> -- Load sample tables with data
SQL> --
SQL> begin
  2  	     for i in 1..2000000 loop
  3  		     insert into emp
  4  		     values(i, 'Fnarm'||i, (mod(i, 7)+1)*1000, mod(i,10)||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||mod(i,10)||mod(i,10));
  5  		     insert into emp_dept
  6  		values(i, (mod(i,8)+1)*10, 'Zanzwalla'||(mod(i,8)+1)*10);
  7  		     commit;
  8  	     end loop;
  9  	     insert into dept_info
 10  	     select distinct empdept, case when empdept = 10 then 'SALES'
 11  					   when empdept = 20 then 'PROCUREMENT'
 12  					   when empdept = 30 then 'HR'
 13  					   when empdept = 40 then 'RESEARCH'
 14  					   when empdept = 50 then 'DEVELOPMENT'
 15  					   when empdept = 60 then 'EMPLOYEE RELATIONS'
 16  					   when empdept = 70 then 'FACILITIES'
 17  					   when empdept = 80 then 'FINANCE' end
 18  	     from emp_dept;
 19  
 20  end;
 21  /

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Run join query using Bloom filter
SQL> --
SQL> -- Generate execution plan to prove Bloom
SQL> -- filter usage
SQL> --
SQL> -- Also report query execution time
SQL> --
SQL> set autotrace on
SQL> set timing on
SQL>
SQL> select /*+ Bloom join 2 parallel 2 use_hash(empemp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal, count(*) emps
  2  fromemp e join emp_depted on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  whereed.empdept = 20
  4  group by e.empid, e.empnm, d.deptnm, e.empsal;

     EMPID EMPNM                  DEPTNM                        EMPSAL       EMPS
---------- ---------------------- ------------------------- ---------- ----------
    904505 Fnarm904505            PROCUREMENT                     1000          1
    907769 Fnarm907769            PROCUREMENT                     3000          1
    909241 Fnarm909241            PROCUREMENT                     5000          1
    909505 Fnarm909505            PROCUREMENT                     3000          1
    909641 Fnarm909641            PROCUREMENT                     6000          1
    910145 Fnarm910145            PROCUREMENT                     6000          1
...
    155833 Fnarm155833            PROCUREMENT                     7000          1
    155905 Fnarm155905            PROCUREMENT                     2000          1
    151081 Fnarm151081            PROCUREMENT                     1000          1
    151145 Fnarm151145            PROCUREMENT                     2000          1

250000 rows selected.

Elapsed: 00:00:14.27

Execution Plan
----------------------------------------------------------
Plan hash value: 2313925751

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows | Bytes | Cost (%CPU)| Time     |   TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |  218K|    21M|  1378   (1)| 00:00:01 |       |      |            |
|   1 |  PX COORDINATOR                          |              |      |       |            |          |       |      |            |
|   2 |   PX SEND QC (RANDOM)                    | :TQ10003     |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                         |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | PCWP |            |
|   4 |     PX RECEIVE                           |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | PCWP |            |
|   5 |      PX SEND HASH                        | :TQ10002     |  218K|    21M|  1378   (1)| 00:00:01 | Q1,02 | P->P | HASH       |
|   6 |       HASH GROUP BY                      |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,02 | PCWP |            |
|*  7 |        HASH JOIN                         |              |  218K|    21M|  1376   (1)| 00:00:01 | Q1,02 | PCWP |            |
|   8 |         PX RECEIVE                       |              |  218K|    11M|   535   (1)| 00:00:01 | Q1,02 | PCWP |            |
|   9 |          PX SEND BROADCAST               | :TQ10001     |  218K|    11M|   535   (1)| 00:00:01 | Q1,01 | P->P | BROADCAST  |
|  10 |           NESTED LOOPS                   |              |  218K|    11M|   535   (1)| 00:00:01 | Q1,01 | PCWP |            |
|  11 |            BUFFER SORT                   |              |      |       |            |          | Q1,01 | PCWC |            |
|  12 |             PX RECEIVE                   |              |      |       |            |          | Q1,01 | PCWP |            |
|  13 |              PX SEND BROADCAST           | :TQ10000     |      |       |            |          |       | S->P | BROADCAST  |
|  14 |               TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |    1 |    27 |     1   (0)| 00:00:01 |       |      |            |
|* 15 |                INDEX UNIQUE SCAN         | DEPT_INFO_PK |    1 |       |     1   (0)| 00:00:01 |       |      |            |
|  16 |            PX BLOCK ITERATOR             |              |  218K|  5556K|   534   (1)| 00:00:01 | Q1,01 | PCWC |            |
|* 17 |             TABLE ACCESS STORAGE FULL    | EMP_DEPT     |  218K|  5556K|   534   (1)| 00:00:01 | Q1,01 | PCWP |            |
|  18 |         PX BLOCK ITERATOR                |              | 1657K|    75M|   839   (1)| 00:00:01 | Q1,02 | PCWC |            |
|* 19 |          TABLE ACCESS STORAGE FULL       | EMP          | 1657K|    75M|   839   (1)| 00:00:01 | Q1,02 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("ED"."EMPID"="E"."EMPID")
  15 - access("D"."DEPTNUM"=20)
  17 - storage("ED"."EMPDEPT"=20)
       filter("ED"."EMPDEPT"=20)
  19 - storage(SYS_OP_Bloom_FILTER(:BF0000,"E"."EMPID"))
       filter(SYS_OP_Bloom_FILTER(:BF0000,"E"."EMPID"))

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
     60  recursive calls
    174  db block gets
  40753  consistent gets
  17710  physical reads
   2128  redo size
9437983  bytes sent via SQL*Net to client
 183850  bytes received via SQL*Net from client
  16668  SQL*Net roundtrips to/from client
      6  sorts (memory)
      0  sorts (disk)
 250000  rows processed

SQL>

In less than 15 seconds 250,000 rows were returned from a three table join of over 4 million rows; the Bloom filter made a dramatic difference in how this query was processed and provided exceptional performance given the volume of data queried.

Without the storage level execution of the Bloom filter, the query execution time increased by 2.33 seconds, a 16.3 percent increase. For longer execution times, this difference can be significant. It isn’t the Bloom filter that gives Exadata such power with joins, it’s the fact that Exadata can execute it not only at the database level but also at the storage level, something commodity hardware configurations can’t do.

Of course Bloom filters are not exclusive to Exadata, as the following run of the same script on a non-Exadata system shows (only the abbreviated output section is reproduced here):


SQL> select /*+ Bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal	  --, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20;

     EMPID EMPNM                                    DEPTNM                        EMPSAL                                                              
---------- ---------------------------------------- ------------------------- ----------                                                              
   1670281 Fnarm1670281                             PROCUREMENT                     5000                                                              
   1670289 Fnarm1670289                             PROCUREMENT                     6000                                                              
   1670297 Fnarm1670297                             PROCUREMENT                     7000                                                              
   1670305 Fnarm1670305                             PROCUREMENT                     1000                                                              
   1670313 Fnarm1670313                             PROCUREMENT                     2000                                                              
   1670321 Fnarm1670321                             PROCUREMENT                     3000                                                              
   1670329 Fnarm1670329                             PROCUREMENT                     4000                                                              
   1670337 Fnarm1670337                             PROCUREMENT                     5000                                                              
   1670345 Fnarm1670345                             PROCUREMENT                     6000                                                              
   1670353 Fnarm1670353                             PROCUREMENT                     7000                                                              
   1670361 Fnarm1670361                             PROCUREMENT                     1000                                                              
...
   1857369 Fnarm1857369                             PROCUREMENT                     4000                                                              
   1857377 Fnarm1857377                             PROCUREMENT                     5000                                                              
   1857385 Fnarm1857385                             PROCUREMENT                     6000                                                              

250000 rows selected.

Elapsed: 00:00:54.86

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2643012915                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------------------------------------------------                    
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |                    
----------------------------------------------------------------------------------------------------------------------------------                    
|   0 | SELECT STATEMENT                     |              |   273K|    26M|  1378   (1)| 00:00:01 |        |      |            |                    
|   1 |  PX COORDINATOR                      |              |       |       |            |          |        |      |            |                    
|   2 |   PX SEND QC (RANDOM)                | :TQ10002     |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |                    
|*  3 |    HASH JOIN                         |              |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   4 |     PX RECEIVE                       |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   5 |      PX SEND BROADCAST               | :TQ10001     |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | P->P | BROADCAST  |                    
|   6 |       NESTED LOOPS                   |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|   7 |        BUFFER SORT                   |              |       |       |            |          |  Q1,01 | PCWC |            |                    
|   8 |         PX RECEIVE                   |              |       |       |            |          |  Q1,01 | PCWP |            |                    
|   9 |          PX SEND BROADCAST           | :TQ10000     |       |       |            |          |        | S->P | BROADCAST  |                    
|  10 |           TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |     1 |    27 |     1   (0)| 00:00:01 |        |      |            |                    
|* 11 |            INDEX UNIQUE SCAN         | DEPT_INFO_PK |     1 |       |     1   (0)| 00:00:01 |        |      |            |                    
|  12 |        PX BLOCK ITERATOR             |              |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWC |            |                    
|* 13 |         TABLE ACCESS FULL            | EMP_DEPT     |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|  14 |     PX BLOCK ITERATOR                |              |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWC |            |                    
|* 15 |      TABLE ACCESS FULL               | EMP          |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
----------------------------------------------------------------------------------------------------------------------------------                    
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   3 - access("ED"."EMPID"="E"."EMPID")                                                                                                               
  11 - access("D"."DEPTNUM"=20)                                                                                                                       
  13 - filter("ED"."EMPDEPT"=20)                                                                                                                      
  15 - filter(SYS_OP_Bloom_FILTER(:BF0000,"E"."EMPID"))                                                                                               
                                                                                                                                                      
Note                                                                                                                                                  
-----                                                                                                                                                 
   - dynamic sampling used for this statement (level=2)                                                                                               


Statistics
----------------------------------------------------------                                                                                            
         33  recursive calls                                                                                                                          
        139  db block gets                                                                                                                            
      36224  consistent gets                                                                                                                          
      17657  physical reads                                                                                                                           
          0  redo size                                                                                                                                
    9526012  bytes sent via SQL*Net to client                                                                                                         
     183846  bytes received via SQL*Net from client                                                                                                   
      16668  SQL*Net roundtrips to/from client                                                                                                        
          5  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
     250000  rows processed                                                                                                                           

SQL>

The only differences between the two plans involve the storage component of Exadata. Notice also the vast difference in elapsed times between Exadata and non-Exadata systems; Exadata ran the query returning 250,000 rows in just over 14 seconds and the non-Exadata system returned the same volume of data in just under 55 seconds. Both times are substantial improvements over not using a Bloom filter so any hardware running 11.2.0.2 or later will benefit from Bloom filters.

Of course there is a ‘gotcha’ with Bloom filters, and Oracle recognizes this as a bug in releases older than 12.1.0.2. Hash left outer joins, hash right outer joins and hash full outer joins do not use a Bloom Filter and for those interested the unpublished bug number is 17716301. MOS document 1919508.1 reports details on this behavior including an example. As reported in that document the patch can be backported to earlier releases when an SR is filed requesting it.

Bloom filters are a welcome addition to both Exadata and non-Exadata installations as they can improve query performance with parallel queries, as evidenced here. Now that you know how to spot when they are used you may find they are utilized more often than you expected.

See all articles by David Fitzjarrell

David Fitzjarrell
David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhilips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and montoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles