SHARE

Cartesian Joins in Oracle: The Nullable Column Problem

Written By

Apr 27, 2015

4 minute read

It can be frustrating at times to examine an execution plan for a given query against one or more large tables only to see this dreaded step:

MERGE JOIN CARTESIAN

Oracle ACE Director Jonathan Lewis posted such a problem on his blog referring to a customer of his experiencing that exact problem. As a solution Jonathan thought that finding the distinct combinations of fields the query was looking for would reduce the data set considerably providing a different and more efficient path to the data. The problem resulted from the matching criteria between the two tables; all eight attributes are mandatory in the large table, the attribute columns in the types table can be null but they both can’t all be null in the same record. Also, since the attribute columns aren’t indexed this will result in a full table scan for both tables. Oracle, in this case, chose a MERGE JOIN CARTESIAN to generate the result set, which needed to be fixed. A slight re-work of the example of the fix he posted is shown below, along with the plan it generated (run on Oracle 11.2.0.4):

SQL> with alldat as (
  2  select
  3  /*+ materialize */
  4  my_id, my_val, my_attr1, my_attr2, my_attr3, my_attr4, my_attr5, my_attr6, my_attr7, my_attr8
  5  from
  6  largetab
  7  ),
  8  unq_data as (
  9  select
 10  /*+ materialize */
 11  distinct my_attr1, my_attr2, my_attr3, my_attr4, my_attr5, my_attr6, my_attr7, my_attr8
 12  from    alldat
 13  )
 14  select
 15  ad.my_id, ad.my_val, tt.my_cat, tt.my_rel_val
 16  from
 17  unq_data        ud,
 18  ttpe_tab	     tt,
 19  alldat	     ad
 20  where
 21  nvl(tt.my_attr1(+), ud.my_attr1) = ud.my_attr1
 22  and     nvl(tt.my_attr2(+), ud.my_attr2) = ud.my_attr2
 23  and     nvl(tt.my_attr3(+), ud.my_attr3) = ud.my_attr3
 24  and     nvl(tt.my_attr4(+), ud.my_attr4) = ud.my_attr4
 25  and     nvl(tt.my_attr5(+), ud.my_attr5) = ud.my_attr5
 26  and     nvl(tt.my_attr6(+), ud.my_attr6) = ud.my_attr6
 27  and     nvl(tt.my_attr7(+), ud.my_attr7) = ud.my_attr7
 28  and     nvl(tt.my_attr8(+), ud.my_attr8) = ud.my_attr8
 29  and     ad.my_attr1 = ud.my_attr1
 30  and     ad.my_attr2 = ud.my_attr2
 31  and     ad.my_attr3 = ud.my_attr3
 32  and     ad.my_attr4 = ud.my_attr4
 33  and     ad.my_attr5 = ud.my_attr5
 34  and     ad.my_attr6 = ud.my_attr6
 35  and     ad.my_attr7 = ud.my_attr7
 36  and     ad.my_attr8 = ud.my_attr8
 37  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2183901041

---------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |     8 |  2080 |    24   (5)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6676_627FFB3 |       |       |            |          |
|   3 |    TABLE ACCESS FULL       | BIG_TABLE                  |     8 |   288 |     5   (0)| 00:00:01 |
|   4 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6677_627FFB3 |       |       |            |          |
|   5 |    HASH UNIQUE             |                            |     8 |   192 |     3  (34)| 00:00:01 |
|   6 |     VIEW                   |                            |     8 |   192 |     2   (0)| 00:00:01 |
|   7 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6676_627FFB3 |     8 |   288 |     2   (0)| 00:00:01 |
|*  8 |   HASH JOIN                |                            |     8 |  2080 |    16   (0)| 00:00:01 |
|   9 |    NESTED LOOPS OUTER      |                            |     8 |  1008 |    14   (0)| 00:00:01 |
|  10 |     VIEW                   |                            |     8 |   832 |     2   (0)| 00:00:01 |
|  11 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6677_627FFB3 |     8 |   192 |     2   (0)| 00:00:01 |
|* 12 |     TABLE ACCESS FULL      | TYPES                      |     1 |    22 |     2   (0)| 00:00:01 |
|  13 |    VIEW                    |                            |     8 |  1072 |     2   (0)| 00:00:01 |
|  14 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6676_627FFB3 |     8 |   288 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("MD"."ATT1"="DD"."ATT1" AND "MD"."ATT2"="DD"."ATT2" AND "MD"."ATT3"="DD"."ATT3"
              AND "MD"."ATT4"="DD"."ATT4" AND "MD"."ATT5"="DD"."ATT5" AND "MD"."ATT6"="DD"."ATT6" AND
              "MD"."ATT7"="DD"."ATT7" AND "MD"."ATT8"="DD"."ATT8")
  12 - filter("DD"."ATT1"=NVL("TY"."ATT1"(+),"DD"."ATT1") AND
              "DD"."ATT2"=NVL("TY"."ATT2"(+),"DD"."ATT2") AND "DD"."ATT3"=NVL("TY"."ATT3"(+),"DD"."ATT3") AND
              "DD"."ATT4"=NVL("TY"."ATT4"(+),"DD"."ATT4") AND "DD"."ATT5"=NVL("TY"."ATT5"(+),"DD"."ATT5") AND
              "DD"."ATT6"=NVL("TY"."ATT6"(+),"DD"."ATT6") AND "DD"."ATT7"=NVL("TY"."ATT7"(+),"DD"."ATT7") AND
              "DD"."ATT8"=NVL("TY"."ATT8"(+),"DD"."ATT8"))

Notice that the query rewrite did eliminate the MERGE JOIN CARTESIAN and replaced it with a nested loops hash join of the smaller table and the reduced result set of distinct attribute values. Since there were eight attribute values to match having a list of distinct combinations made Oracle’s job much easier.

This works because the number of distinct combinations of the searched columns was small compared to the entire set of table data; if the number of distinct combinations was small relative to the entire data set; if the distinct combinations approach the size of the total rows in the source table it’s likely this type of re-write won’t provide a better plan. But, when faced with a MERGE JOIN CARTESIAN and large data sets such a technique could ‘save the day’ by reducing the work Oracle must do to return the desired results.

See all articles by David Fitzjarrell

David Fitzjarrell

David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhilips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and montoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.

Cartesian Joins in Oracle: The Nullable Column Problem

David Fitzjarrell

Company

Categories