Interesting questions can be posted in the Oracle database forums and one recent post generated a great deal of attention. A query and part of the execution plan were posted with the poster asking how to get rid of the table scan. The table has a primary key index so, on the face of it, it is interesting that the primary key wasn’t used. Of course to provide any sort of insight or solution the table and index definitions need to be provided. In a response from the original post the CREATE TABLE and CREATE INDEX statements were provided. Let’s take a look at these objects and see how this question was addressed. I am indebted to both Jonathan Lewis and Andrew Sayer for providing the various query re-writes offered here.
Looking at the definition for the ‘problem’ table (LF_HOTEL_TEMP) we find only two columns, both of which are in the primary key definition. The tables were created:
SQL> create table LF_HOTEL_TEMP
2 (
3 HOTEL_CODE VARCHAR2(4000),
4 SERVICE_ID NUMBER(3)
5 )
6 /
Table created.
SQL>
SQL> create table LF_TS_ROOMTYPE_PROPERTIES
2 (
3 HOTEL_CODE VARCHAR2(20),
4 ROOM_TYPE VARCHAR2(500),
5 BOARD_TYPE VARCHAR2(500),
6 ROOM_AMT FLOAT,
7 SERVICE_ID NUMBER,
8 CURRENCY_CODE VARCHAR2(10)
9 )
10 /
Table created.
SQL>
SQL> create table LF_HB_ROOMTYPE_PROPERTIES
2 (
3 HOTEL_CODE VARCHAR2(20),
4 ROOM_TYPE VARCHAR2(500),
5 BOARD_TYPE VARCHAR2(500),
6 ROOM_AMT FLOAT,
7 SERVICE_ID NUMBER,
8 CURRENCY_CODE VARCHAR2(10)
9 )
10 /
Table created.
SQL>
SQL> create table LF_GTA_ROOMTYPE_PROPERTIES
2 (
3 HOTEL_CODE VARCHAR2(20),
4 ROOM_TYPE VARCHAR2(500),
5 BOARD_TYPE VARCHAR2(500),
6 ROOM_AMT FLOAT,
7 SERVICE_ID NUMBER,
8 CURRENCY_CODE VARCHAR2(10)
9 )
10 /
Table created.
SQL>
Then the primary key and additional indexes were created:
SQL> alter table lf_hotel_temp
2 add constraint lf_hotel_temp_PK primary key (HOTEL_CODE,service_id)
3 /
Table altered.
SQL>
SQL> create index LF_hb_roomtype_prop_IDX on lf_hb_roomtype_properties (HOTEL_CODE)
2 /
Index created.
SQL>
SQL> create index LF_ts_roomtype_prop_IDX on lf_ts_roomtype_properties (HOTEL_CODE)
2 /
Index created.
SQL>
SQL> create index LF_gta_roomtype_prop_IDX on lf_gta_roomtype_properties (HOTEL_CODE)
2 /
Index created.
SQL>
The original poster reported 278,000 rows in the LF_HOTEL_TEMP table so data was generated to reproduce that number of rows. Next the remaining tables were populated so that none of the three remaining tables shared any data:
SQL> begin
2 for i in 1..278000 loop
3 insert into lf_hotel_temp
4 values(i, mod(i,999)+1);
5 end loop;
6
7 commit;
8 end;
9 /
PL/SQL procedure successfully completed.
SQL>
SQL> insert into lf_ts_roomtype_properties
2 (hotel_code)
3 select hotel_code from lf_hotel_temp
4 where service_id < 511;
142058 rows created.
SQL> insert into lf_ts_roomtype_properties
2 (hotel_code)
3 select hotel_code from lf_hotel_temp
4 where service_id between 313 and 642;
91740 rows created.
SQL> insert into lf_gta_roomtype_properties
2 (hotel_code)
3 select hotel_code from lf_hotel_temp
4 where service_id between 271 and 823;
153743 rows created.
SQL> insert into lf_hb_roomtype_properties
2 (hotel_code)
3 select hotel_code from lf_hotel_temp
4 where service_id between 571 and 999;
119262 rows created.
SQL> COMMIT;
Commit complete.
SQL>
Let’s look at the original query and its execution plan:
SQL> set autotrace on linesize 140
SQL>
SQL> -- original query
SQL> SELECT a.hotel_code
2 FROM lf_hotel_temp a
3 WHERE a.service_id = : p_service_id
4 AND (NOT EXISTS (SELECT *
5 FROM lf_ts_roomtype_properties b
6 WHERE a.hotel_code = b.hotel_code)
7 or NOT EXISTS (SELECT *
8 FROM lf_gta_roomtype_properties b
9 WHERE a.hotel_code = b.hotel_code)
10 or NOT EXISTS (SELECT *
11 FROM lf_hb_roomtype_properties b
12 WHERE a.hotel_code = b.hotel_code));
HOTEL_CODE
--------------------------------------------------------------------------------------------------------------------------------------------
1998
999
5994
...
243756
235764
238761
278 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4111332730
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 33 | 512 (4)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | TABLE ACCESS FULL| LF_HOTEL_TEMP | 278 | 3058 | 94 (16)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | LF_TS_ROOMTYPE_PROP_IDX | 1 | 7 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | LF_GTA_ROOMTYPE_PROP_IDX | 1 | 7 | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | LF_HB_ROOMTYPE_PROP_IDX | 1 | 7 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT 0 FROM "LF_TS_ROOMTYPE_PROPERTIES" "B" WHERE
"B"."HOTEL_CODE"=:B1) OR NOT EXISTS (SELECT 0 FROM "LF_GTA_ROOMTYPE_PROPERTIES" "B"
WHERE "B"."HOTEL_CODE"=:B2) OR NOT EXISTS (SELECT 0 FROM "LF_HB_ROOMTYPE_PROPERTIES"
"B" WHERE "B"."HOTEL_CODE"=:B3))
2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
3 - access("B"."HOTEL_CODE"=:B1)
4 - access("B"."HOTEL_CODE"=:B1)
5 - access("B"."HOTEL_CODE"=:B1)
Statistics
----------------------------------------------------------
11 recursive calls
0 db block gets
2077 consistent gets
872 physical reads
0 redo size
4119 bytes sent via SQL*Net to client
570 bytes received via SQL*Net from client
20 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
278 rows processed
SQL>
All but LF_HOTEL_TEMP use the indexes to speed access to the data; this may be due to the fact that LF_HOTEL_TEMP is a two-column table and the primary key index will be larger than the table since it contains the table data and a rowid for each row. The main idea illustrated by this problem involves being able to find a logically equivalent way of writing the SQL. Jonathan Lewis decided that set operations might produce a ‘better’ plan and re-wrote the query as shown below, producing a plan accessing LF_HOTEL_TEMP by the primary key index:
SQL> -- JL solution
SQL> var p_service_id number
SQL> exec : p_service_id := 1
PL/SQL procedure successfully completed.
SQL>
SQL> select /*+ dynamic_sampling(0) */
2 hotel_code
3 from lf_hotel_temp
4 where service_id = :p_service_id
5 minus (
6 select hotel_code
7 from lf_ts_roomtype_properties
8 where hotel_code is not null
9 intersect
10 select hotel_code
11 from lf_gta_roomtype_properties
12 where hotel_code is not null
13 intersect
14 select hotel_code
15 from lf_hb_roomtype_properties
16 where hotel_code is not null
17 )
18 ;
HOTEL_CODE
--------------------------------------------------------------------------------------------------------------------------------------------
100899
101898
102897
...
999
9990
99900
278 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 775735246
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2159 | 4 (100)| 00:00:01 |
| 1 | MINUS | | | | | |
| 2 | SORT UNIQUE NOSORT | | 1 | 2015 | 1 (100)| 00:00:01 |
|* 3 | INDEX FULL SCAN | LF_HOTEL_TEMP_PK | 1 | 2015 | 0 (0)| 00:00:01 |
| 4 | INTERSECTION | | | | | |
| 5 | INTERSECTION | | | | | |
| 6 | SORT UNIQUE NOSORT| | 4 | 48 | 1 (100)| 00:00:01 |
|* 7 | INDEX FULL SCAN | LF_TS_ROOMTYPE_PROP_IDX | 4 | 48 | 0 (0)| 00:00:01 |
| 8 | SORT UNIQUE NOSORT| | 4 | 48 | 1 (100)| 00:00:01 |
|* 9 | INDEX FULL SCAN | LF_GTA_ROOMTYPE_PROP_IDX | 4 | 48 | 0 (0)| 00:00:01 |
| 10 | SORT UNIQUE NOSORT | | 4 | 48 | 1 (100)| 00:00:01 |
|* 11 | INDEX FULL SCAN | LF_HB_ROOMTYPE_PROP_IDX | 4 | 48 | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
filter("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
7 - filter("HOTEL_CODE" IS NOT NULL)
9 - filter("HOTEL_CODE" IS NOT NULL)
11 - filter("HOTEL_CODE" IS NOT NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5996 consistent gets
2416 physical reads
234680 redo size
4119 bytes sent via SQL*Net to client
570 bytes received via SQL*Net from client
20 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
278 rows processed
SQL>
Andrew Sayer then provided another pair of re-writes, shown below with the plans returned by Oracle 12.1.0.2:
SQL> -- Andrew Sayer
SQL>
SQL> with existance_check as (select hotel_code
2 from (select distinct hotel_code
3 from lf_ts_roomtype_properties
4 where hotel_code is not null
5 union all
6 select distinct hotel_code
7 from lf_gta_roomtype_properties
8 where hotel_code is not null
9 union all
10 select distinct hotel_code
11 from lf_hb_roomtype_properties
12 where hotel_code is not null
13 )
14 group by hotel_code
15 having count(*) = 3
16 )
17 SELECT a.hotel_code
18 FROM lf_hotel_temp a
19 WHERE a.service_id = : p_service_id
20 AND NOT EXISTS (SELECT *
21 FROM existance_check b
22 WHERE a.hotel_code = b.hotel_code)
23 /
HOTEL_CODE
--------------------------------------------------------------------------------------------------------------------------------------------
252747
131868
118881
...
73926
197802
192807
278 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1887592732
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2899 | 5738K| | 3454 (9)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 2899 | 5738K| 5744K| 3454 (9)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | LF_HOTEL_TEMP | 2899 | 5704K| | 98 (20)| 00:00:01 |
| 3 | VIEW | | 611K| 7162K| | 2843 (9)| 00:00:01 |
|* 4 | FILTER | | | | | | |
| 5 | HASH GROUP BY | | 611K| 7162K| | 2843 (9)| 00:00:01 |
| 6 | VIEW | | 611K| 7162K| | 2736 (5)| 00:00:01 |
| 7 | UNION-ALL | | | | | | |
| 8 | HASH UNIQUE | | 312K| 3660K| 6136K| 1387 (5)| 00:00:01 |
|* 9 | TABLE ACCESS FULL| LF_TS_ROOMTYPE_PROPERTIES | 312K| 3660K| | 74 (15)| 00:00:01 |
| 10 | HASH UNIQUE | | 176K| 2063K| 3464K| 796 (5)| 00:00:01 |
|* 11 | TABLE ACCESS FULL| LF_GTA_ROOMTYPE_PROPERTIES | 176K| 2063K| | 55 (13)| 00:00:01 |
| 12 | HASH UNIQUE | | 122K| 1438K| 2416K| 553 (5)| 00:00:01 |
|* 13 | TABLE ACCESS FULL| LF_HB_ROOMTYPE_PROPERTIES | 122K| 1438K| | 37 (14)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."HOTEL_CODE"="B"."HOTEL_CODE")
2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
4 - filter(COUNT(*)=3)
9 - filter("HOTEL_CODE" IS NOT NULL)
11 - filter("HOTEL_CODE" IS NOT NULL)
13 - filter("HOTEL_CODE" IS NOT NULL)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
37 recursive calls
0 db block gets
2664 consistent gets
1056 physical reads
37440 redo size
4119 bytes sent via SQL*Net to client
570 bytes received via SQL*Net from client
20 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
278 rows processed
SQL>
SQL> with existance_check as (select hotel_code
2 from lf_ts_roomtype_properties
3 where hotel_code is not null
4 intersect
5 select hotel_code
6 from lf_gta_roomtype_properties
7 where hotel_code is not null
8 intersect
9 select hotel_code
10 from lf_hb_roomtype_properties
11 where hotel_code is not null
12 )
13 SELECT a.hotel_code
14 FROM lf_hotel_temp a
15 WHERE a.service_id = : p_service_id
16 AND NOT EXISTS (SELECT *
17 FROM existance_check b
18 WHERE a.hotel_code = b.hotel_code)
19 /
HOTEL_CODE
--------------------------------------------------------------------------------------------------------------------------------------------
252747
131868
118881
...
73926
197802
192807
278 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1347650198
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2899 | 5738K| | 3048 (6)| 00:00:01 |
|* 1 | HASH JOIN RIGHT ANTI | | 2899 | 5738K| 2880K| 3048 (6)| 00:00:01 |
| 2 | VIEW | | 122K| 1438K| | 2736 (5)| 00:00:01 |
| 3 | INTERSECTION | | | | | | |
| 4 | INTERSECTION | | | | | | |
| 5 | SORT UNIQUE | | 312K| 3660K| 6136K| | |
|* 6 | TABLE ACCESS FULL| LF_TS_ROOMTYPE_PROPERTIES | 312K| 3660K| | 74 (15)| 00:00:01 |
| 7 | SORT UNIQUE | | 176K| 2063K| 3464K| | |
|* 8 | TABLE ACCESS FULL| LF_GTA_ROOMTYPE_PROPERTIES | 176K| 2063K| | 55 (13)| 00:00:01 |
| 9 | SORT UNIQUE | | 122K| 1438K| 2416K| | |
|* 10 | TABLE ACCESS FULL | LF_HB_ROOMTYPE_PROPERTIES | 122K| 1438K| | 37 (14)| 00:00:01 |
|* 11 | TABLE ACCESS FULL | LF_HOTEL_TEMP | 2899 | 5704K| | 98 (20)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."HOTEL_CODE"="B"."HOTEL_CODE")
6 - filter("HOTEL_CODE" IS NOT NULL)
8 - filter("HOTEL_CODE" IS NOT NULL)
10 - filter("HOTEL_CODE" IS NOT NULL)
11 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
1848 consistent gets
0 physical reads
0 redo size
4119 bytes sent via SQL*Net to client
570 bytes received via SQL*Net from client
20 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
278 rows processed
SQL>
Four approaches, four different plans, but all generating the same result set. In the reported statistics from Jonathan Lewis’ query the redo size is much larger than from the other plans; however the plan for Jonathan’s query is the only one using an index on LF_HOTEL_TEMP. It is possible that very small data sets will produce other execution plans, with more of them using the indexes; however using a data set sized the same as the one the original poster is using makes for more ‘honest’ comparisons. In general, the set operations appear to be more efficient than the non-set operations.
Using set operations in a relational database may not be the most obvious of choices, but it is good to remember that SQL is, at its heart, a set-based language. When thinking about alternative ways to solve a problem the best way may not be the one that is most commonly used; this problem brings that into focus. Many people don’t think about database data in terms of sets, but, really, that is what a SQL query generates, a result set. Thinking about data that way may open up more efficient ways to return those sets.