A fairly common question recently appeared in an Oracle-related newsgroup; the question was titled ‘index didn’t work’ and was presented thus: “a table TEST,includes two columns:c_1(number not null),c_2(varchar2(255));10 rows inserted;create independent index on each column,then i query:select c_1 from TEST,explain plan report:index full scan,that’s right.next,i query:select c_2 from TEST,explain plan report:table access full.why?”
Okay, that’s a bit difficult to read so let’s re-word it a bit:
A table, TEST, contains two columns, defined as follows:
c_1 number not null c_2 varchar2(255)
Ten rows are inserted into this table and single-column indexes are created. Querying the data with the following statement:
select c_1 from test;
produces an index full scan path. Changing the query to:
select c_2 from test;
produces a full table scan path. Both columns have an index; why isn’t the index for the second column used for the above query?
So why isn’t that index used? It all comes down to … [ominous musical interlude here] … NULLs.
You read correctly, NULLs.
In an Oracle database, entirely NULL keys are not indexed in conventional index structures (b-tree indexes); I consider a function-based index a special case of this, which can index NULL keys provided the function used ‘translates’ NULLs to a non-null value. Bitmap indexes, on the other hand, do index NULLs by default; unfortunately for the questioner the indexes created aren’t bitmap indexes. So, why are NULLs at fault here? Let’s look at the table definition again and see.
SQL> desc test
Name Null? Type
----------------------------------------- -------- ---------------------------
C_1 NOT NULL NUMBER
C_2 VARCHAR2(255)
SQL>
Column c_1 is declared not null, thus there can never be a NULL value in that field, THUS an index keyed on that column is guaranteed to contain all values of that column. Oracle can, and will, use only an index created against not null columns to return query results, if that index satisfies the select list as all values will be present; the associated table will never be accessed in such cases. Contrast that to column c_2, which is nullable. It doesn’t matter that the column doesn’t contain any NULL values when the index is created, the fact remains that the column CAN contain NULLs. This, then, prevents Oracle from using the index to retrieve the data from that column since it’s possible to miss any NULL values stored there. The only way to ensure Oracle returns all values from c_2 is to perform a full table scan.
Let’s look at an example of this in action:
SQL> create table test1(
2 c_1 number not null,
3 c_2 varchar2(255)
4 );
Table created.
SQL>
SQL> begin
2 for i in 1..10 loop
3 insert into test1
4 values (i, 'Test '¦¦i);
5 end loop;
6
7 commit;
8 end;
9 /
PL/SQL procedure successfully completed.
SQL>
SQL> create index test1_c_1
2 on test1(c_1);
Index created.
SQL>
SQL> create index test1_c_2
2 on test1(c_2);
Index created.
SQL>
SQL> set autotrace on
SQL>
SQL> select c_1 from test1;
C_1
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2941126162
------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
------------------------------------------------------------------------------
0 SELECT STATEMENT 10 130 1 (0) 00:00:01
1 INDEX FULL SCAN TEST1_C_1 10 130 1 (0) 00:00:01
------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
11 consistent gets
0 physical reads
0 redo size
284 bytes sent via SQL*Net to client
246 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SQL>
SQL> select c_2 from test1;
C_2
--------------------------------------------------------------------------------
Test 1
Test 2
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9
Test 10
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3852271815
---------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
---------------------------------------------------------------------------
0 SELECT STATEMENT 10 1290 3 (0) 00:00:01
1 TABLE ACCESS FULL TEST1 10 1290 3 (0) 00:00:01
---------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
17 consistent gets
0 physical reads
0 redo size
344 bytes sent via SQL*Net to client
246 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SQL>
Notice that even though no NULL values occur in c_2 the path is a full table scan; this is due, as explained earlier, to the column definition, at table creation, allowing NULL values to be stored. Notice, too, that the query:
select c_1 from test1;
accesses only the index on that column; at no time is the table ‘touched’ to return any data. There is no need, since all data values for that column are present in that index.
Let’s see if we can get an index full scan of the index for c_2; we’ll change the query and see what the optimizer does:
SQL> select c_2 from test1 where c_2 is not null;
C_2
------------------------------------------------------------------------------
Test 1
Test 10
Test 2
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9
10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 950550661
------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
------------------------------------------------------------------------------
0 SELECT STATEMENT 10 80 1 (0) 00:00:01
* 1 INDEX FULL SCAN TEST1_C_2 10 80 1 (0) 00:00:01
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C_2" IS NOT NULL)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
343 bytes sent via SQL*Net to client
246 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SQL>
Since Oracle need not return any NULLs stored in c_2, and since the index contains only non-NULL key values, Oracle can perform an index full scan to return the desired data. Of course in this case there are no NULLs stored in the table, making this possibly a bit confusing. We’ll now store a few NULL values in that column and run the queries again:
SQL> begin
2 for i in 11..20 loop
3 if mod(i,2) = 0 then
4 insert into test1 values (i, 'Test '¦¦i);
5 else
6 insert into test1 values(i, null);
7 end if;
8 end loop;
9 end;
10 /
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> select c_2 from test1;
C_2
--------------------------------------------------------------------------------
Test 12
Test 14
Test 16
Test 18
Test 20
Test 1
C_2
--------------------------------------------------------------------------------
Test 2
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9
Test 10
20 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3852271815
---------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
---------------------------------------------------------------------------
0 SELECT STATEMENT 10 80 3 (0) 00:00:01
1 TABLE ACCESS FULL TEST1 10 80 3 (0) 00:00:01
---------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
9 consistent gets
4 physical reads
0 redo size
474 bytes sent via SQL*Net to client
253 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
20 rows processed
SQL> select c_2 from test1 where c_2 is not null;
C_2
--------------------------------------------------------------------------------
Test 1
Test 10
Test 12
Test 14
Test 16
Test 18
Test 2
Test 20
Test 3
Test 4
Test 5
C_2
--------------------------------------------------------------------------------
Test 6
Test 7
Test 8
Test 9
15 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 950550661
------------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
------------------------------------------------------------------------------
0 SELECT STATEMENT 10 80 1 (0) 00:00:01
* 1 INDEX FULL SCAN TEST1_C_2 10 80 1 (0) 00:00:01
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("C_2" IS NOT NULL)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
408 bytes sent via SQL*Net to client
246 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
15 rows processed
SQL>
The original query:
select c_2 from test1;
returns all values for the column, including the NULLs, requiring the table to be scanned. The second query, with the ‘is not null’ qualifier, returns 5 fewer rows but doesn’t touch the table at all, instead relying solely on the index to produce the desired result set.
Even…
select count(*) from test1;
…will use the index on the NOT NULL column, thus avoiding any table access:
SQL> select count(*) from test1;
COUNT(*)
----------
10
Execution Plan
----------------------------------------------------------
Plan hash value: 2402158148
----------------------------------------------------------------------
Id Operation Name Rows Cost (%CPU) Time
----------------------------------------------------------------------
0 SELECT STATEMENT 1 1 (0) 00:00:01
1 SORT AGGREGATE 1
2 INDEX FULL SCAN TEST1_C_1 10 1 (0) 00:00:01
----------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
11 consistent gets
0 physical reads
0 redo size
225 bytes sent via SQL*Net to client
246 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
Since, as explained before, the index on the NOT NULL column will list every value in that column it will also provide the count for every row in that table.
Let’s make the index on c_2 contain all of the values of c_2, NULL or not, by including a non-null component (in this case a 0):
SQL> drop index test1_c_2;
Index dropped.
SQL>
SQL> create index test1_c_2
2 on test1(c_2,0);
Index created.
SQL>
Now the plan accesses the index without the need for the “where c_2 is not null” condition:
SQL> select c_2 from test1;
C_2
-------------------------------------------------------------------------
Test 1
Test 10
Test 12
Test 14
Test 16
Test 18
Test 2
Test 20
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9
20 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2529630288
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 2580 | 1 (0)| 00:00:01 |
| 1 | INDEX FULL SCAN | TEST1_C_2 | 20 | 2580 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
48 recursive calls
0 db block gets
20 consistent gets
0 physical reads
0 redo size
766 bytes sent via SQL*Net to client
427 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
20 rows processed
SQL>
Since the index key was not null (there is always a 0 in each index entry) the index could contain NULL column values for c_2. So Oracle can index NULL values in a b-tree index as long as the entire index key is not null.
So, the optimizer is doing the ‘right thing’, really. It just may be confusing when situations like this arise. Just remember that entirely NULL keys, in a traditional b-tree index, aren’t included and for a key composed of nullable columns, Oracle must scan the entire table to return all of the possible values. It isn’t a problem, it’s just ‘good business’.