A Tale of Two Indexes

A fairly common question recently appeared in an Oracle-related newsgroup; the question was titled ‘index didn’t work’ and was presented thus: “a table TEST,includes two columns:c_1(number not null),c_2(varchar2(255));10 rows inserted;create independent index on each column,then i query:select c_1 from TEST,explain plan report:index full scan,that’s right.next,i query:select c_2 from TEST,explain plan report:table access full.why?”

Okay, that’s a bit difficult to read so let’s re-word it a bit:

A table, TEST, contains two columns, defined as follows:

c_1 number not null
c_2 varchar2(255)

Ten rows are inserted into this table and single-column indexes are created. Querying the data with the following statement:

select c_1 from test;

produces an index full scan path. Changing the query to:

select c_2 from test;

produces a full table scan path. Both columns have an index; why isn’t the index for the second column used for the above query?

So why isn’t that index used? It all comes down to … [ominous musical interlude here] … NULLs.

You read correctly, NULLs.

In an Oracle database, entirely NULL keys are not indexed in conventional index structures (b-tree indexes); I consider a function-based index a special case of this, which can index NULL keys provided the function used ‘translates’ NULLs to a non-null value. Bitmap indexes, on the other hand, do index NULLs by default; unfortunately for the questioner the indexes created aren’t bitmap indexes. So, why are NULLs at fault here? Let’s look at the table definition again and see.

SQL> desc test
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------
 C_1                                       NOT NULL NUMBER
 C_2                                                VARCHAR2(255)

SQL>

Column c_1 is declared not null, thus there can never be a NULL value in that field, THUS an index keyed on that column is guaranteed to contain all values of that column. Oracle can, and will, use only an index created against not null columns to return query results, if that index satisfies the select list as all values will be present; the associated table will never be accessed in such cases. Contrast that to column c_2, which is nullable. It doesn’t matter that the column doesn’t contain any NULL values when the index is created, the fact remains that the column CAN contain NULLs. This, then, prevents Oracle from using the index to retrieve the data from that column since it’s possible to miss any NULL values stored there. The only way to ensure Oracle returns all values from c_2 is to perform a full table scan.

Let’s look at an example of this in action:

SQL> create table test1(
  2        c_1 number not null,
  3        c_2 varchar2(255)
  4  );

Table created.

SQL>
SQL> begin
  2        for i in 1..10 loop
  3         insert into test1
  4         values (i, 'Test '¦¦i);
  5        end loop;
  6
  7        commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL>
SQL> create index test1_c_1
  2  on test1(c_1);

Index created.

SQL>
SQL> create index test1_c_2
  2  on test1(c_2);

Index created.

SQL>
SQL> set autotrace on
SQL>
SQL> select c_1 from test1;

       C_1
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2941126162

------------------------------------------------------------------------------
 Id   Operation         Name       Rows   Bytes  Cost (%CPU) Time
------------------------------------------------------------------------------
   0  SELECT STATEMENT                10    130      1   (0) 00:00:01
   1   INDEX FULL SCAN  TEST1_C_1     10    130      1   (0) 00:00:01
------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        284  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>
SQL> select c_2 from test1;

C_2
--------------------------------------------------------------------------------
Test 1
Test 2
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9
Test 10

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3852271815

---------------------------------------------------------------------------
 Id   Operation          Name   Rows   Bytes  Cost (%CPU) Time
---------------------------------------------------------------------------
   0  SELECT STATEMENT             10   1290      3   (0) 00:00:01
   1   TABLE ACCESS FULL TEST1     10   1290      3   (0) 00:00:01
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         17  consistent gets
          0  physical reads
          0  redo size
        344  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>

Notice that even though no NULL values occur in c_2 the path is a full table scan; this is due, as explained earlier, to the column definition, at table creation, allowing NULL values to be stored. Notice, too, that the query:

select c_1 from test1;

accesses only the index on that column; at no time is the table ‘touched’ to return any data. There is no need, since all data values for that column are present in that index.

Let’s see if we can get an index full scan of the index for c_2; we’ll change the query and see what the optimizer does:

SQL> select c_2 from test1 where c_2 is not null;

C_2
------------------------------------------------------------------------------
Test 1
Test 10
Test 2
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 950550661

------------------------------------------------------------------------------
 Id   Operation         Name       Rows   Bytes  Cost (%CPU) Time
------------------------------------------------------------------------------
   0  SELECT STATEMENT                10     80      1   (0) 00:00:01
*  1   INDEX FULL SCAN  TEST1_C_2     10     80      1   (0) 00:00:01
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C_2" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        343  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>

Since Oracle need not return any NULLs stored in c_2, and since the index contains only non-NULL key values, Oracle can perform an index full scan to return the desired data. Of course in this case there are no NULLs stored in the table, making this possibly a bit confusing. We’ll now store a few NULL values in that column and run the queries again:

SQL> begin
  2     for i in 11..20 loop
  3             if mod(i,2) = 0 then
  4                     insert into test1 values (i, 'Test '¦¦i);
  5             else
  6                     insert into test1 values(i, null);
  7             end if;
  8     end loop;
  9  end;
 10  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> select c_2 from test1;

C_2
--------------------------------------------------------------------------------

Test 12

Test 14

Test 16

Test 18

Test 20
Test 1

C_2
--------------------------------------------------------------------------------
Test 2
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9
Test 10

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3852271815

---------------------------------------------------------------------------
 Id   Operation          Name   Rows   Bytes  Cost (%CPU) Time
---------------------------------------------------------------------------
   0  SELECT STATEMENT             10     80      3   (0) 00:00:01
   1   TABLE ACCESS FULL TEST1     10     80      3   (0) 00:00:01
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          9  consistent gets
          4  physical reads
          0  redo size
        474  bytes sent via SQL*Net to client
        253  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL> select c_2 from test1 where c_2 is not null;

C_2
--------------------------------------------------------------------------------
Test 1
Test 10
Test 12
Test 14
Test 16
Test 18
Test 2
Test 20
Test 3
Test 4
Test 5

C_2
--------------------------------------------------------------------------------
Test 6
Test 7
Test 8
Test 9

15 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 950550661

------------------------------------------------------------------------------
 Id   Operation         Name       Rows   Bytes  Cost (%CPU) Time
------------------------------------------------------------------------------
   0  SELECT STATEMENT                10     80      1   (0) 00:00:01
*  1   INDEX FULL SCAN  TEST1_C_2     10     80      1   (0) 00:00:01
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C_2" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        408  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         15  rows processed

SQL>

The original query:

select c_2 from test1;

returns all values for the column, including the NULLs, requiring the table to be scanned. The second query, with the ‘is not null’ qualifier, returns 5 fewer rows but doesn’t touch the table at all, instead relying solely on the index to produce the desired result set.

Even…

select count(*) from test1;

…will use the index on the NOT NULL column, thus avoiding any table access:

SQL> select count(*) from test1;

  COUNT(*)
----------
        10


Execution Plan
----------------------------------------------------------
Plan hash value: 2402158148

----------------------------------------------------------------------
 Id   Operation         Name       Rows   Cost (%CPU) Time    
----------------------------------------------------------------------
   0  SELECT STATEMENT                 1      1   (0) 00:00:01
   1   SORT AGGREGATE                  1                      
   2    INDEX FULL SCAN TEST1_C_1     10      1   (0) 00:00:01
----------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        225  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

Since, as explained before, the index on the NOT NULL column will list every value in that column it will also provide the count for every row in that table.

Let’s make the index on c_2 contain all of the values of c_2, NULL or not, by including a non-null component (in this case a 0):

SQL> drop index test1_c_2;

Index dropped.

SQL> 
SQL> create index test1_c_2
  2  on test1(c_2,0);

Index created.

SQL>

Now the plan accesses the index without the need for the “where c_2 is not null” condition:

SQL> select c_2 from test1;

C_2
-------------------------------------------------------------------------
Test 1
Test 10
Test 12
Test 14
Test 16
Test 18
Test 2
Test 20
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9






20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2529630288

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |    20 |  2580 |     1   (0)| 00:00:01 |
|   1 |  INDEX FULL SCAN | TEST1_C_2 |    20 |  2580 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
         48  recursive calls
          0  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
        766  bytes sent via SQL*Net to client
        427  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL> 

Since the index key was not null (there is always a 0 in each index entry) the index could contain NULL column values for c_2. So Oracle can index NULL values in a b-tree index as long as the entire index key is not null.

So, the optimizer is doing the ‘right thing’, really. It just may be confusing when situations like this arise. Just remember that entirely NULL keys, in a traditional b-tree index, aren’t included and for a key composed of nullable columns, Oracle must scan the entire table to return all of the possible values. It isn’t a problem, it’s just ‘good business’.

See all articles by David Fitzjarrell

David Fitzjarrell
David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhilips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and montoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles