Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted January 10, 2013

A Tale of Two Indexes

By David Fitzjarrell

A fairly common question recently appeared in an Oracle-related newsgroup; the question was titled 'index didn't work' and was presented thus: "a table TEST,includes two columns:c_1(number not null),c_2(varchar2(255));10 rows inserted;create independent index on each column,then i query:select c_1 from TEST,explain plan report:index full scan,that's right.next,i query:select c_2 from TEST,explain plan report:table access full.why?"

Okay, that's a bit difficult to read so let's re-word it a bit:

A table, TEST, contains two columns, defined as follows:

c_1 number not null
c_2 varchar2(255)

Ten rows are inserted into this table and single-column indexes are created. Querying the data with the following statement:

select c_1 from test;

produces an index full scan path. Changing the query to:

select c_2 from test;

produces a full table scan path. Both columns have an index; why isn't the index for the second column used for the above query?

So why isn't that index used? It all comes down to ... [ominous musical interlude here] ... NULLs.

You read correctly, NULLs.

In an Oracle database, entirely NULL keys are not indexed in conventional index structures (b-tree indexes); I consider a function-based index a special case of this, which can index NULL keys provided the function used 'translates' NULLs to a non-null value. Bitmap indexes, on the other hand, do index NULLs by default; unfortunately for the questioner the indexes created aren't bitmap indexes. So, why are NULLs at fault here? Let's look at the table definition again and see.

SQL> desc test
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------
 C_1                                       NOT NULL NUMBER
 C_2                                                VARCHAR2(255)

SQL>

Column c_1 is declared not null, thus there can never be a NULL value in that field, THUS an index keyed on that column is guaranteed to contain all values of that column. Oracle can, and will, use only an index created against not null columns to return query results, if that index satisfies the select list as all values will be present; the associated table will never be accessed in such cases. Contrast that to column c_2, which is nullable. It doesn't matter that the column doesn't contain any NULL values when the index is created, the fact remains that the column CAN contain NULLs. This, then, prevents Oracle from using the index to retrieve the data from that column since it's possible to miss any NULL values stored there. The only way to ensure Oracle returns all values from c_2 is to perform a full table scan.

Let's look at an example of this in action:

SQL> create table test1(
  2        c_1 number not null,
  3        c_2 varchar2(255)
  4  );

Table created.

SQL>
SQL> begin
  2        for i in 1..10 loop
  3         insert into test1
  4         values (i, 'Test '¦¦i);
  5        end loop;
  6
  7        commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL>
SQL> create index test1_c_1
  2  on test1(c_1);

Index created.

SQL>
SQL> create index test1_c_2
  2  on test1(c_2);

Index created.

SQL>
SQL> set autotrace on
SQL>
SQL> select c_1 from test1;

       C_1
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2941126162

------------------------------------------------------------------------------
 Id   Operation         Name       Rows   Bytes  Cost (%CPU) Time
------------------------------------------------------------------------------
   0  SELECT STATEMENT                10    130      1   (0) 00:00:01
   1   INDEX FULL SCAN  TEST1_C_1     10    130      1   (0) 00:00:01
------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        284  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>
SQL> select c_2 from test1;

C_2
--------------------------------------------------------------------------------
Test 1
Test 2
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9
Test 10

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3852271815

---------------------------------------------------------------------------
 Id   Operation          Name   Rows   Bytes  Cost (%CPU) Time
---------------------------------------------------------------------------
   0  SELECT STATEMENT             10   1290      3   (0) 00:00:01
   1   TABLE ACCESS FULL TEST1     10   1290      3   (0) 00:00:01
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         17  consistent gets
          0  physical reads
          0  redo size
        344  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>

Notice that even though no NULL values occur in c_2 the path is a full table scan; this is due, as explained earlier, to the column definition, at table creation, allowing NULL values to be stored. Notice, too, that the query:

select c_1 from test1;

accesses only the index on that column; at no time is the table 'touched' to return any data. There is no need, since all data values for that column are present in that index.

Let's see if we can get an index full scan of the index for c_2; we'll change the query and see what the optimizer does:

SQL> select c_2 from test1 where c_2 is not null;

C_2
------------------------------------------------------------------------------
Test 1
Test 10
Test 2
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 950550661

------------------------------------------------------------------------------
 Id   Operation         Name       Rows   Bytes  Cost (%CPU) Time
------------------------------------------------------------------------------
   0  SELECT STATEMENT                10     80      1   (0) 00:00:01
*  1   INDEX FULL SCAN  TEST1_C_2     10     80      1   (0) 00:00:01
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C_2" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        343  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>

Since Oracle need not return any NULLs stored in c_2, and since the index contains only non-NULL key values, Oracle can perform an index full scan to return the desired data. Of course in this case there are no NULLs stored in the table, making this possibly a bit confusing. We'll now store a few NULL values in that column and run the queries again:

SQL> begin
  2     for i in 11..20 loop
  3             if mod(i,2) = 0 then
  4                     insert into test1 values (i, 'Test '¦¦i);
  5             else
  6                     insert into test1 values(i, null);
  7             end if;
  8     end loop;
  9  end;
 10  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> select c_2 from test1;

C_2
--------------------------------------------------------------------------------

Test 12

Test 14

Test 16

Test 18

Test 20
Test 1

C_2
--------------------------------------------------------------------------------
Test 2
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9
Test 10

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3852271815

---------------------------------------------------------------------------
 Id   Operation          Name   Rows   Bytes  Cost (%CPU) Time
---------------------------------------------------------------------------
   0  SELECT STATEMENT             10     80      3   (0) 00:00:01
   1   TABLE ACCESS FULL TEST1     10     80      3   (0) 00:00:01
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          9  consistent gets
          4  physical reads
          0  redo size
        474  bytes sent via SQL*Net to client
        253  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL> select c_2 from test1 where c_2 is not null;

C_2
--------------------------------------------------------------------------------
Test 1
Test 10
Test 12
Test 14
Test 16
Test 18
Test 2
Test 20
Test 3
Test 4
Test 5

C_2
--------------------------------------------------------------------------------
Test 6
Test 7
Test 8
Test 9

15 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 950550661

------------------------------------------------------------------------------
 Id   Operation         Name       Rows   Bytes  Cost (%CPU) Time
------------------------------------------------------------------------------
   0  SELECT STATEMENT                10     80      1   (0) 00:00:01
*  1   INDEX FULL SCAN  TEST1_C_2     10     80      1   (0) 00:00:01
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("C_2" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        408  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         15  rows processed

SQL>

The original query:

select c_2 from test1;

returns all values for the column, including the NULLs, requiring the table to be scanned. The second query, with the 'is not null' qualifier, returns 5 fewer rows but doesn't touch the table at all, instead relying solely on the index to produce the desired result set.

Even...

select count(*) from test1;

...will use the index on the NOT NULL column, thus avoiding any table access:

SQL> select count(*) from test1;

  COUNT(*)
----------
        10


Execution Plan
----------------------------------------------------------
Plan hash value: 2402158148

----------------------------------------------------------------------
 Id   Operation         Name       Rows   Cost (%CPU) Time    
----------------------------------------------------------------------
   0  SELECT STATEMENT                 1      1   (0) 00:00:01
   1   SORT AGGREGATE                  1                      
   2    INDEX FULL SCAN TEST1_C_1     10      1   (0) 00:00:01
----------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        225  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

Since, as explained before, the index on the NOT NULL column will list every value in that column it will also provide the count for every row in that table.

Let's make the index on c_2 contain all of the values of c_2, NULL or not, by including a non-null component (in this case a 0):

SQL> drop index test1_c_2;

Index dropped.

SQL> 
SQL> create index test1_c_2
  2  on test1(c_2,0);

Index created.

SQL>

Now the plan accesses the index without the need for the "where c_2 is not null" condition:

SQL> select c_2 from test1;

C_2
-------------------------------------------------------------------------
Test 1
Test 10
Test 12
Test 14
Test 16
Test 18
Test 2
Test 20
Test 3
Test 4
Test 5
Test 6
Test 7
Test 8
Test 9






20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2529630288

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |    20 |  2580 |     1   (0)| 00:00:01 |
|   1 |  INDEX FULL SCAN | TEST1_C_2 |    20 |  2580 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
         48  recursive calls
          0  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
        766  bytes sent via SQL*Net to client
        427  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL> 

Since the index key was not null (there is always a 0 in each index entry) the index could contain NULL column values for c_2. So Oracle can index NULL values in a b-tree index as long as the entire index key is not null.

So, the optimizer is doing the 'right thing', really. It just may be confusing when situations like this arise. Just remember that entirely NULL keys, in a traditional b-tree index, aren't included and for a key composed of nullable columns, Oracle must scan the entire table to return all of the possible values. It isn't a problem, it's just 'good business'.

See all articles by David Fitzjarrell



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM