Clustering for Indexes

There is nothing spectacular
about using indexes per se. However, on many occasions I have come across a
variety of SQL coders that never consider validating that the index they think
they are using is efficient or even being used at all. We can all put indexes
on the columns that we think will be required to satisfy individual queries,
but how do we know if they will ever be used. You see, if the underlying table
data is constructed, contains, or is ordered in a particular way, our indexes
may never be used. One of the factors around the use of an index is its
clustering factor and this is what this article is about.

Clustering defined

The dictionary.com
definition of clustering is "A group of the same or similar elements
gathered or occurring closely together." This is not too far from what we
mean by clustering for an index when relating it to databases. If you were to
extract a definition from the Oracle documentation, you would get to a concept
of Clustering Factor, which is defined as "the amount of order of the rows
in the table based on the values of the index." So what is a good
clustering factor as opposed to a bad clustering factor? As the clustering
factor approaches and reaches the number of blocks in the table, the index is
considered to have good clustering. You can quickly see that if an index block
points to a table data block, then all of the pointers in the index are
pointing to table data that is very closely related and thus we will get very
good response from using the index. On the other hand, poor or bad clustering
is very much the opposite. As the clustering factor approaches the number of
rows in the table, the worse our usability of the index becomes. This means
that the entries in each of the index blocks are in different blocks in the
table data blocks, our access to the data could not be sequential, and we loose
all ability to scan table data in order of the index. Figure 1 gives a good
picture of how good clustering is nicely ordered when accessing data blocks
from the index blocks. Figure 2 shows how the index to data blocks is random
for bad clustering.

FIGURE
1

Good clustering

FIGURE
2.

Bad clustering

Effects on performance

1.     
db file sequential &
scattered reads


Depending
on the index access and access to the table data that is signified by your SQL
execution plan, if the clustering factor is bad we will end up doing more reads
than we want to get to the data to produce the result set needed by the query.
These waits on db file reads have huge implications, not only to the current
SQL statement but also steal valuable resources from other SQL and taxing our
i/o subsystem.

2.     
Buffer Busy Waits
Directly related to high db
file sequential & scattered reads is the requirement to obtain buffers in
the buffer cache. The fact that we are reading many more blocks than we need,
and possibly re-reading, we will tax requests for buffers and thus incur
possible high wait times for this valuable resource. In addition, if you watch
this area and try to tune it, it is like trying to hit a moving target because
there tends to be no rhyme or reason to the random access for data.

3.     SQL execution path
Since the optimizer looks at the clustering factor
statistic to determine the execution path, if the clustering factor is bad,
Oracle will tend to do full table scans for result sets, as it may be less
costly to pull in just table data as opposed to reading index and data blocks.
All of this drives up the cost of the execution plan and time for the SQL to
run.

Structure changes for cluster performance

1.     
Order of columns in an index

This
part of getting clustering to work is very important. This "typically"
means that you should use the columns with the most selectivity first (A
primary key is very selective). In addition, you should validate the order of
the columns in your index along with the business rules and data heuristics.
This is the only way you will know if the index you build will at least satisfy
the types of queries requested of it.

2.     
Load your data in the order you
have set up the columns for the index(es). If you have more than one index on
an object, you should revert to the business rules and SLAs around the queries
that are most important. You can accomplish this by any method you like but it does
require a rebuild (reorganization) of the table. If you have ETL processes, try
to pre-sort the data as it will improve performance and limit the complaints
latter.

3.     
Add data needed in the table to
the index. While this is more of a hack than a fix, if you cannot get to
reorganize your data in the table this may be your only choice. You may also
have a situation where the order in the table is great for a set of indexes but
is totally out of order for another. Adding a column or two in these situations
is a life saver.

Where to look at clustering

DBA_INDEXES For all indexes in your database
DBA_IND_PARTITIONS For all index partitions in your database
DBA_IND_STATISTICS For showing optimizer statistics for all indexes in your database.
DBA_IND_SUBPARTITIONS For all index subpartitions in your database.

Having a
good clustered index on a table reduces the time to execute and resources
required to produce result sets. This is important as other queries may starve
for those resources. Asking yourself a few questions about the access paths and
data required to ask well-defined business questions puts you closer to
discovering the true indexes required within your databases. We all know that
this is not a perfect world and often we inherit bad designs. By looking at the
clustering factor, it only takes a moment, we can quickly gain insight into how
the data looks in the objects being queried and thus possible performance issues
we need to deal with.

»


See All Articles by Columnist
James Koopmann

James Koopmann
James Koopmann
James Koopmann has fourteen years of database design, development and performance tuning experience. In addition, he has extensive database administration experience in Oracle and other relational databases in production environments, specializing in performance tuning of database engines and SQL based applications. Koopmann is an accomplished author with several technical papers in various Oracle related publications such as Oracle Magazine, Oracle Professional and SQL>UPDATE_RMOUG. He is a featured author and database expert for DatabaseJournal, a member of the editorial review committee for Select Journal (The Magazine for the International Oracle Users Group), an Oracle Certified Professional DBA and noted speaker at local Oracle User Groups around the country.

Latest Articles