It seems that one of the more confusing statistics in Oracle is one called the clustering factor. Associated with an index it’s actually dependent on the table data, more specifically the distance between ‘jumps’ for a given index key. A ‘jump’, to me, is the number of blocks between rows containing the given index key starting with the first block found containing that key. If that’s confusing don’t despair, this will be explained in detail.
Oracle, in its infinite wisdom, computes a clustering factor for an index based upon the key value and the scattering of the rows containing that key across the table data. Looking at a simple example let’s look at a simple non-unique index (indexes for which the clustering factor matters) and the associated table data. The example starts by building the table and loading the data:
SQL>
SQL> create table clust_fact(
2 id number,
3 nm varchar2(40),
4 dt date,
5 sq number);
Table created.
SQL>
SQL> begin
2 for i in 1..1000000 loop
3 insert into clust_fact(id,nm,dt,sq)
4 values(mod(i,77), 'Record '||i, sysdate - mod(i, 77), i);
5 end loop;
6
7 commit;
8 end;
9 /
PL/SQL procedure successfully completed.
SQL>
Next, build the index and generate the statistics:
SQL>
SQL> create index clust_fact_idx on clust_fact(id);
Index created.
SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'CLUST_FACT', cascade=>true);
PL/SQL procedure successfully completed.
SQL>
Query the clustering factor computed by Oracle:
SQL> select index_name, clustering_factor
2 from user_indexes
3 where table_name = 'CLUST_FACT';
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
CLUST_FACT_IDX 370731
SQL>
The clustering factor provides a representation of how far Oracle needs to ‘jump’, on average, between blocks containing the index key value. To understand this a bit better let’s look at the code Oracle actually uses to compute the clustering factor and explain what it does:
SQL>
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),1) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
Oracle uses a function named sys_op_countchg, which takes two parameters, the first 15 bytes of the rowid and the number of blocks to traverse to trigger the count to increase; by default Oracle chooses to increment the clustering factor counter every time the block address changes. If three rows, for example, exist in the current block the counter does not increment; if the next three rows are in a different block the counter is incremented by 1 (Oracle doesn’t change blocks for the next two reads after the change). This keeps going throughout the table to compute the final clustering factor. Knowing the function used by Oracle and its parameters it is possible to modify the call to accept larger values for the number of block changes before the clustering factor counter is incremented. To allow Oracle to consider 3 blocks ‘close enough’ to not change the counter the query is modified as follows:
SQL>
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),3) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
Given how the values were computed for the ID column the clustering factor doesn’t change when we increase the block change count Oracle considers to be within range of the starting block address. Possible values for this second parameter are 1, 3, 5, 9 and 16. Going through the list of possible values it’s obvious with the current data loading the clustering factor won’t improve no matter which value we choose:
SQL>
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),5) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),9) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),16) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
Since how the table data is ordered affects the clustering factor the table is reloaded with data ordered using the ID column:
SQL>
SQL> create table clust_tmp as select * from clust_fact order by id;
Table created.
SQL>
SQL> truncate table clust_fact;
Table truncated.
SQL>
SQL> insert into clust_fact select * from clust_tmp;
1000000 rows created.
SQL>
SQL> commit;
Commit complete.
SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'CLUST_FACT', cascade=>true);
PL/SQL procedure successfully completed.
SQL>
Querying the clustering factor reveals it has significantly decreased since the keys are much closer together:
SQL>
SQL> select index_name, clustering_factor
2 from user_indexes
3 where table_name = 'CLUST_FACT';
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
CLUST_FACT_IDX 4889
SQL>
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),1) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4889
SQL>
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),3) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4888
SQL>
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),5) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4888
SQL>
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),9) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4885
SQL>
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),16) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4881
SQL>
It is worth noting that the clustering factor slightly decreases with larger values for the number of blocks Oracle can skip before updating the change count. The changes are not huge, and only differ by 8 from the value Oracle calculates by default to setting the ‘blocks to skip’ parameter to 16, its highest possible value. Since the original load created the SQ values in sequence the table is reloaded with the SQ values in order and a new concatenated index using both the ID and SQ columns is created. This is done to show how the clustering factor behaves for other indexes when it’s ‘optimized’ for one index:
SQL>
SQL> drop table clust_tmp purge;
Table dropped.
SQL>
SQL> create table clust_tmp as select * from clust_fact order by sq;
Table created.
SQL>
SQL> truncate table clust_fact;
Table truncated.
SQL>
SQL> insert into clust_fact select * from clust_tmp;
1000000 rows created.
SQL>
SQL> commit;
Commit complete.
SQL>
SQL> create index clust_fact_idx2 on clust_fact(id, sq);
Index created.
SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'CLUST_FACT', cascade=>true);
PL/SQL procedure successfully completed.
SQL>
Querying the clustering factor the value for the original table load again appears and, again, doesn’t change even when the block skip parameter to the sys_op_countchg function is increased through the available values:
SQL>
SQL> select index_name, clustering_factor
2 from user_indexes
3 where table_name = 'CLUST_FACT';
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
CLUST_FACT_IDX2 370731
CLUST_FACT_IDX 370731
SQL>
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),1) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),3) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),5) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),9) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),16) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),1) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),3) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),5) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),9) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
370731
SQL>
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),16) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
370731
SQL>
Ordering the data by ID shows that for both indexes the clustering factor decreases, since the indexes both include the ID column. The presence of the SQ column causes the second index clustering factor to be greater when the default parameters are used; interestingly as the block skip count increases for the second index the clustering factor decreases to that of the single-column index:
SQL>
SQL> drop table clust_tmp purge;
Table dropped.
SQL>
SQL> create table clust_tmp as select * from clust_fact order by id;
Table created.
SQL>
SQL> truncate table clust_fact;
Table truncated.
SQL>
SQL> insert into clust_fact select * from clust_tmp;
1000000 rows created.
SQL>
SQL> commit;
Commit complete.
SQL>
SQL> exec dbms_stats.gather_table_stats(user, 'CLUST_FACT', cascade=>true);
PL/SQL procedure successfully completed.
SQL>
SQL> select index_name, clustering_factor
2 from user_indexes
3 where table_name = 'CLUST_FACT';
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
CLUST_FACT_IDX2 75994
CLUST_FACT_IDX 4889
SQL>
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),1) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4889
SQL>
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),3) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4889
SQL>
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),5) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4889
SQL>
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),9) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4886
SQL>
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX") */
4 sys_op_countchg(substrb(t.rowid,1,15),16) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null;
CLF
----------
4882
SQL>
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),1) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
75994
SQL>
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),3) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
4967
SQL>
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),5) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
4861
SQL>
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),9) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
4861
SQL>
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),16) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
4861
SQL>
For multiple indexes on the same table, where no column is common between them, the clustering factor can skyrocket for some indexes when the table data is ordered to favor one particular index. That example wasn’t run and is left for you to investigate.
Although associated with an index, the clustering factor is affected by the table data so no series of index rebuilds will ‘fix’ a ‘bad’ value. The table data needs to be ordered to favor an index (or indexes if they contain the column used to order the table data). This, of course, improves performance for the positively affected indexes and, unfortunately, adversely affects other indexes created against that same table. Care must be taken when deciding whether or not to order the table data; ‘reasonable’ performance metrics must be decided upon before embarking on such a change, and it should be done in test or dev before considering it in production. It may be that overall ‘acceptable’ performance across all of the indexes is better than ‘outstanding’ performance for one index that causes ‘terrible’ performance for all others. It’s not a simple decision and needs to be carefully considered.
If you want to implement the modified clustering factor computed by modifying the query Oracle uses to generate this statistic you will need to manually update the index statistics using the DBMS_STATS package using one of the SET_INDEX_STATS procedures:
SQL>
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL>
SQL> column clf new_value cl_f
SQL>
SQL> select /*+ no_parallel_index(t, "CLUST_FACT_IDX2") dbms_stats
2 cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
3 no_substrb_pad no_expand index(t,"CLUST_FACT_IDX2") */
4 sys_op_countchg(substrb(t.rowid,1,15),16) as clf
5 from
6 "BING"."CLUST_FACT" t where "ID" is not null
7 and "SQ" is not null;
CLF
----------
4861
SQL> exec dbms_stats.set_index_stats(ownname => 'BING', indname => 'CLUST_FACT_IDX2', clstfct => &cl_f);
PL/SQL procedure successfully completed.
SQL>
Using the new_value functionality in SQL*Plus saves the computed value into a reusable variable to pass to procedures and functions that need it. The above code would set the clustering factor for the CLUST_FACT_IDX2 to 4861, replacing the Oracle-calculated value of 75994. Again, care must be taken when undertaking such a change since you cannot predict the resulting performance change.
The clustering factor determines which index Oracle will choose for a query against a given table. Changing the clustering factor will influence the CBO calculations sometimes for the better, sometimes for the worse. Plan carefully and test, test, test before second-guessing Oracle.