Oracle Optimizer: Moving to and working with CBO - Part 5 - Page 2November 24, 2003 Generating HistogramsThese statistics could be generated using the following commands: DBMS_STATS.GATHER_TABLE_STATS with METHOD_OPT => 'FOR COLUMNS <colname> SIZE xxx' or ANALYZE command for column statistics with SIZE option. The SIZE option in the above command specifies the number of buckets to be stored e.g.: Gather histogram statistics for AM20.ENAME. Since there are 17 distinct values in the column, I can choose to have about 1 to 17 buckets in the histogram. As the frequency of each value is less, an approximate of 5 buckets is chosen initially. SQL> exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'AM20',- > method_opt => 'FOR COLUMNS ENAME SIZE 5'); PL/SQL procedure successfully completed. SQL> SELECT table_name, column_name,num_distinct,num_buckets,last_analyzed, 2 avg_col_len 3 from dba_tab_col_statistics 4 where table_name = 'AM20'; TABLE_NAME COLUMN_NAM NUM_DISTINCT NUM_BUCKETS LAST_ANAL AVG_COL_LEN ---------- ---------- ------------ ----------- --------- ----------- AM20 ENAME 17 5 17-OCT-03 6 SQL> select owner, table_name, column_name, 2 to_char(endpoint_number, '99999.99') EP_NO, 3 trunc(endpoint_value) EP_VAL, 4 endpoint_actual_value EP_A_VAL 5 from dba_tab_histograms 6 where table_name = 'AM20' 7 and column_name = 'ENAME' 8 order by endpoint_number; OWNER TABLE_NAME COLUMN_NAM EP_NO EP_VAL EP_A_VAL ---------- ---------- ---------- --------- ---------- ----------- SYS AM20 ENAME .00 3.3888E+35 ADAMS SYS AM20 ENAME 1.00 3.6507E+35 FORD SYS AM20 ENAME 2.00 4.0632E+35 NARAYAN SYS AM20 ENAME 3.00 4.3269E+35 SUNIL SYS AM20 ENAME 4.00 4.3788E+35 TURNER SYS AM20 ENAME 5.00 4.5305E+35 WARD 6 rows selected. Again, if you are maintaining histograms for a frequently changing column, it is recommended to update the histograms more frequently to provide accurate statistics to the optimizer. Histogram on indexed columnsOracle provides a method of analyzing indexed columns that need to be provided with histogram statistics. This can be done by using the INDEX_HISTOGRAM view. e.g.: Column ename is provided with an index in table AM20. This will be used to analyze the density of the values. As can be seen below, the INDEX_HISTOGRAM view shows the occurrences of the distinct values. This information can be analyzed to find out if the data is skewed.
SQL> create index am20_ind1 on am20(ename);
Index created.
SQL> analyze index am20_ind1 validate structure;
Index analyzed.
SQL> select * from index_histogram;
REPEAT_COUNT KEYS_WITH_REPEAT_COUNT
-------------------- ----------------------
0 0
1 12
2 0
3 0
4 1
5 1
6 0
7 0
8 1
9 2
10 0
11 0
12 0
13 0
14 0
15 0
16 rows selected.
Histogram useThe following is a simple example of how the histogram is actually used by the optimizer to pick up an index based on the selectivity of the column present in the predicate. e.g.: There are 600 records in the table AM20. ENAME data 'SUNIL' has 512 records and 'TURNER' has 9 records. Without the histogram statistics, the index is used to retrieve values for both the queries as shown below. SQL> select * from am20 where ename = 'TURNER'; ... 9 rows selected. Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1 Card=1 Bytes=11) 1 0 SORT (AGGREGATE) 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'AM20' (Cost=1 Card=6 Bytes=66) 3 2 INDEX (RANGE SCAN) OF 'AM20_N1' (NON-UNIQUE) (Cost=1 Card=6) SQL> select * from am20 where ename = 'SUNIL'; ... 512 rows selected. Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1 Card=1 Bytes=11) 1 0 SORT (AGGREGATE) 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'AM20' (Cost=1 Card=6 Bytes=66) 3 2 INDEX (RANGE SCAN) OF 'AM20_N1' (NON-UNIQUE) (Cost=1 Card=6) The use of index in retrieving 9 rows is fast, but to retrieve 512 records (85% of the table) with an index is overkill. The optimizer is not equipped with the data distribution statistics to make this decision. Histogram data is provided and the query is re-run as shown below, with the right execution plan this time. SQL> exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'AM20',- > method_opt => 'FOR COLUMNS ENAME SIZE 75'); PL/SQL procedure successfully completed. SQL> select * from am20 where ename = 'TURNER'; ... 512 rows selected. Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1 Card=1 Bytes=10) 1 0 SORT (AGGREGATE) 2 1 TABLE ACCESS (FULL) OF 'AM20' (Cost=1 Card=9 Bytes=90) |