dcsimg
Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
SQL Scripts & Samples
Tips
» Database Forum
» Slideshows
» Sitemap
Free Newsletters:
DatabaseDaily  

By submitting your information, you agree that databasejournal.com may send you databasejournal offers via email, phone and text message, as well as email offers about other products and services that databasejournal believes may be of interest to you. databasejournal will process your information in accordance with the Quinstreet Privacy Policy.

News Via RSS Feed


Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted Nov 24, 2003

Oracle Optimizer: Moving to and working with CBO - Part 5 - Page 2

By Amar Kumar Padhi

Generating Histograms

These statistics could be generated using the following commands:

DBMS_STATS.GATHER_TABLE_STATS with METHOD_OPT => 'FOR COLUMNS <colname> SIZE xxx'

or

ANALYZE command for column statistics with SIZE option.

The SIZE option in the above command specifies the number of buckets to be stored

e.g.: Gather histogram statistics for AM20.ENAME. Since there are 17 distinct values in the column, I can choose to have about 1 to 17 buckets in the histogram. As the frequency of each value is less, an approximate of 5 buckets is chosen initially.

SQL> exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'AM20',-
> method_opt => 'FOR COLUMNS ENAME SIZE 5');

PL/SQL procedure successfully completed.


SQL> SELECT table_name, column_name,num_distinct,num_buckets,last_analyzed,
  2         avg_col_len
  3  from   dba_tab_col_statistics
  4  where  table_name = 'AM20';

TABLE_NAME COLUMN_NAM NUM_DISTINCT NUM_BUCKETS LAST_ANAL AVG_COL_LEN
---------- ---------- ------------ ----------- --------- -----------
AM20       ENAME                17           5 17-OCT-03           6


SQL> select owner, table_name, column_name,
  2         to_char(endpoint_number, '99999.99') EP_NO,
  3         trunc(endpoint_value) EP_VAL,
  4         endpoint_actual_value EP_A_VAL
  5  from   dba_tab_histograms
  6  where  table_name = 'AM20'
  7  and    column_name = 'ENAME'
  8  order by endpoint_number;

OWNER      TABLE_NAME COLUMN_NAM EP_NO         EP_VAL EP_A_VAL
---------- ---------- ---------- --------- ---------- -----------
SYS        AM20       ENAME            .00 3.3888E+35 ADAMS
SYS        AM20       ENAME           1.00 3.6507E+35 FORD
SYS        AM20       ENAME           2.00 4.0632E+35 NARAYAN
SYS        AM20       ENAME           3.00 4.3269E+35 SUNIL
SYS        AM20       ENAME           4.00 4.3788E+35 TURNER
SYS        AM20       ENAME           5.00 4.5305E+35 WARD

6 rows selected.

Again, if you are maintaining histograms for a frequently changing column, it is recommended to update the histograms more frequently to provide accurate statistics to the optimizer.

Histogram on indexed columns

Oracle provides a method of analyzing indexed columns that need to be provided with histogram statistics. This can be done by using the INDEX_HISTOGRAM view.

e.g.: Column ename is provided with an index in table AM20. This will be used to analyze the density of the values. As can be seen below, the INDEX_HISTOGRAM view shows the occurrences of the distinct values. This information can be analyzed to find out if the data is skewed.

SQL> create index am20_ind1 on am20(ename);

Index created.

SQL> analyze index am20_ind1 validate structure;

Index analyzed.

SQL> select * from index_histogram;

        REPEAT_COUNT KEYS_WITH_REPEAT_COUNT
-------------------- ----------------------
                   0                      0
                   1                     12
                   2                      0
                   3                      0
                   4                      1
                   5                      1
                   6                      0
                   7                      0
                   8                      1
                   9                      2
                  10                      0
                  11                      0
                  12                      0
                  13                      0
                  14                      0
                  15                      0

16 rows selected.

Histogram use

The following is a simple example of how the histogram is actually used by the optimizer to pick up an index based on the selectivity of the column present in the predicate.

e.g.: There are 600 records in the table AM20. ENAME data 'SUNIL' has 512 records and 'TURNER' has 9 records. Without the histogram statistics, the index is used to retrieve values for both the queries as shown below.

SQL> select * from am20 where ename = 'TURNER';
...
9 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1 Card=1 Bytes=11)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'AM20' (Cost=1 Card=6 Bytes=66)
   3    2       INDEX (RANGE SCAN) OF 'AM20_N1' (NON-UNIQUE) (Cost=1 Card=6)


SQL> select * from am20 where ename = 'SUNIL';
...
512 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1 Card=1 Bytes=11)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'AM20' (Cost=1 Card=6 Bytes=66)
   3    2       INDEX (RANGE SCAN) OF 'AM20_N1' (NON-UNIQUE) (Cost=1 Card=6)

The use of index in retrieving 9 rows is fast, but to retrieve 512 records (85% of the table) with an index is overkill. The optimizer is not equipped with the data distribution statistics to make this decision. Histogram data is provided and the query is re-run as shown below, with the right execution plan this time.

SQL> exec dbms_stats.gather_table_stats(ownname => 'SYS', tabname => 'AM20',-
> method_opt => 'FOR COLUMNS ENAME SIZE 75');

PL/SQL procedure successfully completed.


SQL> select * from am20 where ename = 'TURNER';
...
512 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1 Card=1 Bytes=10)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'AM20' (Cost=1 Card=9 Bytes=90)



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM