3.2 Cost based optimizer (CBO)
CBO
follows Expense calculation methodology. All execution plans are tagged with a
cost, the one with the lowest cost will be chosen. The higher the cost the more
resources will be used by the execution plan, the lower the cost, the more
efficient the query is.
CBO
uses all available information-statistics and histograms stored in the dictionary, user provided hints and supplied parameter settings
to arrive at the cost. CBO generates all possible permutations of access
methods and then chooses what fits best. The number of permutations depends on
the number of tables present in the query and can sometimes be around 80,000
permutations or even more! Please refer to the parameter
section in part 2 of this series for setting related parameters.
CBO
may also perform operations such as query transformation, view merging, OR
transformation, push join predicates, etc. that would change the original
statement and alter existing or add new predicates, all with the aim of
deriving new access plans that could be better than the existing ones. Note
that transformation does not affect the data that is returned, only the
execution path. Please refer to the parameter
section in part 2 of this series for information related to this.
3.2.1 Statistics
Statistics
provide critical input in order for CBO to work properly; these are generated
for data storing objects and include information such as the number of rows in
a table, distinct values in a column, number of leaf blocks in an index, etc.
The more accurate the statistics, the more efficient the results provided by
Optimizer. Please refer to the Generating
statistics section in part 3 of this series for how this information is generated and how
best we can maintain it.
Statistics
may be exact or estimated. Statistics generated with a COMPUTE clause analyzes
all of the data in the object. This gives the optimizer accurate information to
work on and arrive at a good execution plan.
Statistics
generated with an ESTIMATE clause analyzes data in the object to the extent of
sample size mentioned. Sample size may be specified as number of rows or
percentage of rows that should be randomly analyzed to generate the statistics.
Optionally block sampling may also be specified. This saves on time if there
are many huge tables in the system. The guarantee of good execution plans will
depend on how close the estimated value is to the exact values. You can try out
your setup at different sample sizes to arrive at an appropriate figure or have
different estimation levels for different types of tables, but the idea is to
get as close to accuracy as feasible.
Statistics
are stored in a data dictionary in tables owned by SYS user. The following
views display the statistics collected for tables, columns and indexes.
For Tables
- DBA_TABLES
- NUM_ROWS - Number of rows.
- BLOCKS - Number of used blocks.
- EMPTY_BLOCKS - Number of empty blocks that have never been used.
- AVG_SPACE - Average free space (in bytes) in blocks allocated to the table.
All empty and free blocks are considered for this.
- CHAIN_CNT - Number of chained or migrated rows.
- AVG_ROW_LEN - Average row length in bytes.
- LAST_ANALYZED - Date when the table was last analyzed.
- SAMPLE_SIZE - Sample size provided for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE.
- GLOBAL_STATS - For partitioned tables, YES - statistics collected as a whole,
NO - statistics are estimated from statistics.
- USER_STATS - Set to YES if user has explicitly set the statistics for the table.
Statistics
for individual partitions of a table can be seen from DBA_TAB_PARTITIONS.
Cluster statistics is available from DBA_CLUSTERS. Object table statistics are present in DBA_OBJECT_TABLES.
For Columns
- DBA_TAB_COLUMNS
- NUM_DISTINCT - Number of distinct values.
- LOW_VALUE - Lowest value
- HIGH_VALUE - Highest value
- DENSITY - Density of the column
- NUM_NULLS - Number of records with null value for the concerned column.
- NUM_BUCKETS - Number of buckets in histograms. Refer Histograms section.
- SAMPLE_SIZE - Sample size provided for ESTIMATE statistics. Equal to total rows if COMPUTE.
- LAST_ANALYZED - Date when the table was last analyzed.
DBA_TAB_COL_STATISTICS shows similar data. Partitioned Table column statistics can be viewed from DBA_PART_COL_STATISTICS and DBA_SUBPART_COL_STATISTICS.
For Indexes
- DBA_INDEXES
- BLEVEL - Depth of the index, from root to leaf.
- LEAF_BLOCKS - Number of leaf blocks.
- DISTINCT KEYS - Number of distinct keys.
- AVG_LEAF_BLOCKS_PER_KEY - Average number of leaf blocks in which each distinct key appears, should be 1 for unique indexes.
- AVG_DATA_BLOCKS_PER_KEY - Average number of blocks in the table that are pointed to
by a distinct key.
- CLUSTERING_FACTOR - A count that determines the ordering of the index.
Index is ordered if count is closer to the number of blocks, i.e.,
entries in single leaf tend to point to rows in same blocks in the table.
Index is randomly ordered if closer to the number of rows, i.e.,
entries in single leaf are pointing to rows spread across multiple blocks.
- NUM_ROWS - Number of rows indexed.
- SAMPLE_SIZE - Sample size provided for ESTIMATE statistics. Equal to NUM_ROWS if COMPUTE..
- LAST_ANALYZED - Date when the table was last analyzed.
- GLOBAL_STATS - For partitioned indexes, YES - statistics collected as a whole,
NO - statistics are estimated from statistics.
- USER_STATS - Set to YES if user has explicitly set the statistics for the index.
- PCT_DIRECT_ACCESS - For secondary indexes on IOTs, percentage of rows with valid guess.
Statistics
for individual partitions of indexes can be seen from DBA_IND_PARTITIONS and
DBA_IND_SUBPARTITIONS.
Dictionary
tables related to Histogram information are discussed later.