Oracle Optimizer: Moving to and working with CBO - Part 5
November 24, 2003
We covered the various methods of generating statistics in the last installment; moving forward, let's look at the data skewness problem and the use of the Monitoring option for updating statistics on the fly.
Histogram tells the optimizer how the data is distributed for a column. This information is used in determining the selectivity of the column for a given query and arriving at an optimal execution plan.
Column statistics in the form of histograms are appropriate for columns whose data distribution deviates from the expected uniform distribution. For uniformly distributed data, the optimizer can do proper costing for executing a particular statement. When data is not uniformly distributed, also know as highly skewed data distribution, the optimizer may not be in a position to accurately estimate the selectivity of a query. This option provides statistics at a very low level and its use would be rare, though it could prove very beneficial in certain scenarios.
Columns not eligible for histogram
Please note that histograms should not be used when any of the following is true.
Columns eligible for histogram
If the above rules are not satisfying for a column, it could be considered for distribution statistics. For example, if we again consider a column that holds around 100 distinct statuses in a 100000 records table, 80% of the values lie between 15-30 and the remaining 20% is distributed in the other ranges. If we design a data distribution graph (histogram chart) on these figures, the height of few values or ranges will be very high where as others will be very low. This shows an oblique line for the data distribution.
A histogram is good for number columns. For character columns only the first 32 bytes (as of 18.104.22.168) of the string are used for building the histograms, this may sometimes result in incorrect information being created if the size of the column data exceeds this limitation.
User-specific histogram values can also be stored in the dictionary using the DBMS_STATS.PREPARE_COLUMN_VALUES and DBMS_STATS.SET_COLUMN_VALUES routines.
Histogram information is stored in the following dictionary tables.
Histogram values for columns in tables:
For partition table histograms values:
For evaluating histograms on indexed columns:
Other Views that give similar data:
Columns in the above tables are self-explanatory.
Buckets in Histograms
Histogram statistics are stored in the form of buckets. Buckets represent the partitioning of data values, depending on the range. By default, 75 buckets are created. A maximum of 254 buckets can be specified for a column. How many buckets are required for a column will depend on the occurrences of distinct values. The default number of buckets is appropriate, but you will have to experiment with various bucket sizes to find the most suitable size.
If the number of distinct column values is less than the number of buckets specified, the individual column values and the count of these values is directly stored as histogram statistics. If the number of distinct column values is more than the buckets specified, Oracle uses an algorithm to store these values in ranges. If a series of continuous ranges have the same value, they may not be shown in the histogram table to save on space.
You may find columns with one-bucket histograms, these are as good as no histogram statistics and the optimizer ignores them.