Oracle databases run better with a good set of statistics, and by good I mean representative of the data distribution or skew. Oracle did its best in earlier releases of the database to ensure statistics were accurate but changes in the optimizer behavior and in the data we store sometimes made that difficult, if not nearly impossible to implement. Oracle 11.2 (11gR2, if you like the marketing speak) has finally addressed two major issues of statistics gathering -- speed and accuracy -- by changing how DBMS_STATS samples the data.

Looking back to the previous major release (10.2.0) the methodology for statistics gathering relied basically on the sample size; the larger the sample size the better the statistics at the cost of increased run time to gather those statistics. Oracle also introduced the auto sample size default, which determined the sample size to use for each table. Using the default settings did produce statistics in a shorter length of time, but did so at the expense of inaccurate histograms, plainly wrong histograms, and inaccurate density values. Partitioned tables were the most often affected when sample sizes smaller than 90% were used; the auto sample size Oracle computed often hit the 30% mark leaving histogram values woefully in need of adjustment. Even though Oracle would adjust the sampled values up at the table level (to more accurately reflect the actual number of rows) the histogram values remained at the actual sampled size throwing density/selectivity calculations off by a considerable margin and adversely affecting performance based on cost calculations.

Another histogram issue in 10.2.0 involves partitioned tables and single-valued non-null columns. Oracle will occasionally create frequency histograms on these columns, again affecting calculations by the CBO resulting in poor execution plans that, in many cases, use the 'wrong' index. Since the CBO bases decisions on the number of distinct values in a column (the NDV value in a 10053 trace) a frequency histogram implies there will be more than one distinct value; having such a histogram on a single-valued, non-null column will decrease the NDV from 1 to a considerably smaller number thus throwing another factor into the selectivity calculations. Oracle now considers the non-existent distribution of values in a single-valued column and selects an index less suited to satisfying the query than it would have were the histogram non-existent. Again, this is magnified by the partial statistics values in the histograms and results in sub-par performance.

Oracle 11.2.0 has corrected these issues with a better methodology that not only increases the speed of statistics collection but also provides greater accuracy of the results. An improved sampling algorithm along with a reduction of the full table scans executed has resulted in more accurate statistics in a shorter amount of time, and better statistics definitely mean better execution plans.

Maria Colgan, of the Oracle Optimizer Development Team, has reported on these improvements; one table I want to share is shown below, and compares the statistics using a 1% sample in Oracle 10.2.0 and the statistics from 11.2.0 using the improved auto sample size and a table from the TPC benchmark suite:

Column Name Actual NDV Auto Sampling in Oracle 11g 1% Sampling
orderkey 450,000,000 98.0% 50%
comment 181,122,127 98.60% 4.60%
partkey 60,000,000 99.20% 98.20%
suppkey 3,000,000 99.60% 99.90%
extendedprice 3,791,320 99.60% 94.30%

Notice that for most of the reported columns 11.2.0 statistics are superior to those generated by 10.2.0 at a 1% sample size; the 1% sample was selected as it was the fastest running between 1%, 100% and auto sample size in 10.2.0. Notice also that the auto sample size in 11.2.0 generates statistics percentages very close to 100, indicating that the new auto sampling algorithm will run faster yet produce results rivaling a 100% sample size and do that in approximately 1/10th of the time.

That isn't the only improvement in 11.2.0 statistics as the histograms generated have accurate numbers, better reflecting the state of the table data distribution. Partitioned tables no longer cause Oracle to generate spurious histograms such as the frequency histograms on single-valued, non-null columns making CBO calculations more accurate. Better statistics mean better plans and less of a headache for the users and the DBAs.

Statistics in 11.2.0 are better, out of the box, so the default statistics job that Oracle provides will generate reasonable values that should improve most installations. There may be a few configurations where the default settings will not provide improvement; this is where the DBA needs to know his or her data and how best to compute the statistics. The watchword for upgrades to 11.2.0 is test; test a number of scenarios to ensure the statistics gathering methodology chosen is the correct one. Knowing the data is the key to successful statistics; simply because one company sees improvement with the out-of-the-box Oracle solution doesn't mean yours will.

Statistics methodologies in 11.2.0 have improved dramatically from those in 10.2.0 and even 11.1.0, making the jobs that gather statistics faster and more reliable. If you're not doing anything unusual to gather statistics in your 10.2.0 database then after you migrate to 11.2.0 you should see improvement in plans and reduced time collecting statistics. The less time a DBA spends on statistics is more time to address other, more pressing, issues. That's a good thing no matter how you look at it.

**See all articles by David Fitzjarrell**