Hybrid Histograms in Oracle 18

iOracle version 12 introduced the hybrid histogram as a performance improvement — such histograms are based on ‘rules’ set forth in version 12.1. Those rules are:

  1. A value should not be found in more than one bucket.
  2. The bucket size is allowed to be extended in order to contain all instances of the same distinct value.
  3. Adjusted bucket size cannot be less than the original size (not applicable at either end of the data set)
  4. The original number of buckets should not be reduced.

This was a major improvement over previous histograms (height-balanced and frequency) in terms of how a query plan is generated. There was a bug with this type of histogram and cardinality estimates (Bug 25994960), and a patch was supplied as a corrective measure. This was not widely publicized so many 12.2 databases remain unpatched; a backport patch for 12.1 also exists, again not widely publicized.

With Oracle 18 (specifically 18.3) this issue has been addressed which leads to the following situation between unpatched 12.2 and 12.1 databases and the 18.3 version. Jonathan Lewis has generously provided a script to illustrate the problem. I’ve used a modified version of his script in this article.

The dbms_stats.gather_table_stats procedure is used to generate the hybrid histogram by specifying it is to create 13 ‘buckets’. It’s not the number of buckets that is the issue, it’s the endpoints of those buckets that change between the versions. The script used is posted below:


drop table hist_test purge;
 
execute dbms_random.seed(0)
 
create table hist_test(
        my_id           number(8,0),
        id_mod_20       number(6,0),
        id_mod_30       number(6,0),
        id_mod_50       number(6,0),
        my_rand_id      number(6,0)
)
;
 
insert into hist_test
with datasource as (
        select
                rownum my_id
        from dual
        connect by
                level <= 1e4 
)
select
        rownum                                  my_id,
        mod(rownum,   20) + 1                   id_mod_20,
        mod(rownum,   30) + 1                   id_mod_30,
        mod(rownum,   50) + 1                   id_mod_50,
        28 - round(abs(7*dbms_random.normal))        my_rand_id
from
        datasource       ds1
where
        rownum <= 800
;
 
commit;
 
begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'HIST_TEST',
                method_opt       => 'for all columns size 1 for columns my_rand_id size 13'
        );
end;
/

The script above generates 22 unique values to base the histogram on. The query below reports on those values and on the histogram information Oracle has generated:


select
        my_rand_id, count(*)
from
        hist_test
group by
        my_rand_id
order by
        my_rand_id
;
 
select
        endpoint_value                                                    value,
        endpoint_number,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) bucket_size,
        endpoint_repeat_count
from
        user_tab_histograms
where
        table_name  = 'HIST_TEST'
and     column_name = 'MY_RAND_ID'
order by
        endpoint_value
;

Since the data sets are the same between versions the data set will be reported only once:


MY_RAND_ID   COUNT(*)
---------- ----------
         1          1
         8          3
         9          1
        10          5
        11          4
        12          8
        13         14
        14          9
        15         11
        16         22
        17         34
        18         31
        19         36
        20         57
        21         44
        22         45
        23         72
        24         70
        25         87
        26        109
        27         96
        28         41
 
22 rows selected.
 

As mentioned at the beginning of the article, it’s the histograms that change. Look at the histogram in an unpatched 12.2 database:


     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        18             143          31                    31
        19             179          36                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        28             800         137                    41
 
13 rows selected.
 

Oracle generates the histogram based on the estimated cardinalities and notice that, in 12.2, Oracle takes every value from 17 through 26 for the histogram. Moving to Oracle 18.3, with the same script the histogram data is slightly different:


     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        19             179          67                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        27             759          96                    96
        28             800          41                    41
 
13 rows selected.

Because of the patch, the value 18 is now missing from the histogram, replaced with the value 27, a result of correcting the cardinality estimates generated to create the histogram. Over the years the optimizer has had its share of mishaps with cardinality estimates, and this one appears to be minor in nature. It can be confusing, though, to upgrade to 18.3 and find that a hybrid histogram has changed unexpectedly.

It may be a rare occurrence to actually check histogram data without a performance issue at hand so this could easily be overlooked in a database upgrade. It is nice to be aware that such changes can occur. If someone complains that a query ran ‘fine’ in 12.2, but is now running a bit… ‘off’ in 18.3, it can be explained.

David Fitzjarrell
David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhilips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and montoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles