Oracle version 12 introduced the hybrid histogram as a performance improvement. Such histograms are built according to ‘rules’ set forth in version 12.1. Those rules are:
- A value should not be found in more than one bucket.
- The bucket size is allowed to be extended in order to contain all instances of the same distinct value.
- The adjusted bucket size cannot be less than the original size (not applicable at either end of the data set).
- The original number of buckets should not be reduced.
This was a major improvement over the previous histogram types (height-balanced and frequency) in terms of how a query plan is generated. This type of histogram had a bug affecting cardinality estimates (Bug 25994960), and a patch was supplied as a corrective measure. The patch was not widely publicized, so many 12.2 databases remain unpatched; a backport patch for 12.1 also exists, again not widely publicized.
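One way to check whether the fix is active in a given database is to look for the bug number among the optimizer fix controls. The query below is only a sketch: it assumes the fix for Bug 25994960 is registered in v$system_fix_control under that bug number, which may not hold for every version and patch level. If no row comes back, the patch inventory (opatch lsinventory) is the more reliable place to look. Access to the v$ view is also assumed.

```sql
-- Sketch: look for the hybrid histogram fix among the optimizer fix controls.
-- Assumption: the fix is registered under bug number 25994960; an empty
-- result does not prove that the patch is missing.
select
        bugno,
        value,                      -- 1 = fix enabled, 0 = fix disabled
        optimizer_feature_enable,
        description
from
        v$system_fix_control
where
        bugno = 25994960
;
```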
Oracle 18 (specifically 18.3) includes the fix, so a hybrid histogram gathered there can differ from the one an unpatched 12.1 or 12.2 database produces for the same data. Jonathan Lewis has generously provided a script to illustrate the problem. I’ve used a modified version of his script in this article.
The dbms_stats.gather_table_stats procedure is used to generate the hybrid histogram, with a request for 13 ‘buckets’ on the column of interest. The number of buckets is not the issue; it is the endpoints of those buckets that change between the versions. The script used is posted below:
drop table hist_test purge;
execute dbms_random.seed(0)
create table hist_test(
        my_id           number(8,0),
        id_mod_20       number(6,0),
        id_mod_30       number(6,0),
        id_mod_50       number(6,0),
        my_rand_id      number(6,0)
)
;
insert into hist_test
with datasource as (
        select
                rownum my_id
        from    dual
        connect by
                level <= 1e4
)
select
        rownum                                  my_id,
        mod(rownum, 20) + 1                     id_mod_20,
        mod(rownum, 30) + 1                     id_mod_30,
        mod(rownum, 50) + 1                     id_mod_50,
        28 - round(abs(7*dbms_random.normal))   my_rand_id
from
        datasource ds1
where
        rownum <= 800
;
commit;
begin
        dbms_stats.gather_table_stats(
                ownname    => null,
                tabname    => 'HIST_TEST',
                method_opt => 'for all columns size 1 for columns my_rand_id size 13'
        );
end;
/
The script above generates 22 unique values to base the histogram on. The query below reports on those values and on the histogram information Oracle has generated:
select
        my_rand_id, count(*)
from
        hist_test
group by
        my_rand_id
order by
        my_rand_id
;
select
        endpoint_value value,
        endpoint_number,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) bucket_size,
        endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'HIST_TEST'
and     column_name = 'MY_RAND_ID'
order by
        endpoint_value
;
Since the data is the same in both versions, it is reported only once:
MY_RAND_ID   COUNT(*)
---------- ----------
         1          1
         8          3
         9          1
        10          5
        11          4
        12          8
        13         14
        14          9
        15         11
        16         22
        17         34
        18         31
        19         36
        20         57
        21         44
        22         45
        23         72
        24         70
        25         87
        26        109
        27         96
        28         41

22 rows selected.
As mentioned at the beginning of the article, it’s the histograms that change. Look at the histogram in an unpatched 12.2 database:
     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        18             143          31                    31
        19             179          36                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        28             800         137                    41

13 rows selected.
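Oracle generates the histogram from the column data. Notice that, in 12.2, every value from 17 through 26 is a bucket endpoint, while 27 is hidden inside the final bucket: the endpoint for 28 has a bucket size of 137 (96 rows for 27 plus 41 rows for 28) but a repeat count of only 41. Moving to Oracle 18.3, the same script produces a slightly different histogram: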
     VALUE ENDPOINT_NUMBER BUCKET_SIZE ENDPOINT_REPEAT_COUNT
---------- --------------- ----------- ---------------------
         1               1           1                     1
        15              56          55                    11
        17             112          56                    34
        19             179          67                    36
        20             236          57                    57
        21             280          44                    44
        22             325          45                    45
        23             397          72                    72
        24             467          70                    70
        25             554          87                    87
        26             663         109                   109
        27             759          96                    96
        28             800          41                    41

13 rows selected.
Because of the fix, the value 18 is no longer an endpoint: its 31 rows are absorbed into the bucket ending at 19, which now has a bucket size of 67 (31 + 36) and a repeat count of 36. In exchange, the value 27 gets its own bucket of 96 rows, so the final bucket for 28 shrinks from 137 rows to 41. The total is still 13 buckets covering 800 rows; only the endpoints have moved. Over the years the optimizer has had its share of mishaps with cardinality estimates, and this one appears to be minor in nature. It can be confusing, though, to upgrade to 18.3 and find that a hybrid histogram has changed unexpectedly.
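To see which values each bucket actually covers, the histogram can be joined back to the table. The query below is a sketch that assumes the HIST_TEST table and histogram created by the script above; the column aliases rows_in_bucket and values_in_bucket are names chosen here for illustration.

```sql
-- Sketch: for each histogram bucket, count the table rows and distinct
-- values that fall between the previous endpoint (exclusive) and this
-- endpoint (inclusive).
with hist as (
        select
                endpoint_value,
                endpoint_number
                  - lag(endpoint_number,1,0) over(order by endpoint_number) bucket_size,
                endpoint_repeat_count,
                lag(endpoint_value) over(order by endpoint_value) prev_endpoint_value
        from
                user_tab_histograms
        where
                table_name = 'HIST_TEST'
        and     column_name = 'MY_RAND_ID'
)
select
        h.endpoint_value,
        h.bucket_size,
        h.endpoint_repeat_count,
        count(*)                        rows_in_bucket,
        count(distinct t.my_rand_id)    values_in_bucket
from
        hist h
join
        hist_test t
on      t.my_rand_id <= h.endpoint_value
and     (h.prev_endpoint_value is null or t.my_rand_id > h.prev_endpoint_value)
group by
        h.endpoint_value,
        h.bucket_size,
        h.endpoint_repeat_count
order by
        h.endpoint_value
;
```

A bucket where values_in_bucket is greater than 1 hides additional values behind its endpoint. Running the query in both versions should make it easy to spot where a value such as 18 or 27 has moved from one group to the other.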
Histogram data is rarely checked unless there is a performance issue at hand, so this change could easily be overlooked in a database upgrade. It is worth being aware that such changes can occur. If someone complains that a query ran ‘fine’ in 12.2, but is now running a bit… ‘off’ in 18.3, a changed hybrid histogram is one possible explanation.
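As a starting point for such a check, the columns in the current schema that carry hybrid histograms can be listed before and after the upgrade. This is a minimal sketch; which of these columns matter for a given workload is something only the queries themselves can tell you.

```sql
-- Sketch: list the columns in the current schema that have hybrid
-- histograms, as candidates for a before/after comparison.
select
        table_name,
        column_name,
        num_distinct,
        num_buckets,
        last_analyzed
from
        user_tab_col_statistics
where
        histogram = 'HYBRID'
order by
        table_name,
        column_name
;
```

Saving the matching rows from user_tab_histograms for these columns before the upgrade gives something concrete to compare against afterwards.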