Extended Statistics and Oracle 12c

Extended statistics can be very helpful to the optimizer by establishing a dependency between two or more columns in a table. Such dependencies can improve cardinality estimates the optimizer uses to determine the best plan of attack for a query. Extended statistics have been available since 11.2.0.x, and now Oracle 12.1.0.2 improves on that by generating extended statistics automatically. Using a feature called Automatic Column Group detection, Oracle can recognize column dependencies and create extended statistics on those recognized groups. Let’s look at how this can be done and what benefit it can provide.

Oracle provides a procedure in the DBMS_STATS package that tells the database to monitor column usage, including recognizing any column groups in use. The procedure is simple to run, requiring a SQL set name, an owner and a time limit, any or all of which can be NULL. If the sql set and owner are NULL then Oracle monitors all column usage. In this example a 5-minute time limit is set:


exec dbms_stats.seed_col_usage(null,null,300)

Let’s create a table with dependent columns; city/state/country set of columns so that state is dependent on country and city is dependent on state:


SQL> 
SQL> --
SQL> -- Create tables and sequence
SQL> --
SQL> create sequence cust_seq start with 1 increment by 1 nocycle nomaxvalue nocache;

Sequence created.

SQL> 
SQL> create table cust (
  2  	cust_id number not null,
  3  	cust_nm varchar2(80),
  4  	cust_city	varchar2(50),
  5  	cust_state	varchar2(2),
  6  	cust_country	varchar2(2));

Table created.

SQL> 
SQL> create table country(
  2  	country_nm	varchar2(80),
  3  	country_cd2	varchar2(2),
  4  	country_cd3	varchar2(3));

Table created.

SQL> 

Load data into the tables; for the COUNTRY table SQL*Loader is used, and for the CUST table a series of 29000+ inserts are executed (the output of the inserts is not shown for space reasons). Additionally primary and foreign key constraints were created to ensure referential integrity:


SQL> 
SQL> --
SQL> -- Load country data
SQL> --
SQL> $sqlldr bing/#######@*********** country.ctl

SQL> 
SQL> --
SQL> -- Create primary key and foreign key constraints
SQL> --
SQL> -- Ensures no invalid country codes are entered
SQL> --
SQL> alter table country add constraint country_pk primary key(country_cd2);

Table altered.

SQL> 
SQL> alter table cust add constraint cust_country_fk foreign key(cust_country) references country(country_cd2);

Table altered.

SQL> 
SQL> --
SQL> -- Load the customer data
SQL> --
SQL> @ins_cust_data

...

SQL> 

An index was created on the CUST table to speed access to the data:


SQL> 
SQL> --
SQL> -- Index CUST table for easier access
SQL> --
SQL> create index cust_idx on cust(cust_state, cust_country);

Index created.

SQL>

Statistics are gathered on the CUST table:


SQL> 
SQL> --
SQL> -- Gather initial statistics
SQL> --
SQL> exec dbms_stats.gather_table_stats(user, 'CUST')

PL/SQL procedure successfully completed.

SQL>

Two queries are run that relate the cust_state column with the cust_country column. Plan statistics are gathered on the second run and the plan is displayed:


SQL> 
SQL> --
SQL> -- Execute two queries to see how the optimizer
SQL> -- fares with the existing statistics
SQL> --
SQL> select count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> select /*+ gather_plan_statistics */ count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> --
SQL> -- Generate the plan to see how far off the estimate is
SQL> -- from the actual number of rows
SQL> --
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT                                                                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5027bkwuzg1jz, child number 0                                                                                                                 
-------------------------------------                                                                                                                 
select /*+ gather_plan_statistics */ count(*) from cust where                                                                                         
cust_state = 'MT' and cust_country = 'US'                                                                                                             
                                                                                                                                                      
Plan hash value: 2062844691                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------                                                              
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                              
----------------------------------------------------------------------------------------                                                              
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |       3 |                                                              
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |       3 |                                                              
|*  2 |   INDEX RANGE SCAN| CUST_IDX |      1 |    578 |    315 |00:00:00.01 |       3 |                                                              
----------------------------------------------------------------------------------------                                                              
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   2 - access("CUST_STATE"='MT' AND "CUST_COUNTRY"='US')                                                                                              
                                                                                                                                                      

20 rows selected.

SQL> 

With the current statistics the optimizer isn’t aware of the column grouping so the cardinality estimate is off by almost a factor of 2. Let’s ‘fix’ that by enabling column usage tracking across the database:


SQL> 
SQL> --
SQL> -- Enable the automatic extended stats gathering by
SQL> -- recording column usage
SQL> --
SQL> exec dbms_stats.seed_col_usage(null,null,300)

PL/SQL procedure successfully completed.

SQL> 

Run the same set of queries again:


SQL> 
SQL> --
SQL> -- Run the same queries again
SQL> --
SQL> select count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> select /*+ gather_plan_statistics */ count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 

Gather statistics again on the CUST table, and this time extended statistics will be generated:


SQL> 
SQL> --
SQL> -- Generate current statistics
SQL> --
SQL> -- These will include extended statistics
SQL> -- for the recognized column groups
SQL> --
SQL> exec dbms_stats.gather_table_stats(user, 'CUST')

PL/SQL procedure successfully completed.

SQL> 

Run the same set of queries one more time, and report the current plan:


SQL> 
SQL> --
SQL> -- Run the queries one more time
SQL> --
SQL> select count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> select /*+ gather_plan_statistics */ count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> --
SQL> -- Current plan now shows the row estimate matches
SQL> -- the actual count, courtesy of automatic generation
SQL> -- of extended statistics
SQL> --
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT                                                                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5027bkwuzg1jz, child number 0                                                                                                                 
-------------------------------------                                                                                                                 
select /*+ gather_plan_statistics */ count(*) from cust where                                                                                         
cust_state = 'MT' and cust_country = 'US'                                                                                                             
                                                                                                                                                      
Plan hash value: 2062844691                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------                                                              
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                              
----------------------------------------------------------------------------------------                                                              
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |       3 |                                                              
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |       3 |                                                              
|*  2 |   INDEX RANGE SCAN| CUST_IDX |      1 |    315 |    315 |00:00:00.01 |       3 |                                                              
----------------------------------------------------------------------------------------                                                              
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   2 - access("CUST_STATE"='MT' AND "CUST_COUNTRY"='US')                                                                                              
                                                                                                                                                      

20 rows selected.

SQL> 

The estimated rows and actual row values now match, courtesy of the extended statistics on the column grouping (cust_state, cust_country), improving the performance of the query since Oracle can now understand the relationship between these two columns. A similar benefit could be realized by using the cust_city and cust_state columns, and it could also be extended to the use of all three. Notice that other than ‘triggering’ Oracle to monitor column usage no additional work was required to generate the extended statistics or create column groups. There is a hidden parameter, _column_tracking_level, that can be set to disable column usage tracking but it’s not advisable to use such parameters without express instructions from Oracle Support.

Automatically generating column groups so extended statistics can be gathered may not be applicable to every situation but it can provide better plans when related columns in a table exist. End users measure performance in time, and the less time spent on returning results makes them happier, I’ve found. And better plans mean less time in the database.

See all articles by David Fitzjarrell

David Fitzjarrell
David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhilips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and montoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles