Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Slideshows
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted October 26, 2015

Extended Statistics and Oracle 12c

By David Fitzjarrell

Extended statistics can be very helpful to the optimizer by establishing a dependency between two or more columns in a table. Such dependencies can improve cardinality estimates the optimizer uses to determine the best plan of attack for a query. Extended statistics have been available since 11.2.0.x, and now Oracle 12.1.0.2 improves on that by generating extended statistics automatically. Using a feature called Automatic Column Group detection, Oracle can recognize column dependencies and create extended statistics on those recognized groups. Let's look at how this can be done and what benefit it can provide.

Oracle provides a procedure in the DBMS_STATS package that tells the database to monitor column usage, including recognizing any column groups in use. The procedure is simple to run, requiring a SQL set name, an owner and a time limit, any or all of which can be NULL. If the sql set and owner are NULL then Oracle monitors all column usage. In this example a 5-minute time limit is set:


exec dbms_stats.seed_col_usage(null,null,300)

Let's create a table with dependent columns; city/state/country set of columns so that state is dependent on country and city is dependent on state:


SQL> 
SQL> --
SQL> -- Create tables and sequence
SQL> --
SQL> create sequence cust_seq start with 1 increment by 1 nocycle nomaxvalue nocache;

Sequence created.

SQL> 
SQL> create table cust (
  2  	cust_id number not null,
  3  	cust_nm varchar2(80),
  4  	cust_city	varchar2(50),
  5  	cust_state	varchar2(2),
  6  	cust_country	varchar2(2));

Table created.

SQL> 
SQL> create table country(
  2  	country_nm	varchar2(80),
  3  	country_cd2	varchar2(2),
  4  	country_cd3	varchar2(3));

Table created.

SQL> 

Load data into the tables; for the COUNTRY table SQL*Loader is used, and for the CUST table a series of 29000+ inserts are executed (the output of the inserts is not shown for space reasons). Additionally primary and foreign key constraints were created to ensure referential integrity:


SQL> 
SQL> --
SQL> -- Load country data
SQL> --
SQL> $sqlldr bing/#######@*********** country.ctl

SQL> 
SQL> --
SQL> -- Create primary key and foreign key constraints
SQL> --
SQL> -- Ensures no invalid country codes are entered
SQL> --
SQL> alter table country add constraint country_pk primary key(country_cd2);

Table altered.

SQL> 
SQL> alter table cust add constraint cust_country_fk foreign key(cust_country) references country(country_cd2);

Table altered.

SQL> 
SQL> --
SQL> -- Load the customer data
SQL> --
SQL> @ins_cust_data

...

SQL> 

An index was created on the CUST table to speed access to the data:


SQL> 
SQL> --
SQL> -- Index CUST table for easier access
SQL> --
SQL> create index cust_idx on cust(cust_state, cust_country);

Index created.

SQL>

Statistics are gathered on the CUST table:


SQL> 
SQL> --
SQL> -- Gather initial statistics
SQL> --
SQL> exec dbms_stats.gather_table_stats(user, 'CUST')

PL/SQL procedure successfully completed.

SQL>

Two queries are run that relate the cust_state column with the cust_country column. Plan statistics are gathered on the second run and the plan is displayed:


SQL> 
SQL> --
SQL> -- Execute two queries to see how the optimizer
SQL> -- fares with the existing statistics
SQL> --
SQL> select count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> select /*+ gather_plan_statistics */ count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> --
SQL> -- Generate the plan to see how far off the estimate is
SQL> -- from the actual number of rows
SQL> --
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT                                                                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5027bkwuzg1jz, child number 0                                                                                                                 
-------------------------------------                                                                                                                 
select /*+ gather_plan_statistics */ count(*) from cust where                                                                                         
cust_state = 'MT' and cust_country = 'US'                                                                                                             
                                                                                                                                                      
Plan hash value: 2062844691                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------                                                              
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                              
----------------------------------------------------------------------------------------                                                              
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |       3 |                                                              
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |       3 |                                                              
|*  2 |   INDEX RANGE SCAN| CUST_IDX |      1 |    578 |    315 |00:00:00.01 |       3 |                                                              
----------------------------------------------------------------------------------------                                                              
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   2 - access("CUST_STATE"='MT' AND "CUST_COUNTRY"='US')                                                                                              
                                                                                                                                                      

20 rows selected.

SQL> 

With the current statistics the optimizer isn't aware of the column grouping so the cardinality estimate is off by almost a factor of 2. Let's 'fix' that by enabling column usage tracking across the database:


SQL> 
SQL> --
SQL> -- Enable the automatic extended stats gathering by
SQL> -- recording column usage
SQL> --
SQL> exec dbms_stats.seed_col_usage(null,null,300)

PL/SQL procedure successfully completed.

SQL> 

Run the same set of queries again:


SQL> 
SQL> --
SQL> -- Run the same queries again
SQL> --
SQL> select count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> select /*+ gather_plan_statistics */ count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 

Gather statistics again on the CUST table, and this time extended statistics will be generated:


SQL> 
SQL> --
SQL> -- Generate current statistics
SQL> --
SQL> -- These will include extended statistics
SQL> -- for the recognized column groups
SQL> --
SQL> exec dbms_stats.gather_table_stats(user, 'CUST')

PL/SQL procedure successfully completed.

SQL> 

Run the same set of queries one more time, and report the current plan:


SQL> 
SQL> --
SQL> -- Run the queries one more time
SQL> --
SQL> select count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> select /*+ gather_plan_statistics */ count(*)
  2  from cust
  3  where cust_state = 'MT' and cust_country = 'US'
  4  /

  COUNT(*)                                                                                                                                            
----------                                                                                                                                            
       315                                                                                                                                            

SQL> 
SQL> --
SQL> -- Current plan now shows the row estimate matches
SQL> -- the actual count, courtesy of automatic generation
SQL> -- of extended statistics
SQL> --
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT                                                                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5027bkwuzg1jz, child number 0                                                                                                                 
-------------------------------------                                                                                                                 
select /*+ gather_plan_statistics */ count(*) from cust where                                                                                         
cust_state = 'MT' and cust_country = 'US'                                                                                                             
                                                                                                                                                      
Plan hash value: 2062844691                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------                                                              
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                              
----------------------------------------------------------------------------------------                                                              
|   0 | SELECT STATEMENT  |          |      1 |        |      1 |00:00:00.01 |       3 |                                                              
|   1 |  SORT AGGREGATE   |          |      1 |      1 |      1 |00:00:00.01 |       3 |                                                              
|*  2 |   INDEX RANGE SCAN| CUST_IDX |      1 |    315 |    315 |00:00:00.01 |       3 |                                                              
----------------------------------------------------------------------------------------                                                              
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   2 - access("CUST_STATE"='MT' AND "CUST_COUNTRY"='US')                                                                                              
                                                                                                                                                      

20 rows selected.

SQL> 

The estimated rows and actual row values now match, courtesy of the extended statistics on the column grouping (cust_state, cust_country), improving the performance of the query since Oracle can now understand the relationship between these two columns. A similar benefit could be realized by using the cust_city and cust_state columns, and it could also be extended to the use of all three. Notice that other than 'triggering' Oracle to monitor column usage no additional work was required to generate the extended statistics or create column groups. There is a hidden parameter, _column_tracking_level, that can be set to disable column usage tracking but it's not advisable to use such parameters without express instructions from Oracle Support.

Automatically generating column groups so extended statistics can be gathered may not be applicable to every situation but it can provide better plans when related columns in a table exist. End users measure performance in time, and the less time spent on returning results makes them happier, I've found. And better plans mean less time in the database.

See all articles by David Fitzjarrell



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM