SQL Server: How Important Are Index Statistics

Have you ever heard these kinds of comments before?

  • “How come my TSQL statement is running slow today, when just last week it was running fine?”
  • “I have a process that as I add more rows to my table it gets slower and slower?”

If so then possibly your index statistics are out of date.  In this article I will show you how to identify when your index statistics were updated last, what method was used to gather statistics, and how to update your statistics.

What are Statistics and How are They Used

When the SQL Server database engine parses and compiles a query string into an execution plan it needs some clues to determine what kind of plan to generate.   For a given query string there might be hundreds, thousands, or maybe even millions of different ways to execute the query string.  SQL Server needs some way to determine quickly which execution plan might yield the fastest, most efficient execution plan.   It does this by using statistics.

Statistics are created on indexed columns as well as can be created on individual columns that are not contained in an index.  The database engine even creates statistics on some of the columns used in WHERE and JOIN columns automatically. Statistics keep track of the value distribution of a column.  Meaning SQL Server maintains the cardinality of values for a given column.  The Query Optimizer uses the column value distribution to determine whether an index seek would be better than an index scan operation.  

When you submit a query to SQL Server that contains a WHERE constraint the Query Optimizer uses the value you provide for your WHERE constraint and compares it to the statistics that SQL Server has stored.  When the optimizer does this it is trying to guess the number of rows that will be returned by your query, by using the statistics.  By predicting the number of rows that will be returned based on the statistics, SQL Server is able to determine the most efficient access method it should use to return results for your query.

Determine How Old Your Statistics Are and What Sampling Rate was Used

There are a number of different ways to show statistics depending on what version of SQL Server you are running.  The first method I will show you is DBCC SHOW_STATISTICS, which is available in all versions of SQL Server.

SQL Server provides the DBCC SHOW_STATISTICS command to display index statistics.  This DBCC command has a number of options, but the STAT HEADER option is the one I used to display my index statistics.  By using the DBCC SHOW_STATISTICS command with the STAT_HEADER option you can determine the following:

  • The date when statistics were last  generated
  • Number of rows in the table when the statistics were generated
  • Number of rows that were sampled

Here is an example of how you might display the STAT_HEADER information for the primary key index on the SalesOrderDetail table in the AdventureWorks2008R2 DB:

DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail',
                      PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID) 
    WITH STAT_HEADER;

Below you will find the index statistics information displayed when I ran this DBCC SHOW_STATISTICS command (note lines have been formatted to make it easier to read):

Name                                                 Updated              Rows    Rows Sampled  
---------------------------------------------------- -------------------- ------- ------------- 
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID  Mar 29 2012  1:52PM  121317  121317       
 
Steps  Density    Average key length String Index Filter Expression
------ ---------- ------------------ ------------ ------------------
 150    0.2694389  8                  NO           NULL                                                                                                                                                                                                                                                             

By looking at the information returned using the STAT_HEADER option you can determine how out of date your statistics might be.  In my output above you can see that the statistics where last updated on March 29, 2012.   If you know that you are updating your indexes daily, and the date when the stats were last created is many days or months old then you can conclude your stats are out of date. How far out of date really depends on the number of rows you have added since the statistics were last updated.  Additionally if you look at the “Rows” column you can tell how many rows were in the table at the time the statistics were created.  You can use the “Rows” number in conjunction with the total number of records currently in a table to help determine the degree to which your statistics are out of date. If you find your table has had a lot of rows added since the last time the statistics were updated then you should consider updating your index statistics.  Out of data statistics cause inefficient execution plan.  Additionally if you look at the “Rows Sampled” column and compare it to the “Rows” column you can test whether or not SQL Server scanned the complete table to create the statistics by sampling some of rows.  In my output above the “Rows” and “Rows Sampled” are the same, which indicates when the statistics were created, SQL Server sampled all the rows to create the statistics.

The DBCC SHOW_STATISTICS command has a number of different options that I didn’t talk about.  For a complete list of the kind of information this command can display you should refer to Books Online documentation.

If you are running SQL Server 2008 R2 SP2, or SQL Server 2012 SP1 you can also use the sys.dm_db_stats_properties DMF to gather index statistics.  This DMF requires two parameters: object_id and  stat_id.   I can run the following command to identify the object_id and the stat_id for my “PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailId” index:

SELECT sch.name + '.' + so.name
      ,so.object_id
         ,ss.name
         ,ss.stats_id 
FROM sys.stats ss 
JOIN sys.objects so ON ss.object_id = so.object_id 
JOIN sys.schemas sch ON so.schema_id = sch.schema_id 
WHERE so.name =  N'SalesOrderDetail' 
  and sch.name = N'Sales';
 
 

When I run the above command I can take the object_id and stat_id for my index and display the statistics using the following SELECT statement:

SELECT * FROM sys.dm_db_stats_properties(898102240,1);

The value of using the sys.dm_db_stats_properites DMF is you can use it to show all the statistics in your database if you’d like, or a subset of stats.  Here is an example where I show all the statistics in the database:

SELECT * FROM sys.stats ss 
JOIN sys.objects so ON ss.object_id = so.object_id 
JOIN sys.schemas sch ON so.schema_id = sch.schema_id 
CROSS APPLY sys.dm_db_stats_properties(so.object_id,ss.stats_id);

Using UPDATE STATISTICS Command

You can use the UPDATE STATISTICS command to update your statistics.  With the UPDATE STATISTICS command you can update the statistics on a single index, or all of the indexes on a table.  When you update your statistics you can sample a number or a percentage of rows, or can scan all of the rows to create the statistics.   To get the most accurate statistics you should have the UPDATE STATISTICS command scan all the rows.  Keep in mind if you have millions or billions of rows it might take some time to sample all the rows when creating statistics.

Here is an example of the UPDATE STATISTICS command where I am only sampling 50% of the rows:

UPDATE STATISTICS Sales.SalesOrderDetail PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID
WITH SAMPLE 50 PERCENT;

Review Books Online documentation for the complete syntax of the UPDATE STATISTICS command.

The Value of Up to Date Statistics

Have you ever noticed your query performance degrade as you add more records to your database, then found when you updated the statistics your query runs faster?  If so then you probable already realized the value of having up to data statistics.  Knowing what statistics are, how they are used, how out of date they are and how to update them allows you optimize your statistics.  If you ever find your queries to be running slow you might want to see how old your statistics are on the tables you are querying.

See all articles by Greg Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles