Importance of Statistics and How It Works in SQL Server – Part 2

Introduction

In my last article I discussed the basics of statistics and why it is needed. I also covered when it needed to be created or updated and how it looks. In this article I will talk about how performance gets impacted because of missing and out-of-date statistics, how statistics are managed in SQL Server, what are filtered statistics and what is new in SQL Server 2012 related to statistics.

When the Query Performance Goes for a Toss

As a DBA, have you ever come across a situation where your users start reporting that queries are running slowly? If a user is running a new query with a different set of columns than before, you might have to look for index strategies. But if a user is running the same old query, which ran in an acceptable amount of time earlier but is now taking significantly longer, this might be because of stale or out-of-date statistics on the columns of the table being used in the query. There are certain operations after which you need to ensure statistics are up-to-date for the predictable query response time, for example, when you are performing some operations that change distribution of data significantly, such as truncating a table or performing a bulk insert of a large percentage of the rows or delete a large number of rows etc.

Though if you have set the AUTO_UPDATE_STATISTICS database property to ON, SQL Server Query Optimizer routinely updates statistics when it finds out-of-date statistics being used in the query, but for predictable response time for your user queries its always better to update statistics as part of the bulk data operations itself.

When you see performance issues with your queries, just check if there is any warning for missing statistics or if the “Actual Number of Rows” and “Estimated Number of Rows” have huge differences (this happens because of the out-of-date statistics), in that case you need to update statistics accordingly.

Please note, as part of the maintenance activities if you are performing index rebuild (either using ALTER INDEX REBUILD or DBCC DBREINDEX commands) SQL Server Query Optimizer updates statistics on the respective columns with the index rebuild operations and you don’t need to do it separately. This is applicable only if you are re-building the indexes and does not apply if you are re-organizing indexes.

Statistics Management in SQL Server

There are different ways statistics are created and maintained in SQL Server:

    • By SQL Server engine itself – There are some database level properties that determine the automatic creation and updating statistics whenever there is a need. For example,
      • AUTO_CREATE_STATISTICS property of the database, if set to TRUE, lets SQL Server (or more specifically SQL Server Query Optimizer) routinely create single-column statistics for query predicate columns as necessary, to improve cardinality estimates for the query execution plan if that specific column does not already have a histogram in an existing statistics object. The name for these statistics starts with _WA as you can see in the figure below, as an example for a table.

        Statistics
        Statistics

        You can also use the below query to find out all of those statistics created by SQL Server Query Optimizer for a specific table:

        SELECT
          OBJECT_NAME(stats.object_id) AS TableName,
          COL_NAME(stats_columns.object_id, stats_columns.column_id) AS ColumnName,
          stats.name AS StatisticsName
        FROM sys.stats AS stats 
        JOIN sys.stats_columns AS stats_columns ON stats.stats_id = stats_columns.stats_id 
                AND stats.object_id = stats_columns.object_id
        WHERE OBJECT_NAME(stats.object_id) = 'SalesOrderHeader'
        AND stats.name like '_WA%'
        ORDER BY stats.name;

         

        Statistics Created by SQL Server Query Optimizer - Results
        Statistics Created by SQL Server Query Optimizer – Results

      • AUTO_UPDATE_STATISTICS property of the database, if set to TRUE lets SQL Server (or more specifically SQL Server Query Optimizer) routinely update the statistics being used by the query when they are stale (out-of-date) . Unlike AUTO_CREATE_STATISTICS, which applies for creating single column statistics only. AUTO_UPDATE_STATISTICS updates statistics objects created for indexes, single-columns in query predicates, filtered statistics and statistics created using the CREATE STATISTICS command.
      • By default, identified stale statistics are updated synchronously, which means the query being executed will be put on hold until the required statistics are updated in order to ensure the query always compiles and executes with up-to-date statistics. Sometimes this wait could be longer, especially when a table involved in the query is bigger in size, and might cause the client request time-out. In order to deal with such a situation, SQL Server has AUTO_UPDATE_STATISTICS_ASYNC property of the database, which if set to TRUE lets the current running query compile with existing statistics even if the existing statistics are stale (chooses suboptimal query plan) and initiates a process in the background asynchronously to update the stale statistics in order to ensure subsequent query compilation and execution uses up-to-date statistics.

There are different ways you can change these properties; for example, you can use a script to change or use the Database Properties dialog box in SQL Server Management Studio to change it as shown below:

ALTER DATABASE [StatisticsTest] SET       AUTO_CREATE_STATISTICS ON
ALTER DATABASE [StatisticsTest] SET       AUTO_UPDATE_STATISTICS ON
ALTER DATABASE [StatisticsTest] SET       AUTO_UPDATE_STATISTICS_ASYNC OFF
GO

Database Properties – Statistics Test
Database Properties – Statistics Test

        • When you create an index – When you create indexes on a table or view, statistics are automatically created on the key columns of the indexes. This applies for filtered indexes as well. In other words, when you create filtered indexes, even filtered statistics are created along with that too.

    For example, as you can see in the script below I created a non-clustered index on ProductID and accordingly, statistics on the same key column has been created with the same name as of index:

    CREATE NONCLUSTERED INDEX       [NCI_SalesOrderDetail_ProductID]
    ON       [dbo].[SalesOrderDetail] ([ProductID])
    GO

    Dbo.SalesOrderDetail
    Dbo.SalesOrderDetail

          • Manually creating statistics on strategic columns – As discussed above, statistics are created when you create indexes or are created automatically by SQL Server Query Optimizer on the single column when the AUTO_CREATE_STATISTICS database property is set to ON. But there might be times when you need to create additional statistics using the CREATE STATISTICS command, syntax mentioned below, to capture statistical correlations so that it enables the SQL Server Query Optimizer to create improved query plans and drop it when not needed using DROP STATISTICScommand.
            CREATE STATISTICS statistics_name 
            ON { table_or_indexed_view_name } ( column1, column2, ...n ) 
                      [ WHERE <filter_predicate_for_filtered_statistics> ]
                      [ WITH 
                          [ [ FULLSCAN 
                            | SAMPLE number { PERCENT | ROWS } 
                            | STATS_STREAM = stats_stream ] ] 
                          [ [ , ] NORECOMPUTE ] 
                          [ [ , ] INCREMENTAL = { ON |       OFF } ]
                      ] ;
                • WHERE – with this clause you can specify a filter predicate for filtered statistics.
                • FULLSCAN – With this clause, you specify to compute statistics by scanning all rows in the table or indexed view or use SAMPLE to create a based on sample.
                • NORECOMPUTE – With this clause, you specify to disable the automatic statistics update by SQL Server when AUTO_STATISTICS_UPDATE is ON. It’s recommended to not use this feature often and let SQL Server Query Optimizer decide and update statistics as and when needed.
                • INCREMENTAL – This is a new clause available in SQL Server 2014, its default value is OFF which means stats are combined for all partitions. If you set to ON, the statistics created are per partition statistics.

            For example, the below given script creates statistics on the Employee table for Name, Age and Salary columns by scanning all the rows of the table. In this case, histogram is generated for Name, densities for the following column prefixes: (Name), (Name, Age), and (Name, Age, Salary).

            CREATE STATISTICS statis_Employee_Name_Age_Salary 
            ON       [dbo].[Employee] (Name, Age, Salary)
            WITH FULLSCAN;

      Filtered Index and Filtered Statistics

      Normally, statistics are created by considering all the values for all rows. But starting with SQL Server 2008, like filtered index you can also create filtered statistics on a subset of rows from a table. This comes in handy when your queries select only a subset of rows as these subset of rows will have completely different data distribution.

      Filtered statistics are created either when you create a filtered index (on the same subset of rows specified for the filtered index) or by using the CREATE STATISTICS command along with the WHERE clause to specify the filter predicate.

      What’s New in SQL Server 2012 with Respect to Statistics

      In earlier versions of SQL Server, if you use a database in read-only mode or a database snapshot, your queries will compile with the existing statistics even though it is not up-to-date. SQL Server will not be able to create statistics when it is missing or has become stale as changes to the database are not allowed, which means SQL Server will continue executing your queries with a sub-optimal plan.

      Starting with SQL Server 2012, SQL Server Query Optimizer creates and maintains temporary statistics in tempdb database for the read-only database or read-only snapshot database or readable secondaries in the AlwaysOn cluster in order to ensure your queries perform better. These temporary statistics are created and maintained by SQL Server Query Optimizer only; though you can delete them when not needed. These statistics are suffixed with “_readonly_database_statistic” to differentiate it from the regular statistics.

      Please note, as the tempdb database is re-created every time SQL Server service is restarted, all your temporary statistics will disappear after restart.

      Conclusion

      Statistics in SQL Server plays a pivotal role in efficient query execution. In this article, I discussed how the presence of up-to-date statistics plays an important role in the generation of high-quality query execution plans. I discussed the details of statistics, when to create and when to update it. I also talked about filtered statistics (introduced with SQL Server 2008) and new statistics enhancements in SQL Server 2012.

      Resources

      Statistics

      Statistics Used by the Query Optimizer in Microsoft SQL Server 2008

      DBCC SHOW_STATISTICS

      See all articles by Arshad Ali

      Arshad Ali
      Arshad Ali
      Arshad Ali works with Microsoft India R&D Pvt Ltd. He has 8+ years of experience, mostly on Microsoft Technologies. Most recently, as a SQL Developer and BI Developer he has been working on a Data Warehousing project. Arshad is an MCSD, MCITP: Business Intelligence, MCITP: Database Developer 2008 and MCITP: Database Administrator 2008 certified and has presented at several technical events including SQL-School. On an educational front, he has an MCA (Master in Computer Applications) and MBA-IT. Disclaimer : I work for Microsoft and help people and businesses make better use of technology to realize their full potential. The opinions mentioned herein are solely mine and do not reflect those of my current employer or previous employers.

      Get the Free Newsletter!

      Subscribe to Cloud Insider for top news, trends & analysis

      Latest Articles