Database Journal
Featured Database Articles

MS SQL

Posted March 3, 2014

Review Your BUCKET_COUNT Statistics with DMV

By Greg Larsen

In one of my prior articles on In-Memory OLTP tables I talked about the two types of indexes supported on In-Memory OLTP tables: HASH and RANGE. In that article I discussed the advantages and disadvantages of each index type. In this article I will explore the BUCKET_COUNT setting of a HASH index and how to determine how well SQL Server distributes the in-memory table rows across the buckets of a HASH index. We will do that by exploring a new DMV that was provided with SQL Server 2014, along with the In-Memory OLTP table functionality.

High Level Architecture of a HASH Index

As a refresher from my prior article let’s review what a HASH index is and the high level architecture associated with a HASH index. 

A HASH index is an index that works well for searches involving an equality operation. As the name suggests, a HASH index stores rows based on the hashed value of the index key column or columns of a row. Each HASH index has a fixed number of buckets. SQL Server hashes each index key value and maps the hashed value to a specific bucket. The actual row is then hung off the bucket with which its key value is associated. Rows that map to the same bucket are chained together with pointers.
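The bucket-and-chain mechanics can be sketched outside of SQL Server. The following Python fragment is purely illustrative: Python's built-in hash() stands in for SQL Server's internal hash function, and a list per bucket stands in for the row chain.

```python
# Illustrative sketch only -- not SQL Server's actual hash function.
# Each key is hashed, reduced modulo the bucket count to pick a bucket,
# and rows landing in the same bucket are chained together.
BUCKET_COUNT = 8

def bucket_for(key):
    # hash() is a stand-in for SQL Server's internal hash function
    return hash(key) % BUCKET_COUNT

buckets = {b: [] for b in range(BUCKET_COUNT)}
for row in [("A", 1), ("B", 2), ("C", 3), ("D", 4)]:
    # appending to the bucket's list models chaining the row off the bucket
    buckets[bucket_for(row[0])].append(row)

# every row lands in exactly one bucket's chain
print(sum(len(chain) for chain in buckets.values()))
```

Two keys that happen to reduce to the same bucket number end up in the same list, which is exactly the hash-collision chaining described above.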

The following diagram shows logically how input rows are run through a hash function and mapped to a bucket, and how rows are chained together in a HASH index:

[Diagram: How input rows are run through a hash function and mapped to a bucket]

In this diagram the table being indexed contains two columns: CD and CDNum. The HASH index is being created off the CD column. This diagram shows that there are 8 different bucket values (1 through 8). When two different key values are run through the hash function and they map to the same bucket, this is known as a hash collision. When a hash collision occurs, the multiple rows are chained together off a single bucket with pointers from one row to the next row in the chain.

Ideally you want at most one row associated with each bucket. This can only happen when the hash key columns are unique, the number of buckets associated with an index is greater than the number of key values, and the hash function maps each key value to a different bucket. This is why Microsoft recommends that you set the bucket count to around two times the expected number of unique index key values. Keep in mind that even then, two different key values can hash to the same bucket, as in the diagram above.

When you create a HASH index you need to specify the BUCKET_COUNT for your index. When you specify a BUCKET_COUNT in your index statement, SQL Server rounds the number up to the next power of 2. If you set the BUCKET_COUNT too small and have lots of hash collisions, SQL Server doesn't automatically change the bucket count. The only way to change the bucket count is to drop and recreate the index. This is why setting the BUCKET_COUNT correctly the first time is so important.
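The rounding rule is easy to reproduce. This small Python helper (my own illustration, not a SQL Server API) shows the next-power-of-two arithmetic SQL Server applies to a requested BUCKET_COUNT:

```python
# Illustrative sketch of the rounding rule: SQL Server rounds a requested
# BUCKET_COUNT up to the next power of two.
def effective_bucket_count(requested):
    n = 1
    while n < requested:
        n *= 2
    return n

print(effective_bucket_count(10000))  # -> 16384
print(effective_bucket_count(2048))   # -> 2048 (already a power of two)
```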

In order to better understand how to set the BUCKET_COUNT value and to view your hash index statistics let me go through a couple of examples.

Creating an In-Memory Sample Database

In order to go through my examples I will first need to create a database that supports In-Memory tables. To create this database I will use the following code. If you want to follow along and actually run my examples you will need to download and install SQL Server 2014 CTP2, and then run the following script:

-- =============================================
-- Create HashDemo Database
-- =============================================
USE master
GO
IF EXISTS (
  SELECT *
    FROM sys.databases
   WHERE name = N'HashDemo'
)
  DROP DATABASE HashDemo
GO
 
CREATE DATABASE HashDemo
ON PRIMARY
  (NAME = HashDemo_Data,
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\HashDemo_Data.mdf',
          SIZE = 100MB,
          FILEGROWTH = 100MB),
 
FILEGROUP InMemory_InMemory CONTAINS MEMORY_OPTIMIZED_DATA
  ( NAME = InMemory_InMemory,
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\HashDemo_secondary.mdf')
 
LOG ON
  ( NAME = InMemory_Log,
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\HashDemo_Log.ldf',
          SIZE = 100MB,
          FILEGROWTH = 100MB)
GO
 

Exploring the Statistics of HASH Indexes

In this section I will create HASH indexes with different BUCKET_COUNT values and then use the new sys.dm_db_xtp_hash_index_stats DMV to review the bucket statistics after I populate each table with rows of data.

For my first example let me create and populate the table that contains the same structure and rows as the diagram I showed earlier in my article.  To accomplish this I will run the following code:

CREATE TABLE HashDemo1 
   ([CD] char(1) COLLATE LATIN1_GENERAL_BIN2 NOT NULL,
    [CDNum] INT NOT NULL,
       INDEX IX_CD NONCLUSTERED HASH ([CD]) WITH (BUCKET_COUNT=8))
   WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

 SET NOCOUNT ON;
INSERT INTO HashDemo1 VALUES
       ('A',1),
       ('B',2),
       ('C',3),
       ('D',4),
       ('E',5),
       ('F',6),
       ('G',7),
       ('H',8),
       ('I',9),
       ('J',10);

To review the bucket statistics for the HASH index I just created, I can run the following TSQL statement, which uses the new sys.dm_db_xtp_hash_index_stats DMV:

SELECT OBJECT_NAME(S.object_id) AS TableName
     , I.name AS IndexName
     , S.total_bucket_count
     , S.empty_bucket_count
     , S.avg_chain_length
     , S.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats S
JOIN sys.indexes I
  ON S.index_id = I.index_id
 AND S.object_id = I.object_id;

When I run this command I get the following output:

TableName IndexName total_bucket_count empty_bucket_count avg_chain_length max_chain_length
--------- --------- ------------------ ------------------ ---------------- ---------------- 
HashDemo1 IX_CD     8                  0                  1                2

By reviewing this output you can see my IX_CD hash index has a "total_bucket_count" of 8, and all of the buckets are used because the "empty_bucket_count" value is 0.  You can tell there are hash collisions because the "max_chain_length" value is 2, which identifies that no more than two rows are hung off any single bucket.  Since 10 rows were spread across 8 buckets with none left empty, we can conclude that exactly two buckets each have two rows chained off them.
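The arithmetic behind that conclusion can be checked directly. In this illustrative Python sketch the list of per-bucket chain lengths is assumed data matching the example above (10 rows over 8 buckets), and the DMV-style columns are derived from it:

```python
# Assumed per-bucket chain lengths for the example: 10 rows, 8 buckets,
# none empty, two buckets holding two rows each (10 - 8 = 2 extra rows).
chains = [2, 1, 1, 2, 1, 1, 1, 1]

total_bucket_count = len(chains)
empty_bucket_count = sum(1 for c in chains if c == 0)
avg_chain_length = sum(chains) // total_bucket_count  # the DMV reports an integer
max_chain_length = max(chains)

print(total_bucket_count, empty_bucket_count, avg_chain_length, max_chain_length)
# -> 8 0 1 2
```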

Now that we know more about HASH indexes, bucket counts, hash collisions, and the new DMV that shows HASH index statistics, let's review another example.  In this example I want to show you how to identify when you have created a HASH index that doesn't have a large enough BUCKET_COUNT.  For this example I'm going to run the following code to create my table and HASH index:

CREATE TABLE HashDemo2 
   ([Id] int NOT NULL,
    [LineNum] varchar(18)  COLLATE LATIN1_GENERAL_BIN2 NOT NULL,
       INDEX IX_Id NONCLUSTERED HASH ([ID]) WITH (BUCKET_COUNT=2048))
   WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

SET NOCOUNT ON;
DECLARE @I INT = 0;
WHILE @I < 10000
BEGIN
       INSERT INTO HashDemo2 VALUES(@I, 'Line Num = ' + cast(@I as char(6)));
    SET @I += 1;
END
 

Here I have created a table named HashDemo2, which has a HASH index with a BUCKET_COUNT value of 2048.  Once the table is created I populate it with 10,000 rows.  If you look at the WHILE loop above you can see that I have created 10,000 unique values for the HASH index key, in this case the Id column of my HashDemo2 table.

To see the bucket statistics for my HashDemo2 table I can run the following statement:

SELECT OBJECT_NAME(S.object_id) AS TableName
     , I.name AS IndexName
     , S.total_bucket_count
     , S.empty_bucket_count
     , S.avg_chain_length
     , S.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats S
JOIN sys.indexes I
  ON S.index_id = I.index_id
 AND S.object_id = I.object_id
WHERE OBJECT_NAME(S.object_id) = 'HashDemo2';

When I run this code I get the following output:

TableName IndexName total_bucket_count empty_bucket_count avg_chain_length max_chain_length
--------- --------- ------------------ ------------------ ---------------- ----------------
HashDemo2 IX_Id     2048               0                  4                6

By reviewing this output we can note a few things.  First, the "avg_chain_length" is equal to 4.  This means we had hash collisions, where multiple Id values hashed to the same bucket.  This makes sense because we selected a bucket count of 2048, yet we inserted 10,000 unique Id values into the HashDemo2 table.  Also worth noting is that we have used all of the buckets; we can tell this because the "empty_bucket_count" is zero (0).  Additionally, at least one bucket has 6 rows hung off of it; I can tell this because the "max_chain_length" is 6.
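A quick sanity check on that average chain length: 10,000 unique keys spread over 2,048 buckets cannot average fewer than roughly five rows per chain. This illustrative Python arithmetic shows why the DMV's integer value of 4 is expected:

```python
# Illustrative check of the numbers above.
rows, bucket_count = 10000, 2048

exact_avg = rows / bucket_count      # about 4.88 rows per bucket on average
reported_avg = rows // bucket_count  # the DMV reports an integer: 4

print(exact_avg, reported_avg)
```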

Let's drop this table, re-create it, and re-populate it using the same 10,000 rows.  This time when I create my index I will specify BUCKET_COUNT = 10000, the same as the number of unique Id values I am creating in my WHILE loop.  I will do that by running the following code:

DROP TABLE HashDemo2 
CREATE TABLE HashDemo2 
   ([Id] int NOT NULL,
    [LineNum] varchar(18)  COLLATE LATIN1_GENERAL_BIN2 NOT NULL,
       INDEX IX_Id NONCLUSTERED HASH ([ID]) WITH (BUCKET_COUNT=10000))
   WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

SET NOCOUNT ON;
DECLARE @I INT = 0;
WHILE @I < 10000
BEGIN
       INSERT INTO HashDemo2 VALUES(@I, 'Line Num = ' + cast(@I as char(6)));
    SET @I += 1;
END

After my table is created and populated I’ll run the following code to look at the HASH index statistics:

SELECT OBJECT_NAME(S.object_id) AS TableName
     , I.name AS IndexName
     , S.total_bucket_count
     , S.empty_bucket_count
     , S.avg_chain_length
     , S.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats S
JOIN sys.indexes I
  ON S.index_id = I.index_id
 AND S.object_id = I.object_id
WHERE OBJECT_NAME(S.object_id) = 'HashDemo2';

After running this code I get the following HASH index statistics:

TableName IndexName total_bucket_count empty_bucket_count avg_chain_length max_chain_length
--------- --------- ------------------ ------------------ ---------------- ----------------
HashDemo2 IX_Id     16384              6384               1                1

By reviewing these statistics we can now see that the "avg_chain_length" is 1 and the "max_chain_length" is 1, which means that each Id value in my table hashed to its own bucket.  We can also see we now have 6384 empty buckets.  Why do we have so many empty buckets when I specified BUCKET_COUNT = 10000 when I created my HASH index?  This is because SQL Server takes the BUCKET_COUNT value specified on the HASH index definition and rounds it up to the next power of 2, which for this example was 16,384.  You can see this value in the "total_bucket_count" column.
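The empty-bucket count is simple arithmetic once you know the rounding rule. A short illustrative check of the numbers above:

```python
# BUCKET_COUNT = 10000 rounds up to the next power of two: 2**14 = 16384.
total = 2 ** 14
unique_keys = 10000

# With every unique key landing in its own bucket, the remainder stay empty.
empty = total - unique_keys
print(total, empty)  # -> 16384 6384
```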

Setting Your Bucket Size Correctly

The new SQL Server 2014 DMV sys.dm_db_xtp_hash_index_stats provides you a quick method to determine how well you've set the BUCKET_COUNT on your HASH indexes.  If you find the avg_chain_length or the max_chain_length to be higher than 1 and the empty_bucket_count to be zero (0), then you may not have selected a large enough bucket count for your HASH index.  Understanding your data and the number of unique values for your HASH indexes will help you correctly set your BUCKET_COUNT when you create your HASH index.
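Putting the guidance together, here is a small illustrative Python helper (my own sketch, not a Microsoft formula) that starts from roughly twice the expected number of unique key values and rounds up to the power of two SQL Server will use anyway:

```python
# Illustrative sizing sketch: aim for about 2x the expected unique key
# values (per the recommendation above), rounded up to a power of two.
def suggested_bucket_count(expected_unique_values):
    target = expected_unique_values * 2
    n = 1
    while n < target:
        n *= 2
    return n

print(suggested_bucket_count(10000))  # -> 32768
```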

See all articles by Greg Larsen


