In my article last month I introduced you to In-Memory table or what Microsoft refers to as In-Memory OLTP, or Hekaton. This new type of table is available with SQL Server 2014. In that article I demonstrated the basics of creating an In-Memory table. In this article I will discuss the different types of indexes you can place on your In-Memory tables, and how those indexes support different search criteria.
What does Hekaton Mean?
I suppose I should have mentioned this in my last article, but I didn’t. During the week of October 13, 2013 I was as at the PASS Summit when David DeWitt , a frequent keynote presenter, shared his thoughts about the Hekaton name and then why it is now called In Memory OLTP.
“Hekaton” means hundred, or a hundredfold. Since Microsoft was trying to improve the query engine with In-Memory tables by 100 times the name “Hekaton” seemed appropriate. As David DeWitt shared they didn’t quite get a hundredfold improvement with this version of In Memory OLTP. The marketing people of course decided they should call the feature In Memory OLTP, but of course David DeWitt suggested the SQL Server community vote on whether they liked the name “Hekaton” or “In-Memory OLTP”.
In addition to the story about the hekaton code name, David DeWitt gave a great presentation that boggled the mind on how exactly In-Memory OLTP actually works and the theories behind it. You can download his slides from here. You can also watch his PASS Summit Keynote presentation from this location: http://www.sqlpass.org/summit/2013/PASStv.aspx?watch=aW3-0G-SEj0. Skip 27 minutes into this video to get to the start of David DeWitt’s presentation.
Different Types of Indexes on Memory Optimized Tables
When Microsoft released CTP1 version of SQL Server 2014 only the HASH index was available for creating In-Memory tables. At the PASS Summit on Oct 16, 2013 Microsoft released CTP2, which can be downloaded from here. With this new release of SQL Server 2014 they have now included the RANGE index as well.
The HASH and RANGE indexes support different kinds of queries. The HASH index is good for supporting equality predicates. If you have lots of queries that are going to have to search tables where you are constraining the data being returned by the EQUAL (“=”) operator then a HASH index would be beneficial to support those type of queries. If your query profile requires you to search for values using the inequality operators like “>”, “<”, etc. then you will find that RANGE indexes uses less resources and provide faster response to these type of queries. A RANGE index is just a modified B-Tree index. Both HASH and RANGE indexes are inherently covering indexes since the actual internal index points to the underlying row data. In future articles I will go into more detail regarding the internal aspects of In-Memory OLTP tables.
Creating an In-Memory Table for Testing
In order to demonstrate the values of the different type of In-Memory table indexes let me create a database and then create a couple of tables that I can use for testing. To do this I will use the script below. I will be using these tables to demonstrate the value of an In-Memory table verses normal table and to show you how the HASH index performs based on the search criteria for a query. If you want to following along and actually run my demo, you will need to download and install SQL Server 2014 CTP2 from here , then then run the following script:
-- ============================================= -- Create InMemory Database -- ============================================= USE master GO IF EXISTS ( SELECT * FROM sys.databases WHERE name = N'InMemory' ) DROP DATABASE InMemory GO CREATE DATABASE InMemory ON PRIMARY (NAME = InMemory_Data, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\InMemory_Data.mdf', SIZE = 100MB, FILEGROWTH = 100MB), FILEGROUP InMemory_InMemory CONTAINS MEMORY_OPTIMIZED_DATA ( NAME = InMemory_InMemory, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\InMemory_InMemory.mdf') LOG ON ( NAME = InMemory_Log, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\InMemory_Log.ldf', SIZE = 100MB, FILEGROWTH = 100MB) GO USE InMemory; GO -- Create Normal Data Table CREATE TABLE SalesOrder ( SalesOrderId int identity PRIMARY KEY NOT NULL ,CustomerId int NOT NULL ,ItemsOrdered int NOT NULL ,SalesOrderAmount int NOT NULL ,SalesOrderDate date NOT NULL ,OrderType char(1) NOT NULL ,INDEX SalesOrder_SalesOrderDate NONCLUSTERED (SalesOrderDate) ); -- Create In Memory OLTP table CREATE TABLE InMemorySalesOrder ( SalesOrderId int NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000) ,CustomerId int NOT NULL ,ItemsOrdered int NOT NULL ,SalesOrderAmount int NOT NULL ,SalesOrderDate date NOT NULL ,OrderType char(1) NOT NULL ,INDEX InMemorySalesOrder_SalesOrderDate NONCLUSTERED HASH (SalesOrderDate) WITH (BUCKET_COUNT = 365) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY); SET NOCOUNT ON; -- Load SalesOrder DECLARE @I int = 0; WHILE @I < 1000000 BEGIN SET @I += 1; INSERT INTO SalesOrder VALUES (@I%20, @I%376, @I%100,DATEADD(DD,@I%365,'2012-12-31'), CASE WHEN @I%3 = 0 THEN 'A' WHEN @I%3 = 1 THEN 'B' WHEN @I%3 = 2 THEN 'C' END); END SET @I = 0; -- Load InMemorySalesOrder WHILE @I < 1000000 BEGIN SET @I += 1; INSERT INTO InMemorySalesOrder VALUES (@I,@I%20, @I%376, @I%100,DATEADD(DD,@I%365,'2012-12-31'), CASE WHEN @I%3 = 0 THEN 'A' WHEN @I%3 = 1 THEN 'B' WHEN @I%3 = 2 THEN 'C' END); END
In this script I created a database named InMemory and two tables ( SalesOrder, and InMemorySalesOrder) and then populated them with data. The SalesOrder table is just a normal table that stores its data on disk. The InMemorySalesOrder table is an In-Memory table that has its data stored in memory. I used the “SCHEMA_ONLY” options to identify that I don’t care to keep the data should SQL Server be restarted, or crash. Therefore keep that in mind while you run my different code samples.
On both tables I created a set of indexes. The SalesOrder table has a set of normal indexes that you are all familiar with. But the InMemorySalesOrder table contains two HASH that mirror the indexes created on the SalesOrder table.
To populate these two tables I went through two different WHILE loops. Each of the WHILE loops populates a different table, but each loop does populate each table with the same data. I did this so my examples could compare the perform difference of using In-Memory tables verses disk based tables.
Note this script may take a little time to run since it does load 1 million rows in each table. It took 20 minutes to run on my laptop machine.
Search Performance of In-Memory Table verses Table Stored on Disk
In this section I will be showing the performance of simple search queries against my In-Memory and disk based tables. I will show a couple of different sets of queries. For each set I use the IO and TIME statistics to measure the performance of the different sets of queries.
For this first set of queries I will be using the equality operator in the WHERE clause to search for all the records that have a SalesOrderDate of ‘2013-01-22’. Here is the Transact-SQL code for this set of queries.
-- Equality Operator Search Query Set 1 USE InMemory; GO DBCC FREEPROCCACHE WITH NO_INFOMSGS; DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; SET NOCOUNT ON; SET STATISTICS IO ON; SET STATISTICS TIME ON; SELECT DISTINCT CustomerId FROM SalesOrder WHERE SalesOrderDate = '2013-01-22'; SELECT DISTINCT CustomerId FROM InMemorySalesOrder WHERE SalesOrderDate = '2013-01-22';
When I run this set of queries the following messages are produced:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'SalesOrder'. Scan count 1, logical reads 3600, physical reads 3, read-ahead reads 3592, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 219 ms, elapsed time = 1701 ms. SQL Server Execution Times: CPU time = 15 ms, elapsed time = 6 ms.
From reviewing this output you can see the query against the disk table SalesOrder performed 3600 logical reads, 3 physical reads, 3592 and read-aheads, whereas there were no I/O’s records for the In-Memory table InMemorySalesOrder. This is because this InMemorySalesOrder table resides In-Memory. As for CPU and elapsed time, my disk based table query took 219 ms of CPU and ran for 1701 ms. But if you look at the stats for the In-Memory based table query you will see my query ran for a total time of 6 ms and used 15 ms of CPU. As you can see the In_memory table query ran considerably faster than my disk based query.
If you review the query execution plan that was used to run this query you will find the following plans were produced:
Query Execution Plan (click for larger image)
By reviewing these plans you can see that the database engine did an index scan operation for the disk storage table, whereas the In-Memory table used a index seek operation. The reason my InMemory table was able to do an index seek operation was because all In-Memory tables are inherently covering indexes by default because In-Memory indexes contain pointers to the underlying row data. This was not the case with the disk storage table.
Also note that the database engine did note a missing index on the first query. The engine identified that the first query could be improved by creating a covering index. Therefore I’m going to create the following covering index on my SalesOrder table and then rerun my Query Set 1 SELECT statements. Here is the script for the covering index I am going to create:
CREATE NONCLUSTERED INDEX SalesOrder_SalesOrderDate_Covering ON [dbo].[SalesOrder] ([SalesOrderDate]) INCLUDE ([CustomerId]);
After creating this index and rerunning Query Set 1 from a new query window, I get the following statistics returned:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'SalesOrder'. Scan count 1, logical reads 10, physical reads 1, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 65 ms. SQL Server Execution Times: CPU time = 16 ms, elapsed time = 6 ms.
Still the InMemory table ran faster, but now the disk based table uses considerably less I/O. If you now look at the execution plans for Query Set 1 it now looks like this:
Execution Plans for Query Set 1 (click for larger image)
By reviewing the query execution plan for the disk based table you can now see it does an index seek operation using the covering index I created.
As stated earlier a HASH index on a In_Memory table is optimized for equality operations. Let’s review how they work to support searching for a range of values. To demonstrate this I will running the following query set:
-- Inequality Operator Search Query Set 2 USE InMemory; GO DBCC FREEPROCCACHE WITH NO_INFOMSGS; DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; SET NOCOUNT ON; SET STATISTICS IO ON; SET STATISTICS TIME ON; SELECT DISTINCT CustomerId FROM SalesOrder WHERE SalesOrderDate >= '2013-01-22' AND SalesOrderDate < '2013-01-23'; SELECT DISTINCT CustomerId FROM InMemorySalesOrder WHERE SalesOrderDate >= '2013-01-22' AND SalesOrderDate < '2013-01-23';
Query Set 2 returns exactly the same results as Query Set 1 but uses the inequality operators to constraint the query. Note I would never write code like this. But I have done this to demonistrate how well a HASH index supports a range lookup type query. When I run this code I get the following message results:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'SalesOrder'. Scan count 1, logical reads 10, physical reads 1, read-ahead reads 13, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 9 ms. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 859 ms, elapsed time = 1299 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
By reviewing these statistics, you can see the In-Memory OLTP engine had to do a lot of CPU cycles just to resolve this query. Now my In_memory table does not out perform my disk based table. If we review the execution plans for my two SELECT statements in query set 2 it looks like this:
Query Set 2 (click for larger image)
As you can see my query is not doing an index scan operation on my In_memory HASH index, but instead is doing a table scan.
To optimize my In_Memory based table for the inequality operators in my Query Set 2 script I need to create a RANGE index on my In_Memory table. The only way to add a new index on a memory optimized table is to drop and recreate the table. In order to test how well my In_Memory table will support the inequality query above I will recreate it using the following code:
USE InMemory; GO -- Drop and Create In Memory OLTP table DROP TABLE InMemorySalesOrder; GO CREATE TABLE InMemorySalesOrder ( SalesOrderId int NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000) ,CustomerId int NOT NULL ,ItemsOrdered int NOT NULL ,SalesOrderAmount int NOT NULL ,SalesOrderDate date NOT NULL ,OrderType char(1) NOT NULL ,INDEX InMemorySalesOrder_SalesOrderDate NONCLUSTERED HASH (SalesOrderDate) WITH (BUCKET_COUNT = 365) ,INDEX InMemorySalesOrder_SalesOrderDateRange NONCLUSTERED (SalesOrderDate) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY); SET NOCOUNT ON; -- Load InMemorySalesOrder DECLARE @I int = 0; WHILE @I < 1000000 BEGIN SET @I += 1; INSERT INTO InMemorySalesOrder VALUES (@I,@I%20, @I%376, @I%100,DATEADD(DD,@I%365,'2012-12-31'), CASE WHEN @I%3 = 0 THEN 'A' WHEN @I%3 = 1 THEN 'B' WHEN @I%3 = 2 THEN 'C' END); END
If you happen to run this code, pay attention to how lightning fast it runs to populate a million rows into my In_Memory table.
Now that my In_Memory table has been rebuilt with a RANGE index on the same column as my HASH index I can test to see how my In_Memory table now performs as compared to the my disk based table. If I run re-run Query Set 2 in a new query window I get the following message results:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'SalesOrder'. Scan count 1, logical reads 10, physical reads 1, read-ahead reads 13, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 44 ms. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 15 ms, elapsed time = 10 ms.
Now by reviewing the statistics we can see the In_Memory consumes much less resources then the first time I ran this test. By reviewing the execution plan I can verified that this second run performed an index seek operation on the RANGE index named “InMemorySalesOrder_SalesOrderDateRange”.
It will be important to know what kind of queries you are running against those tables you want to create in memory. Once you know the types of queries then you can determine if you should create HASH or RANGE indexes on those columns. Remember if you need a new index on your In-Memory table you will need to drop and re-create your In_Memory table. Therefore some pre-index planning should be taken on prior to creating an In-Memory table.
Don’t Forget Those RANGE Indexes for Inequality Searches
You will need to be careful when creating In-Memory tables to make sure they include the right types of indexes to support the queries you will be running against your In-Memory tables. If you plan to bring back specific values from In_Memory tables then HASH indexes support those types of queries. But if you need to bring back ranges of values based off inequality operators then don’t forget to create a RANGE index on your data as well. If you haven’t yet starting looking at the new features in SQL Server 2014 then you should consider downloading CTP2 and starting to play with new bits, and exploring features like In_Memory OLTP tables.