Introduction
Column Store Index is a new feature in SQL Server 2012 that improves performance of data warehouse queries several folds. Unlike regular indexes or heaps, which store data in B-Tree structure (in row-wise fashion), the column store index stores data in columnar fashion and uses compression aggressively to reduce the disk I/O needed to serve the query request along with the newly introduced batch mode processing.
In my last article I talked in detail about the new Column Store Index, how it differs from regular indexes, and why and how it improves the performance of the same query by several folds if column store index is used, where it can be used and what its limitations are.
In this article I am going to take this discussion to another level and show how you can create column store index, how you can use index query hint to include or exclude a column store index, how the performance differs when using column store index vs. row store index and more.
Getting Started with Column Store Index in SQL Server 2012
As discussed in my last article, a column store index can be created as a non-clustered index only and only one column store index can be created on a table, though we can include all the columns of the table (except those which are not supported as part of column store index; please refer my last article to get more details on it) as part or column store index definition. The basic syntax for creating column store index is provided below:
CREATE NONCLUSTERED COLUMNSTORE INDEX <IndexName> ON <TableName> (<Column1>, <Column2>, <Column3>, ... <Column N) ON <FilegroupName>
Let me show an example of creating column store index. For that, I want to first create two empty tables, which will mimic the schema/structure of the Sales.SalesOrderDetail table of the AdventureWorks database.
CREATE TABLE SalesOrderDetailWithRegularIndex( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] IDENTITY(1,1) NOT NULL, [CarrierTrackingNumber] [nvarchar](25) NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL, [LineTotal] [money], [rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL, [ModifiedDate] [datetime] NOT NULL, CONSTRAINT [PK_SalesOrderDetailWithRegularIndex_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED ( [SalesOrderID] ASC, [SalesOrderDetailID] ASC )) ON [PRIMARY] GO CREATE TABLE SalesOrderDetailWithColumnStoreIndex( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] IDENTITY(1,1) NOT NULL, [CarrierTrackingNumber] [nvarchar](25) NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL, [LineTotal] [money], [rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL, [ModifiedDate] [datetime] NOT NULL, CONSTRAINT [PK_SalesOrderDetailWithColumnStoreIndex_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED ( [SalesOrderID] ASC, [SalesOrderDetailID] ASC )) ON [PRIMARY] GO
Next I want to load a good amount of data (20 times data of AdventureWorks.Sales.SalesOrderDetail, which is close to 2.4 millions) into these tables so that performance differences will be evident:
INSERT INTO SalesOrderDetailWithRegularIndex (SalesOrderID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate) SELECT SalesOrderID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate FROM [AdventureWorks2012].Sales.[SalesOrderDetail] GO 20 INSERT INTO SalesOrderDetailWithColumnStoreIndex (SalesOrderID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate) SELECT SalesOrderID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate FROM [AdventureWorks2012].Sales.[SalesOrderDetail] GO 20
SELECT Count(*) FROM [AdventureWorks2012].Sales.[SalesOrderDetail] SELECT Count(*) FROM SalesOrderDetailWithRegularIndex SELECT Count(*) FROM SalesOrderDetailWithColumnStoreIndex
And finally I want to create a regular non-cluster index (on ProductId and LineTotal columns) on the first table, and column store index on the second table, which will include ProductId and LineTotal columns.
CREATE NONCLUSTERED INDEX [XI_SalesOrderDetailWithRegularIndex_ProductID_LineTotal] ON SalesOrderDetailWithRegularIndex (ProductID, LineTotal) CREATE NONCLUSTERED COLUMNSTORE INDEX [XI_SalesOrderDetailWithColumnStoreIndex_ProductID_LineTotal] ON SalesOrderDetailWithColumnStoreIndex (ProductID, LineTotal)
Not only you can create column store index using T-SQL script, but you can also take advantage of the wizard available in SSMS (SQL Server Management Studio) to create column store index; Right click on Indexes node under the desired table and then click on New Index -> Non-Clustered Columnstore Index as shown below:
Click on New Index -> Non-Clustered Columnstore Index
You can also modify the existing column store index using the wizard; just double click on the index name and you will see the Index Properties dialog box, as shown below:
Index Properties dialog box
Using the wizard is quite helpful when you have a large number of columns in the table. You can exclude or include columns using the wizard, and when you click on OK the column store index will be re-created.
Select a column
Analyzing the Performance Benefits of Using Column Store Index
As I said in my last article, column store index has been designed to substantially accelerate common data warehouse queries, which require scanning, aggregation and filtering of large amounts of data or joining multiple tables like a star schema. With column store index, you can get almost interactive response time for queries against billions of rows on an economical SMP server with enough RAM to hold your frequently accessed data. So let’s see running these kinds of queries against both tables (one that doesn’t have column store index and one that has column store index) that we created earlier.
SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale' FROM SalesOrderDetailWithRegularIndex GROUP BY ProductID ORDER BY ProductID SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale' FROM SalesOrderDetailWithColumnStoreIndex GROUP BY ProductID ORDER BY ProductID
And wow! The relative cost of the second query (which uses column store index) is just 7% as opposed to the relative cost of first query (which uses regular index) which is 93%.
The relative cost of the second query
For column store index exclusively, SQL Server 2012 introduces a new execution mode called Batch Mode, which processes batches of rows (as opposed to the row by row processing in case of regular index) that is optimized for multicore CPUs and increased memory throughput of modern hardware architecture. It also introduced a new operator for column store index processing as shown below:
Columnstore Index Scan
When I ran the query with STATISTICS IO ON, I found stunning results (with significant performance) of using column store index vs regular index, as you can see below:
--Clear the procedure cache and buffer cache --CAUTION - Please don't run these clean script on live/production system DBCC DROPCLEANBUFFERS DBCC FREEPROCCACHE SET STATISTICS IO ON SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale' FROM SalesOrderDetailWithRegularIndex GROUP BY ProductID ORDER BY ProductID SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale' FROM SalesOrderDetailWithColumnStoreIndex GROUP BY ProductID ORDER BY ProductID SET STATISTICS IO OFF (266 row(s) affected) Table 'SalesOrderDetailWithRegularIndex'. Scan count 9, logical reads 7945, physical reads 1, read-ahead reads 7879, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (266 row(s) affected) Table 'SalesOrderDetailWithColumnStoreIndex'. Scan count 8, logical reads 141, physical reads 2, read-ahead reads 179, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Even the time required to run these two queries greatly varied, the queries with regular index took 625 ms for CPU cycle and 2238 ms as elapsed time vs just 46 ms for CPU cycle and 57 ms as elapsed time for the second query, which uses column store index:
--Clear the procedure cache and buffer cache --CAUTION - Please don't run these clean script on live/production system DBCC DROPCLEANBUFFERS DBCC FREEPROCCACHE SET STATISTICS TIME ON SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale' FROM SalesOrderDetailWithRegularIndex GROUP BY ProductID ORDER BY ProductID SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale' FROM SalesOrderDetailWithColumnStoreIndex GROUP BY ProductID ORDER BY ProductID SET STATISTICS TIME OFF (266 row(s) affected) SQL Server Execution Times: CPU time = 625 ms, elapsed time = 2238 ms. (266 row(s) affected) SQL Server Execution Times: CPU time = 46 ms, elapsed time = 57 ms.
Sometimes when you have column store index on a table, but for some reason you don’t want that index to be used as part of your query execution, you can use the “OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)” clause to exclude column store index as part of execution plan:
SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale' FROM SalesOrderDetailWithColumnStoreIndex GROUP BY ProductID ORDER BY ProductID OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
On a different note, as per a performance improvement study done by Microsoft on a 32-logical processor machine with 256GB of RAM on a table with 1 TB of data and 1.44 billion rows and stunning result published in this white paper; the queries gained a 16X speed-up in CPU time and a whopping 455X improvement in elapsed time. In real terms it means a query that took 501 seconds originally (when using no column store index) was reduced to merely 1.1 seconds (when using column store index).
Conclusion
Column Store Index is new feature in SQL Server 2012 that improves performance of data warehouse queries several folds. In this article series I talked in detail about the new Column Store Index, how it differs from regular indexes, why and how it improves the performance of the same query by several folds if column store index is used, where it can be used and what its limitations are. Then I demonstrated usage/creation of column store index, how you can use index query hint to include or exclude a column store index, how the performance differs when using column store index vs. row store index.
Hope you enjoy reading it and you use this great feature in your projects!
Resources
Understanding New Column Store Index of SQL Server 2012
SQL Server Columnstore Index FAQ
Column Store Index White paper