SHARE

Getting Started with the New Column Store Index of SQL Server 2012

Written By

Sep 13, 2012

7 minute read

Introduction

Column Store Index is a new feature in SQL Server 2012 that improves performance of data warehouse queries several folds. Unlike regular indexes or heaps, which store data in B-Tree structure (in row-wise fashion), the column store index stores data in columnar fashion and uses compression aggressively to reduce the disk I/O needed to serve the query request along with the newly introduced batch mode processing.

In my last article I talked in detail about the new Column Store Index, how it differs from regular indexes, and why and how it improves the performance of the same query by several folds if column store index is used, where it can be used and what its limitations are.

In this article I am going to take this discussion to another level and show how you can create column store index, how you can use index query hint to include or exclude a column store index, how the performance differs when using column store index vs. row store index and more.

Getting Started with Column Store Index in SQL Server 2012

As discussed in my last article, a column store index can be created as a non-clustered index only and only one column store index can be created on a table, though we can include all the columns of the table (except those which are not supported as part of column store index; please refer my last article to get more details on it) as part or column store index definition. The basic syntax for creating column store index is provided below:

CREATE NONCLUSTERED COLUMNSTORE   INDEX <IndexName> 
ON <TableName> (<Column1>, <Column2>, <Column3>, ... <Column N)
ON <FilegroupName>

Let me show an example of creating column store index. For that, I want to first create two empty tables, which will mimic the schema/structure of the Sales.SalesOrderDetail table of the AdventureWorks database.

CREATE TABLE SalesOrderDetailWithRegularIndex(
       [SalesOrderID] [int]   NOT NULL,
       [SalesOrderDetailID] [int]   IDENTITY(1,1) NOT NULL,
       [CarrierTrackingNumber] [nvarchar](25) NULL,
       [OrderQty] [smallint]   NOT NULL,
       [ProductID] [int] NOT NULL,
       [SpecialOfferID] [int]   NOT NULL,
       [UnitPrice] [money]   NOT NULL,
       [UnitPriceDiscount] [money]   NOT NULL,
       [LineTotal] [money],
       [rowguid] [uniqueidentifier]   ROWGUIDCOL  NOT    NULL,
       [ModifiedDate] [datetime]   NOT NULL,
 CONSTRAINT [PK_SalesOrderDetailWithRegularIndex_SalesOrderID_SalesOrderDetailID]   
PRIMARY KEY CLUSTERED 
(
       [SalesOrderID] ASC,
       [SalesOrderDetailID] ASC
)) ON [PRIMARY]
GO
 
CREATE TABLE SalesOrderDetailWithColumnStoreIndex(
       [SalesOrderID] [int]   NOT NULL,
       [SalesOrderDetailID] [int]   IDENTITY(1,1) NOT NULL,
       [CarrierTrackingNumber] [nvarchar](25) NULL,
       [OrderQty] [smallint]   NOT NULL,
       [ProductID] [int] NOT NULL,
       [SpecialOfferID] [int]   NOT NULL,
       [UnitPrice] [money]   NOT NULL,
       [UnitPriceDiscount] [money]   NOT NULL,
       [LineTotal] [money],
       [rowguid] [uniqueidentifier]   ROWGUIDCOL  NOT   NULL,
       [ModifiedDate] [datetime]   NOT NULL,
 CONSTRAINT [PK_SalesOrderDetailWithColumnStoreIndex_SalesOrderID_SalesOrderDetailID]   
PRIMARY KEY CLUSTERED 
(
       [SalesOrderID] ASC,
       [SalesOrderDetailID] ASC
)) ON [PRIMARY]
GO

Next I want to load a good amount of data (20 times data of AdventureWorks.Sales.SalesOrderDetail, which is close to 2.4 millions) into these tables so that performance differences will be evident:

INSERT INTO SalesOrderDetailWithRegularIndex (SalesOrderID, CarrierTrackingNumber,   
OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, 
ModifiedDate)
SELECT SalesOrderID, CarrierTrackingNumber,   OrderQty, ProductID, 
SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate
FROM [AdventureWorks2012].Sales.[SalesOrderDetail]
GO 20
 
INSERT INTO SalesOrderDetailWithColumnStoreIndex (SalesOrderID, CarrierTrackingNumber,   
OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, 
ModifiedDate)
SELECT SalesOrderID, CarrierTrackingNumber,   OrderQty, ProductID, SpecialOfferID, 
UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate
FROM [AdventureWorks2012].Sales.[SalesOrderDetail]
GO 20

 
SELECT Count(*) FROM [AdventureWorks2012].Sales.[SalesOrderDetail]
SELECT Count(*) FROM SalesOrderDetailWithRegularIndex
SELECT Count(*) FROM SalesOrderDetailWithColumnStoreIndex

And finally I want to create a regular non-cluster index (on ProductId and LineTotal columns) on the first table, and column store index on the second table, which will include ProductId and LineTotal columns.

CREATE NONCLUSTERED   INDEX [XI_SalesOrderDetailWithRegularIndex_ProductID_LineTotal]
ON SalesOrderDetailWithRegularIndex
(ProductID, LineTotal)
 
CREATE NONCLUSTERED   COLUMNSTORE INDEX 
[XI_SalesOrderDetailWithColumnStoreIndex_ProductID_LineTotal]
ON SalesOrderDetailWithColumnStoreIndex
(ProductID, LineTotal)

Not only you can create column store index using T-SQL script, but you can also take advantage of the wizard available in SSMS (SQL Server Management Studio) to create column store index; Right click on Indexes node under the desired table and then click on New Index -> Non-Clustered Columnstore Index as shown below:

Click on New Index -> Non-Clustered Columnstore Index

You can also modify the existing column store index using the wizard; just double click on the index name and you will see the Index Properties dialog box, as shown below:

Index Properties dialog box

Using the wizard is quite helpful when you have a large number of columns in the table. You can exclude or include columns using the wizard, and when you click on OK the column store index will be re-created.

Select a column

Analyzing the Performance Benefits of Using Column Store Index

As I said in my last article, column store index has been designed to substantially accelerate common data warehouse queries, which require scanning, aggregation and filtering of large amounts of data or joining multiple tables like a star schema. With column store index, you can get almost interactive response time for queries against billions of rows on an economical SMP server with enough RAM to hold your frequently accessed data. So let’s see running these kinds of queries against both tables (one that doesn’t have column store index and one that has column store index) that we created earlier.

SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale'   FROM 
SalesOrderDetailWithRegularIndex
GROUP BY ProductID
ORDER BY ProductID
 
SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale'   FROM 
SalesOrderDetailWithColumnStoreIndex
GROUP BY ProductID
ORDER BY ProductID

And wow! The relative cost of the second query (which uses column store index) is just 7% as opposed to the relative cost of first query (which uses regular index) which is 93%.

The relative cost of the second query

For column store index exclusively, SQL Server 2012 introduces a new execution mode called Batch Mode, which processes batches of rows (as opposed to the row by row processing in case of regular index) that is optimized for multicore CPUs and increased memory throughput of modern hardware architecture. It also introduced a new operator for column store index processing as shown below:

Columnstore Index Scan

When I ran the query with STATISTICS IO ON, I found stunning results (with significant performance) of using column store index vs regular index, as you can see below:

--Clear the procedure cache and buffer cache
--CAUTION - Please don't run these clean script on   live/production system
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
 
SET STATISTICS IO ON
SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale'   FROM SalesOrderDetailWithRegularIndex
GROUP BY ProductID
ORDER BY ProductID
 
SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale'   FROM SalesOrderDetailWithColumnStoreIndex
GROUP BY ProductID
ORDER BY ProductID
SET STATISTICS IO OFF 
 
(266   row(s) affected)
Table   'SalesOrderDetailWithRegularIndex'. Scan count 9, logical reads 7945,   physical reads 1, read-ahead reads 7879, lob logical reads 0, lob physical   reads 0, lob read-ahead reads 0.
 
(266   row(s) affected)
Table   'SalesOrderDetailWithColumnStoreIndex'. Scan count 8, logical reads 141,   physical reads 2, read-ahead reads 179, lob logical reads 0, lob physical   reads 0, lob read-ahead reads 0.
Table   'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead   reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Even the time required to run these two queries greatly varied, the queries with regular index took 625 ms for CPU cycle and 2238 ms as elapsed time vs just 46 ms for CPU cycle and 57 ms as elapsed time for the second query, which uses column store index:

--Clear the procedure cache and buffer cache
--CAUTION - Please don't run these clean script on   live/production system
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
 
SET STATISTICS TIME ON
SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale'   FROM SalesOrderDetailWithRegularIndex
GROUP BY ProductID
ORDER BY ProductID
 
SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale'   FROM SalesOrderDetailWithColumnStoreIndex
GROUP BY ProductID
ORDER BY ProductID
SET STATISTICS TIME OFF
 
(266   row(s) affected)
 
 SQL   Server Execution Times:
     CPU time = 625 ms,  elapsed time = 2238 ms.
 
(266   row(s) affected)
 
 SQL   Server Execution Times:
     CPU time = 46 ms,  elapsed time = 57 ms.

Sometimes when you have column store index on a table, but for some reason you don’t want that index to be used as part of your query execution, you can use the “OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)” clause to exclude column store index as part of execution plan:

SELECT ProductID, SUM(LineTotal) AS 'ProductWiseSale'   FROM SalesOrderDetailWithColumnStoreIndex
GROUP BY ProductID
ORDER BY ProductID
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

On a different note, as per a performance improvement study done by Microsoft on a 32-logical processor machine with 256GB of RAM on a table with 1 TB of data and 1.44 billion rows and stunning result published in this white paper; the queries gained a 16X speed-up in CPU time and a whopping 455X improvement in elapsed time. In real terms it means a query that took 501 seconds originally (when using no column store index) was reduced to merely 1.1 seconds (when using column store index).

Conclusion

Column Store Index is new feature in SQL Server 2012 that improves performance of data warehouse queries several folds. In this article series I talked in detail about the new Column Store Index, how it differs from regular indexes, why and how it improves the performance of the same query by several folds if column store index is used, where it can be used and what its limitations are. Then I demonstrated usage/creation of column store index, how you can use index query hint to include or exclude a column store index, how the performance differs when using column store index vs. row store index.

Hope you enjoy reading it and you use this great feature in your projects!

Resources

Understanding New Column Store Index of SQL Server 2012

SQL Server Columnstore Index FAQ

Columnstore Indexes

Column Store Index White paper

SQL Server Columnstore Performance Tuning

See all articles by Arshad Ali

Arshad Ali

Arshad Ali works with Microsoft India R&D Pvt Ltd. He has 8+ years of experience, mostly on Microsoft Technologies. Most recently, as a SQL Developer and BI Developer he has been working on a Data Warehousing project. Arshad is an MCSD, MCITP: Business Intelligence, MCITP: Database Developer 2008 and MCITP: Database Administrator 2008 certified and has presented at several technical events including SQL-School. On an educational front, he has an MCA (Master in Computer Applications) and MBA-IT. Disclaimer : I work for Microsoft and help people and businesses make better use of technology to realize their full potential. The opinions mentioned herein are solely mine and do not reflect those of my current employer or previous employers.