Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Slideshows
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted September 19, 2016

Real-time Operational Analytics in SQL Server 2016 - Part 2

By Arshad Ali

Introduction

SQL Server 2016 introduces the Real-time Operational Analytics feature, which allows you to run both transactional/operational and analytics workloads on the same system – without having the need for another analytics system for reporting.

In the last article of the series, we looked at traditional analytics architecture, challenges it faces, how the newly introduced Real-time Operational Analytics feature overcomes those challenges. In this article, we will look into different design considerations for leveraging the Real-time Operational Analytics feature, will look into different scenarios where it fits or does not fit, and then we will look at how to get started with it for disk based tables.

Real-time Operational Analytics - Points to Note

Real-time Operational Analytics feature is not a solution for all the problems and there are a few things to note about this new feature:

  • The schema for operational databases is highly normalized (i.e. with minimal data duplication), which when used for analytic queries could lead to poor performance primarily because of the complexity of joins between  a larger number of tables. The significant query speed up made possible by the Columnstore index (i.e. non-clustered columnstore index) can overcome the complexity of a query and still deliver most analytics queries in a few seconds. At this point, it is important to emphasize the analytics query performance with real-time operational analytics will not be as fast as you can get with a dedicated analytics store but the key benefit here is the ability to do analytics in real-time, without actually moving data across.
  • Real-time operational analytics is not a replacement of analytics store where you have data coming from non-relational data sources or coming from multiple sources as shown in the figure below. Also it might not be able to help you when you need extreme analytics queries performance, which is possible using customized schemas (e.g. Star/Snowflake) and pre-aggregated cubes.

Analytics Architecture Multiple Sources
Analytics Architecture Multiple Sources

As noted above, Real-time Operational Analytics targets the scenario of a single data source such as an Enterprise Resource Planning (ERP) application on which you can run both the operational and the analytics workload side by side without having much impact on others.

To summarize, the real-time operational analytics feature supports creating updateable non-clustered columnstore index (NCCI) on traditional rowstore tables, which means transactional workload runs on rowstore indexes whereas analytical workload runs on columnstore indexes side-by-side.

Real-time Operational Analytics - Considerations and Setting It Up

There are certain pre-requisites and limitations with usage of the real-time operational analytics feature; here are few of the important ones:

  • Identify operational table(s) and columns you need to run analytics on – It could be either a traditional disk-based table or memory-optimized table, or both.
  • Drop existing row-store indexes created to cater to the need for analytics – For each table, drop all rowstore B-Tree indexes that are primarily designed to speed up existing analytics on your OLTP workload. Replace them with a single non-clustered columnstore index, as discussed next. This can improve the overall performance of your OLTP workload since there will be fewer indexes to maintain.
  • Create columnstore index on tables identified - Note, starting with SQL Server 2016, non-clustered columnstore index is now updateable, which means operational workload continues to work as before without requiring any application changes.
  • Set up analytics reporting to get data directly from the operational store - SQL Server query optimizer will automatically choose non-clustered columnstore index as needed for analytics queries.
  • Please note, a filtered non-clustered Columnstore index (discussed in the next article of the series) is only supported on disk-based tables. It is not supported on memory-optimized tables.

Getting Started with Real-time Operational Analytics

SQL Server 2016 supports real-time operational analytics for both traditional disk-based and memory-optimized tables without requiring any changes to your transactional workload. For disk-based tables, you can do it by creating a non-clustered Columnstore index (NCCI), which is what I am going to demonstrate in this table. I will talk about using real-time operational analytics on memory-optimized tables in the next article.

Disk Based Tables

In this demonstration, I am going to create two tables with these index specifications with same number of records to demonstrate different performance behaviors with different types of indexes:

  • Table with clustered index (row-store) and non-clustered index (row-store)
  • Table with clustered index (row-store) and non-clustered columnstore index

The script below creates two tables with no index (heap).

--Table with Row Store index clustered index and Row Store   non-clustered index
CREATE TABLE SalesOrderDetailWithRowStoreClusteredIndexAndRowStoreNonClusteredIndex(
       [SalesOrderID] [int]   NOT NULL,
       [SalesOrderDetailID] [int]   IDENTITY(1,1) NOT NULL,
       [CarrierTrackingNumber] [nvarchar](25) NULL,
       [OrderQty] [smallint]   NOT NULL,
       [ProductID] [int] NOT NULL,
       [SpecialOfferID] [int]   NOT NULL,
       [UnitPrice] [money]   NOT NULL,
       [UnitPriceDiscount] [money]   NOT NULL,
       [LineTotal] [money],
       [rowguid] [uniqueidentifier]   ROWGUIDCOL  NOT   NULL,
       [ModifiedDate] [datetime]   NOT NULL
           ) ON [PRIMARY]
GO
 
--Table with Row Store clustered index and non-clustered Column   Store index
CREATE TABLE   SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex(
       [SalesOrderID] [int]   NOT NULL,
       [SalesOrderDetailID] [int]   IDENTITY(1,1) NOT NULL,
       [CarrierTrackingNumber] [nvarchar](25) NULL,
       [OrderQty] [smallint]   NOT NULL,
       [ProductID] [int] NOT NULL,
       [SpecialOfferID] [int]   NOT NULL,
       [UnitPrice] [money]   NOT NULL,
       [UnitPriceDiscount] [money]   NOT NULL,
       [LineTotal] [money],
       [rowguid] [uniqueidentifier]   ROWGUIDCOL  NOT   NULL,
       [ModifiedDate] [datetime]   NOT NULL
           ) ON [PRIMARY]
GO

The script below loads close to 10 million rows into the two tables created above, with no index (heap) from the AdventureWorks2014.Sales.SalesOrderDetail. You can download the AdventureWorks2014 database from here.

--Load data to these tables
INSERT INTO   SalesOrderDetailWithRowStoreClusteredIndexAndRowStoreNonClusteredIndex (SalesOrderID,   CarrierTrackingNumber,   
OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate)
SELECT SalesOrderID, CarrierTrackingNumber,   OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, 
LineTotal, rowguid, ModifiedDate FROM [AdventureWorks2014].Sales.[SalesOrderDetail]
GO 85
INSERT INTO SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex (SalesOrderID,   CarrierTrackingNumber,   
OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, rowguid, ModifiedDate)
SELECT SalesOrderID, CarrierTrackingNumber,   OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, 
LineTotal, rowguid, ModifiedDate FROM [AdventureWorks2014].Sales.[SalesOrderDetail]
GO 85

Now let me create a clustered index (row-store) and one non-clustered index (row-store) on the first table.

--Creating Indexes - Row Store index clustered index and Row   Store non-clustered index
CREATE CLUSTERED INDEX   [CIRS_SalesOrderDetailWithRowStoreClusteredIndexAndRowStoreNonClusteredIndex_SalesOrderID_SalesOrderDetailID]
ON   SalesOrderDetailWithRowStoreClusteredIndexAndRowStoreNonClusteredIndex
(SalesOrderID, SalesOrderDetailID)
GO
CREATE NONCLUSTERED INDEX   [NCRS_SalesOrderDetailWithRowStoreClusteredIndexAndRowStoreNonClusteredIndex_All]
ON SalesOrderDetailWithRowStoreClusteredIndexAndRowStoreNonClusteredIndex
(SalesOrderID, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal)
GO

Next, I create a clustered index (row-store) and a non-clustered columnstore index on the second table.

--Creating Indexes - Row Store clustered index and non-clustered   Column Store index
CREATE CLUSTERED INDEX   [CIRS_SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex_SalesOrderID_SalesOrderDetailID]
ON SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex
(SalesOrderID, SalesOrderDetailID)
 GO
CREATE NONCLUSTERED COLUMNSTORE INDEX   [NCCS_SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex_All]
ON   SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex
(SalesOrderID, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal)
GO
--Creating Non-Clustered Columnstore Index (NCCI) is just a DDL   operation and similar to any other 
--btree index that you would create on a rowstore table. No   changes to the application needed 
--because the NCCI is updateable in SQL Server 2016 so your transaction   workload will continue to run.

I have loaded same number of rows (~10 million) in each of these tables, and this is how you can verify it:

SELECT Count(*) FROM   SalesOrderDetailWithRowStoreClusteredIndexAndRowStoreNonClusteredIndex
SELECT Count(*) FROM   SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex
GO

Verifying the Number of Rows
Verifying the Number of Rows

Now we can run an aggregation query on the columns included in the index as shown below:

SELECT ProductID, SUM(OrderQty) AS 'QuatityOrdered', SUM(LineTotal) AS 'ProductWiseSale' FROM 
SalesOrderDetailWithRowStoreClusteredIndexAndRowStoreNonClusteredIndex
GROUP BY ProductID
ORDER BY ProductID
 
SELECT ProductID, SUM(OrderQty) AS 'QuatityOrdered', SUM(LineTotal) AS 'ProductWiseSale' FROM 
SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex
GROUP BY ProductID
ORDER BY ProductID

If we look at the execution plan of the above two queries, the first query (which involves a table with clustered index (row-store) and non-clustered index (row-store)) has 96% relative cost in the batch with respect to the second query (which involves a table with clustered index (row-store) and a non-clustered columnstore index) which has 4% relative cost in the batch. You can also notice, the second query runs significantly faster because it makes use of non-clustered columnstore index (which is update-able now in SQL Server 2016 and can be created on any table without making it read-only like previous versions of SQL Server) whereas first query makes use of a non-clustered index and runs slower (not that optimal for analytical queries, as you can guess):

Execution Plan
Execution Plan

When there is a non-clustered columnstore index, query optimizer picks it up for analytics queries when it finds it suitable and provides significant speed up over traditional B-Tree indexes. Now if we look at the IO (Input\Output) statistics of the above two queries this is how it looks:

(266 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetailWithRowStoreClusteredIndexAndRowStoreNonClusteredIndex'. Scan count 5, logical reads 64233, physical reads 2, read-ahead reads 62935, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
(266 row(s) affected)
Table 'SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex'. Scan count 8, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 11116, lob physical reads 38, lob read-ahead reads 41988.
Table 'SalesOrderDetailWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex'. Segment reads 11, segment skipped 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

If we look at the CPU time statistics of the above two queries this is how it looks:

(266 row(s) affected)
 
 SQL Server Execution Times:
   CPU time = 4688 ms,  elapsed time = 3743 ms.
 
(266 row(s) affected)
 
 SQL Server Execution Times:
   CPU time = 156 ms,  elapsed time = 210 ms.

The conclusion here is the second query, which uses non-clustered columnstore index runs several times faster than the first query, which uses only row-store (B-Tree) indexes, making it possible to run analytical queries side by side without having much impact on the operational system.

For columnstore index (whether clustered or non-clustered), data gets stored in row groups and then to column segments; you can use the queries below to find out information about them along with the number of rows\values stored and disk size:

SELECT object_name(object_id) as ObjectName, * FROM sys.column_store_row_groups
ORDER BY object_id, row_group_id
 
SELECT object_name(p.object_id) AS ObjectName, C.column_id, C.segment_id , C.partition_id, sum(C.on_disk_size) AS on_disk_size, SUM(C.row_count) AS row_count
FROM sys.column_store_segments   C
INNER JOIN sys.partitions p
ON C.partition_id=p.partition_id
GROUP BY P.object_id, C.partition_id, C.column_id, C.segment_id 
ORDER BY P.object_id, C.partition_id, C.column_id, C.segment_id

Conclusion

SQL Server 2016 introduces Real-time Operational Analytics, the ability to run both operational\transactional and analytics workloads on the same database tables at the same time.

In this article we looked into different design considerations for leveraging the Real-time Operational Analytics feature, looked into different scenarios where it fits or where it does not fit, and then finally we got started with it for disk based tables.

In the next article of the series, we will learn about creating a filtered non-clustered columnstore index to minimize the overhead of maintaining it and will learn about making use of this feature on memory-optimized tables.

Resources

Real-time Operational Analytics in SQL Server 2016 - Part 1

Get started with Columnstore for real time operational analytics

Understanding New Column Store Index of SQL Server 2012

New Enhanced Column Store Index in SQL Server 2014

What's New in SQL Server 2016

See all articles by Arshad Ali



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM