SHARE

Real-time Operational Analytics in SQL Server 2016 – Part 3

Written By

Oct 24, 2016

6 minute read

Introduction

SQL Server 2016 introduces the Real-time Operational Analytics feature, which allows you to run both transactional/operational and analytics workloads on the same system – without having a need for another analytics system for reporting.

B-tree Index

B-tree Clustered Index

This means, non-clustered columnstore index exists on slowly changing data and makes a good case for application like order management system. For example, when an order is placed, it moves through several phases (hot data) and once it is delivered it is marked as received (once the order has been received, it rarely changes after that and therefore can be considered warm data).

Now suppose if you want to run an analytic query for received order only, it will access only warm data from the non-clustered columnstore index but if you run analytic query for received and shipped orders, the analytics query transparently accesses both warm and hot data as needed to provide real-time analytics.

--Create the non-clustered columnstore index with a filtered   condition
  CREATE NONCLUSTERED COLUMNSTORE INDEX   
[NCCS_OrderHeaderWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex_All]
ON   OrderHeaderWithRowStoreClusteredIndexAndNonClusteredColumnStoreIndex
(AccountKey, CustomerName, PurchasePrice, OrderStatus)
WHERE OrderStatus = 5 -- 5 => 'Order Received'

SQL Server optimizer will consider, but not always choose, the non-clustered columnstore index for the analytical query execution. When the query optimizer chooses the filtered non-clustered columnstore index, it transparently combines the rows both from the columnstore index as well as the rows that do not meet the filtered condition to allow real-time analytics. This is another improvement and different from a regular filtered index, which can be used only in queries that restrict themselves to the rows present in the filtered index.

Here are summary, considerations and best practices for improving performance using a Filtered Index:

To minimize the overhead of maintaining a non-clustered columnstore index for operational workload, you can use a filtered condition to create a non-clustered columnstore index only on the warm or slowly changing data. Filtered columnstore index is only supported on disk-based tables. It is not supported on memory-optimized tables.
Have a clustered B-Tree index on the column(s) used in the filtered condition to identify the ‘hot’ rows efficiently.
The idea behind having clustered B-Tree index and filtered non-clustered columnstore index is to access frequently changing “hot” data from the B-Tree index, and to access the more stable “warm” or “cold” data from the filtered non-clustered columnstore index.

Usage of Compression Delay

In situations where you cannot classify data based on a filter condition to provide a time duration on how long the data is going to be hot, SQL Server 2016 provides an option to classify it (warm and hot data) based on time using a setting called Compression Delay. Rows are placed in the delta store/row group initially and moved to columnstore indexes (in compressed form) after the compression delay elapses.

B-tree Clustered Index

In the figure above, the hot data is present in the delta rowgroups until the compression delay time elapses, at which point it will be converted to a columnstore compressed format. You can set the compression delay up to as far as seven days (the valid range for disk-based table is between (0, 10080) minutes and for memory-optimized table is zero or between (60, 10080) minutes) to stay in the delta stores/rowgroups. This gives you flexibility to have an exponentially increasing delta rowgroups to accommodate all the DML activity for the rows during the compression delay.

If you think of this from a different perspective, Compression Delay helps in scenarios where you don’t have a filter predicate (to differentiate between warm and hot data) as well as it also helps to minimize columnstore fragmentation (which mostly happens as part of a delete or update on columnstore compressed data). If you have large fragmentation, it causes inefficient utilization of memory/storage, negatively impacting performance of the analytical queries as it needs to filter the deleted rows from the result set. Hence with Compression Delay you can control the compression for data between slowly changing or frequently changing data.

Memory-optimized Tables

SQL Server 2016 combines memory-optimized table for extreme OLTP and in-memory analytics using columnstore to deliver real-time operational analytics, without changes to your operational workload. If you want to use real-time operational analytics on the in-memory table, you can go ahead and create a columnstore index on top of your in-memory table, which has either Hash index or a Range index or both.

As you can see in the figure below, a columnstore index is created, shown at the bottom along with Deleted Row Table (you can think of Deleted Rows Table – DRT as Delete bitmap, which keeps track of all the deleted rows in the columnstore index on the table).

All the recently changed data due to the recent DML activity is written to the Tail of the in-memory table (you can think of Tail as Delta store, which contains all the rows in the in-memory table which are not yet compressed into columnstore format).

Like non-clustered columnstore index, when the number of rows in the Tail of the in-memory table exceeds 1 million, a background thread compresses rows in the unit of 1 million row group. Also an analytics query will combine rows both from the columnstore index and the Tail automatically without requiring any changes to the query.

Hash Index

Doing real-time operation analytics on an in-memory table has a few differences though when compared to doing it on traditional disk based tables:

You need to include all the columns of the memory optimized table in the columnstore index and hence it is referred to as a clustered columnstore index in the declaration (you can think of this as a special type of non-clustered columnstore index).
Columnstore index is fully memory resident, which consumes more memory though the size requirement for storing columnstore index should be much smaller because of the aggressive compression it uses. SQL 2016 also increases the size limit for in-memory OLTP up to 2TB.
As you can see in the figure above, there is no explicit delta store. The new rows are only inserted into the memory-optimized table. The rows that are in the memory-optimized table but still not in columnstore are referred to as the Tail or “virtual delta store”.

You can reference the script below for creating a memory optimized table with columnstore index. As you can see, the columnstore index is clustered and hence we don’t need to specify individual columns as all the columns of the table are already included.

-- Create a memory-optimized table with a columnstore index
  CREATE TABLE SalesOrderDetailWithMemoryOptimizedAndClusteredColumnStoreIndex(
       [SalesOrderID] [int]   NOT NULL ,
       [SalesOrderDetailID] [int]  NOT NULL PRIMARY KEY NONCLUSTERED,
       [CarrierTrackingNumber] [nvarchar](25) NULL,
       [OrderQty] [smallint]   NOT NULL,
       [ProductID] [int] NOT NULL,
       [SpecialOfferID] [int]   NOT NULL,
       [UnitPrice] [money]   NOT NULL,
       [UnitPriceDiscount] [money]   NOT NULL,
       [LineTotal] [money],
       [ModifiedDate] [datetime]   NOT NULL,
       INDEX SalesOrderDetail_CCI CLUSTERED COLUMNSTORE
  ) WITH (MEMORY_OPTIMIZED = ON );

As discussed above, compression delay works for memory optimized tables as well; in that case it determines the size of the tail.

Real-time Operational Analytics on Secondary Replica

Usage of a filtered non-clustered columnstore index can minimize the columnstore index maintenance but execution of analytics queries still require significant computing resources (CPU, IO, memory) which might impact the performance of the operational workload. In that case, you can offload execution of your analytical queries on your readable secondary replica of the AlwaysOn configuration (AlwaysOn configuration is recommended for mission critical workloads). In this configuration, you can eliminate the impact of running analytics by offloading it to one or more readable secondary replicas as you can see in the figure below.

Readable Secondary Replicas

In this configuration, the only analytics overhead at primary replica is the additional non-clustered columnstore index, executions of analytical queries are offloaded to the secondary replica in the AlwaysOn configuration.

Conclusion

SQL Server 2016 introduces Real-time Operational Analytics, the ability to run both operational/transactional and analytics workloads on the same database tables at the same time.

In this article we learned how to create a filtered non-clustered columnstore index to minimize the overhead of maintaining it, and saw the usage of this feature on memory-optimized tables.

Resources

Real-time Operational Analytics in SQL Server 2016 – Part 1

Real-time Operational Analytics in SQL Server 2016 – Part 2

Get started with Columnstore for real time operational analytics

Understanding New Column Store Index of SQL Server 2012

New Enhanced Column Store Index in SQL Server 2014

See all articles by Arshad Ali

Arshad Ali

Arshad Ali works with Microsoft India R&D Pvt Ltd. He has 8+ years of experience, mostly on Microsoft Technologies. Most recently, as a SQL Developer and BI Developer he has been working on a Data Warehousing project. Arshad is an MCSD, MCITP: Business Intelligence, MCITP: Database Developer 2008 and MCITP: Database Administrator 2008 certified and has presented at several technical events including SQL-School. On an educational front, he has an MCA (Master in Computer Applications) and MBA-IT. Disclaimer : I work for Microsoft and help people and businesses make better use of technology to realize their full potential. The opinions mentioned herein are solely mine and do not reflect those of my current employer or previous employers.