dcsimg
Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
SQL Scripts & Samples
Tips
» Database Forum
» Slideshows
Free Newsletters:



News Via RSS Feed


Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted February 3, 2020

The New SQL Server Approximate Count Distinct Function

By Greg Larsen

With the introduction of SQL Server 2019, Microsoft introduced a new method to count distinct values. This new method of counting is to use the new function called APPROX_COUNT_DISTINCT(). As the name suggests, this new function doesn’t return the actual count of distinct values, but instead returns an approximate count of the distinct values. By using this new function, you might find your big analytic queries, that count distinct values, will run faster and use less resources.

This function was introduced to solve the memory issues associated with counting distinct values where a large number of distinct values exists. When a large number of distinct values exist, SQL Server at some point is no longer be able to maintain counting distinct values in memory alone. When distinct values can’t be maintained in memory, the database engine need to spill to tempdb. The spilling to tempdb is a costly operations, and the therefore slows down the counting process.

The implementation of APPROX_COUNT_DISTINCT() has a much smaller memory footprint than the tried and true COUNT(DISTINCT) function. Per documentation this new function can estimate the number of distinct values of greater than 1,000,000,000 where the accuracy of the calculated approximate distinct count value is within 2% of the actual distinct count value. And it does this using less than 1.5 KB of memory. If your business need doesn’t require an accurate count value, and is willing to live with little less accuracy provide your distinct count query runs faster, than you might want to check out this new function. Here is an example of using this new function to get an approximate counts:

SELECT APPROX_COUNT_DISTINCT(Locations) AS NumOfLocations, 
WHERE DATEPART (month,BillingDate) IN (1,2,3) 
GROUP BY DATEPART (month,BillingDate);
       DATEPART (month,BillingDate) AS BillingMonth
FROM [dbo].[BillingInfo] 

# # #

» See All Articles by Columnist Gregory A. Larsen



MS SQL Archives




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM