With the introduction of SQL Server 2019, Microsoft introduced a new method to count distinct values. This new method of counting is to use the new function called APPROX_COUNT_DISTINCT(). As the name suggests, this new function doesn’t return the actual count of distinct values, but instead returns an approximate count of the distinct values. By using this new function, you might find your big analytic queries, that count distinct values, will run faster and use less resources.
This function was introduced to solve the memory issues associated with counting distinct values where a large number of distinct values exists. When a large number of distinct values exist, SQL Server at some point is no longer be able to maintain counting distinct values in memory alone. When distinct values can’t be maintained in memory, the database engine need to spill to tempdb. The spilling to tempdb is a costly operations, and the therefore slows down the counting process.
The implementation of APPROX_COUNT_DISTINCT() has a much smaller memory footprint than the tried and true COUNT(DISTINCT) function. Per documentation this new function can estimate the number of distinct values of greater than 1,000,000,000 where the accuracy of the calculated approximate distinct count value is within 2% of the actual distinct count value. And it does this using less than 1.5 KB of memory. If your business need doesn’t require an accurate count value, and is willing to live with little less accuracy provide your distinct count query runs faster, than you might want to check out this new function. Here is an example of using this new function to get an approximate counts:
SELECT APPROX_COUNT_DISTINCT(Locations) AS NumOfLocations, WHERE DATEPART (month,BillingDate) IN (1,2,3) GROUP BY DATEPART (month,BillingDate); DATEPART (month,BillingDate) AS BillingMonth FROM [dbo].[BillingInfo]
# # #