Getting Approximate Count in Microsoft SQL Server

With the introduction of SQL Server 2019, there is now a new, faster way to get a list of distinct values in a column. This new way is using the APPROX_COUNT_DISTINCT function. This new function does not return the exact number of distinct values, but instead, as the function name suggests, it only returns an approximate count. It does this by using the HyperLogLog algorithm.

This new function is faster than the COUNT(*) distinct function because it uses less memory. This new function can estimate the number of distinct values greater than 1,000,000,000, while using less than 1.5 KB or memory. Because it uses less memory, less data is spilled to tempdb than the COUNT(*) function. But there is a cost associated using this new function, and that is accuracy.

The APPROX_COUNT_DISTINCT function does not return the actual number of rows with each distinct value, but instead returns an approximate count. The approximate count might be higher or lower than the actual number. According to Microsoft’s documentation, 97% of the time the APPROX_COUNT_DISTINCT function will be within the 2% of the actual value.

If you have a really large table, want a count of distinct value and can live with an approximate count then you might be able to get a performance boost with this new function. Below is an example of how to use this new function:

SELECT APPROX_COUNT_DISTINCT(MyColumn) 
FROM [dbo].[MyTable];
Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles