With the introduction of SQL Server 2019, there is now a new, faster way to get a list of distinct values in a column. This new way is using the APPROX_COUNT_DISTINCT function. This new function does not return the exact number of distinct values, but instead, as the function name suggests, it only returns an approximate count. It does this by using the HyperLogLog algorithm.
This new function is faster than the COUNT(*) distinct function because it uses less memory. This new function can estimate the number of distinct values greater than 1,000,000,000, while using less than 1.5 KB or memory. Because it uses less memory, less data is spilled to tempdb than the COUNT(*) function. But there is a cost associated using this new function, and that is accuracy.
The APPROX_COUNT_DISTINCT function does not return the actual number of rows with each distinct value, but instead returns an approximate count. The approximate count might be higher or lower than the actual number. According to Microsoft’s documentation, 97% of the time the APPROX_COUNT_DISTINCT function will be within the 2% of the actual value.
If you have a really large table, want a count of distinct value and can live with an approximate count then you might be able to get a performance boost with this new function. Below is an example of how to use this new function:
SELECT APPROX_COUNT_DISTINCT(MyColumn) FROM [dbo].[MyTable];