Distinct Counts Concepts
Overview and Discussion
Anyone working in business intelligence and general analysis quickly realizes
that we often need to count, precisely, the members of various sets of data.
Those of us who have become familiar with MSAS are aware of its capabilities
for categorizing and aggregating data within the hierarchical contexts of
dimensions and levels. For the most part, we can readily tap these
capabilities from the user interface that MSAS provides. By using more
advanced approaches, including calculated members and measures, and
multidimensional expressions ("MDX") in general, we can extend our analysis
even further and use MSAS to reach far more specific objectives.
One basic requirement that arises, in some form, in many analysis scenarios
is the need to count the members of a set targeted for analysis. For example,
we might need to count the number of products we have shipped from a given
warehouse, or group of warehouses, to a given geographical location or a
specific group of stores. As most of us are aware, the Count() function
handles this readily enough.
Count() does a great job of giving us a total count. In the scenarios above,
however, using Count() with products would return the total number of
products shipped. What we would not get, and what we might find far more
useful in some situations, is a count of the different products that were
shipped. Because many products will have been shipped multiple times, the
total that Count() provides includes multiple counts of the same products.
To reach our objective of counting different products, then, we would need
to count each different product shipped only once. Counting products
multiple times not only misstates the number of different products, but also
likely renders averages, and other metrics based upon the count value,
meaningless or misleading.
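To make the distinction concrete, the following sketch contrasts a total count with a distinct count over a handful of shipment records. It is a plain-Python analogy only, not MDX, and it does not model how MSAS evaluates Count() internally. The Shipment record, the warehouse and destination names, the sample rows, and the helper functions (shipments_to, total_count, distinct_count, average_units_per_product) are all hypothetical illustrations, not FoodMart data or an MSAS API. The sketch also shows how the choice of divisor changes an "average per product" metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    """One hypothetical shipment row (not a FoodMart schema)."""
    warehouse: str
    destination: str
    product: str
    units: int


# Hypothetical sample data: Milk and Bread each ship to Seattle more than once.
SHIPMENTS = [
    Shipment("WH-1", "Seattle", "Milk", 10),
    Shipment("WH-1", "Seattle", "Bread", 5),
    Shipment("WH-1", "Seattle", "Milk", 8),
    Shipment("WH-2", "Seattle", "Milk", 4),
    Shipment("WH-1", "Portland", "Eggs", 6),
    Shipment("WH-1", "Seattle", "Bread", 7),
]


def shipments_to(rows, warehouses, destination):
    """Keep shipments from a group of warehouses to one destination."""
    return [r for r in rows
            if r.warehouse in warehouses and r.destination == destination]


def total_count(rows):
    """Total count: every shipment counts, so repeated products count again."""
    return len(rows)


def distinct_count(rows):
    """Distinct count: each different product counts only once."""
    return len({r.product for r in rows})


def average_units_per_product(rows, count_fn):
    """Average units shipped per product, using count_fn as the divisor."""
    return sum(r.units for r in rows) / count_fn(rows)


if __name__ == "__main__":
    seattle = shipments_to(SHIPMENTS, {"WH-1"}, "Seattle")
    print("total:", total_count(seattle))        # 4 shipments
    print("distinct:", distinct_count(seattle))  # 2 different products
    print("avg (total):", average_units_per_product(seattle, total_count))
    print("avg (distinct):", average_units_per_product(seattle, distinct_count))
```

With WH-1 shipping to Seattle, the total count is 4 but only 2 different products were shipped, so dividing the 30 units by the total count yields 7.5 units "per product" instead of 15.0. Widening the warehouse group to {"WH-1", "WH-2"} raises the total count to 5 while the distinct count remains 2.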
Here, the word "different" is easily replaced by "distinct." As many of us
are aware, performing distinct counts has historically presented a challenge
in the OLAP world. Let's discuss an example that illustrates the challenge,
and then turn that challenge into an opportunity to meet an illustrative
business need, using the distinct count capabilities found within MSAS.
Considerations and Comments
For purposes of this exercise, we will be working with the Warehouse cube in
the FoodMart 2000 MSAS database; these working samples accompany a typical
installation of MSAS. If the samples are not installed in, or have been
removed from, your environment, they can be obtained from the installation
CD, as well as from the Analysis Services section of the Microsoft website.
If you prefer not to alter the structure of your sample cubes as they
currently exist, make copies of the cube referenced in this article before
beginning the practice exercises. For instructions on copying cubes, see the
Preparation section of "Introduction to MSSQL Server 2000 Analysis Services:
Semi-Additive Measures and Periodic Balances."