Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Feb 14, 2005

Introduction to MSSQL Server 2000 Analysis Services: Manage Distinct Count with a Virtual Cube - Page 2

By William Pearson

Manage Distinct Count with a Virtual Cube

Overview and Discussion

We mentioned in our introductory article, Distinct Count Basics: Two Perspectives, that it is common in the business environment to encounter the need to quantify precisely the members of various sets of data. A simple example, and one upon which we will expand in our hypothetical business requirement, involves the number of customers who are purchasing a product, or group of products, sold by an organization. We learned in the previous article that we can exploit settings within MSAS' Analysis Manager, as well as take more advanced approaches, to extend our analysis even further, and leverage MSAS to reach our specific business objectives.

We discussed why distinct counts differ from simple counts, noting that a distinct count might comprise, as an example, a count of the different products that were purchased, or of the individual customers who purchased our products. To review our discussion, COUNT(), in providing a total number of, say, customers, would also be providing multiple counts of the same customers, because customers will have, in most cases, purchased multiple products, multiple times. To reach our objective of counting different customers, then, we would need to count each different customer, only once. As we noted in our previous session, using COUNT() when DISTINCTCOUNT() is required not only misstates the number of different customers, but it also likely renders averages, and other metrics similarly based upon the count value, misleading or totally useless in our analysis efforts.

In this article, we will discuss and practice a solution for meeting an illustrative need, which expands upon the customer example to which we have alluded. Our practice example will also highlight the performance challenges that can arise in simply addressing such requirements in an intuitive manner. We will then take steps to reshape our solution to take advantage of another approach that meets the need, while bettering the performance of the overall solution.

Considerations and Comments

For purposes of this exercise, we will be working with the Warehouse cube, within the FoodMart 2000 MSAS database; these working samples accompany a typical installation of MSAS. If the samples are not installed in, or have been removed from, your environment, they can be obtained from the installation CD, as well as from the Analysis Services section of the Microsoft website. If you prefer not to alter the structure of your sample cubes as they currently exist, make copies of the cube we reference in the article before beginning the practice exercises. For instructions on copying cubes, see the Preparation section of Introduction to MSSQL Server 2000 Analysis Services: Semi-Additive Measures and Periodic Balances.

Hands-On Procedure

We will begin with a scenario that illustrates a requirement for a distinct count, using a hypothetical business need to add practical value. Let's say that a group of information consumers within the FoodMart organization have approached us with an information request they wish to meet using the Sales cube. The consumers want to be able to analyze the performance of products, by category, both in terms of dollar sales, and number of different customers contributing to those sales, for the third quarter (Q3) of 1997. In addition, they wish to see an "average sales per (distinct) customer" within the same dataset.

We will initially attempt to meet the needs of the consumers with relatively simple MDX, having introduced both MSAS and MDX approaches in Distinct Count Basics: Two Perspectives (see the steps provided there, if you have joined the series with this article, and find the initial query we present to be less than intuitive.)

Initial Approach via MDX

Let's initialize the MDX Sample Application, the platform from which we perform many practice exercises within the articles of our series. (We choose it because any organization that has installed MSAS has access to the Sample Application). We will create our initial query by taking the following steps:

1.  Start the MDX Sample Application.

We are initially greeted by the Connect dialog, shown in Illustration 1.


Illustration 1: The Connect Dialog for the MDX Sample Application

The illustration above depicts the name of my server, MOTHER1, and properly indicates that we will be connecting via the MSOLAP provider (the default).

2.  Click OK.

The MDX Sample Application window appears.

3.  Click File --> New.

A blank Query pane appears.

4.  Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

5.  Select the Sales cube in the Cube drop-down list box.

The MDX Sample Application window should resemble that depicted in Illustration 2, complete with the information from the Sales cube displaying in the Metadata tree (left section of the Metadata pane).


Illustration 2: The MDX Sample Application Window (Compressed View)

We will begin creating our query with a focus on returning results to meet the expressed business need of the information consumers. We will construct two calculated members / measures, one to contain the distinct count of the Customers, and one to calculate the average sale for each Product Category, per individual Customer. We will then SELECT the two calculated measures, presenting them to the immediate right of the Unit Sales measure for each respective Product Category.

We will retrieve a dataset with the measure / calculated measures forming the column axis, and the Product Category forming the row axis.

1.  Create the following new query:


-- ANSYS32-1 Initial Attempt at Distinct Customer Dataset
WITH 
   MEMBER
       [Measures].[Distinct Customers]
      AS
         'COUNT(CrossJoin({[Unit Sales]}, 
             Descendants ([Customers].CurrentMember, 
                [Customers].[Name])), ExcludeEmpty)'
   MEMBER
      [Measures].[Avg Sales per Customer]
      AS
         '[Measures].[Unit Sales]/[Measures].[Distinct Customers]'
SELECT
   { [Measures].[Unit Sales], [Measures].[Distinct Customers], 
   		[Measures].[Avg Sales per Customer]} on Columns,
   {[Product Category].Members} ON ROWS
FROM 
   [SALES]
WHERE 
   ([Time].[1997].[Q3])

The above represents an attempt to meet the information consumers' objectives with what appears to be the straightforward use of the DISTINCTCOUNT() function within a calculated member, to contain the count of the distinct Customers. We then create a second calculated member based upon the first, which we divide into the Unit Sales measure to derive the Average Sales per (individual) customer, as requested by the intended audience. We SELECT all three into the desired matrix to render the desired presentation.

The calculated member Distinct Customers embodies the "heavy lifting" in the query. We used the following definition (within the AS clause string for calculated member Distinct Customers):


'COUNT(CrossJoin({[Unit Sales]}, 
             Descendants ([Customers].CurrentMember, 
                [Customers].[Name])), ExcludeEmpty)'

to count the non-null Sales / Customer member tuples that it found, thereby deriving the number of customers. Because we wish to avoid counting all customer names (the lowest level of the Customer hierarchy), regardless of our level position in the hierarchy, we inserted the Descendants() function shown; this forces a limitation upon the count to solely the customers under the current member of the Customers dimension.

2.  Execute the query using the Run Query button.

The results dataset appears as partially shown in Illustration 3.


Illustration 3: The Results Dataset (Partial View)

The first thing that we notice, after clicking Run Query, is that this query takes a little longer to run than many of the "sample" queries we have created in past articles. As a matter of fact, this is exactly the observation that I am hoping even those new to MDX in general will make. The query provides the data to meet the requirements of the information consumers, but performance could become a problem.

The overhead generated in the query is due to the requirement for MSAS to perform a runtime assessment of each customer member, and there are many members. While this overhead may not be unduly troublesome from the perspective of our sample data, performance will be degraded far more within the context of the sizes of member populations that exist in many production environments. The performance degradation we have witnessed in our tiny sample cube is extrapolated to those larger populations, to an extent that becomes very real to analysts and other information consumers that rely upon the system to provide data in a reasonable response time.

3.  Save the query as ANSYS32-1 in a convenient location.

4.  Close the Sample Application.

Our next step will be to examine an option for mitigating the performance hit suffered within the straightforward application of DISTINCTCOUNT() within our query.



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM


















Thanks for your registration, follow us on our social networks to keep up-to-date