MDX Set Functions: The Distinct() Function - Page 3

June 5, 2006

Procedure: Satisfy Business Requirements with MDX

Let's assume, for purposes of our practice example, that we have received a request from representatives of our client, the Adventure Works organization. As we have noted in other articles of the series, the Reporting department, a group of client-facing authors and developers, often requests assistance such as this. As a part of our relationship with Adventure Works, as well as with other clients, we provide on-site augmentation for business requirements gathering and training; we perform workshops, in many cases, that illustrate approaches to meeting specific needs. These combined development workshops / "train the trainer" events have worked well in the past for all concerned.

As usual, the authors and developers in the group are aware that the particular need that they are currently expressing will manifest itself in recurring situations as they work to meet the daily requirements of the Adventure Works information consumers. This particular request for assistance involves scenarios where they feel that the Distinct() function might be highly useful.

In a brief discussion with members of the Reporting department, we learn that the need has arisen, in crafting MDX queries for various analysis and reporting needs, for an understanding of the Distinct() function. Among the drivers for this requirement is the fact that, in several recent instances, inexperienced report developers have executed new queries only to find that, while the results datasets seem to be correct from other perspectives, the displayed row and / or axis dimension members have contained duplicates – a situation that is undesirable in most cases (for example, when the MDX query has been created to support a report picklist).

After gaining an understanding of the need, we explain to the developers the reasons that this might have occurred. We also ascertain that the requirement, at least at present, does not include a need to perform distinct counts, for which we state that different remedies exist. We then agree to demonstrate the use of an MDX set function, Distinct(), to eliminate the duplicates in the returned datasets.

NOTE: For detailed articles surrounding distinct counts in Analysis Services 2000, see Distinct Count Basics: Two Perspectives and Manage Distinct Count with a Virtual Cube, both members of my Introduction to MSSQL Server Analysis Services series.

We convince the authors that they might best become familiar with the Distinct() function by examining a couple of cases in which we induce, and thus expect, duplication similar to that they have described in the results datasets. Once we have created a scenario where duplication is clearly in evidence, we will exploit Distinct() to eliminate duplication, not only to demonstrate the application and effectiveness of the function, but also to illustrate the rule to which Distinct() adheres in selecting the tuples for elimination. The client representatives with which we are working agree that, by creating a duplication scenario as an initial step, we can more effectively demonstrate examples of the straightforward operation of Distinct() within a meaningful context.

Procedure: Use the Distinct() Function to Eliminate Duplication in MDX Query Results Datasets

Let's construct a simple query to provide the "starting point" for our subsequent work with the Distinct() function. Our intent here, again, is to return a results dataset that contains duplication, and then to demonstrate the removal of duplicates, using Distinct() within the original query.

1.  Type (or cut and paste) the following query into the Query pane:

-- MDX044-001-1 Simple "Duplication" Scenario
   {[Measures].[Internet Sales Amount], [Measures].[Internet Order Count]} 
      ON AXIS (0),
      {[Product].[Product Categories].[Category].[Bikes].CHILDREN, 
         DESCENDANTS([Product].[Product Categories].[Category].[Bikes], 
             [Product].[Product Name], SELF_BEFORE_AFTER)} ON AXIS (1)
   [Adventure Works]
               ([Date].[Calendar].[Calendar Year].[CY 2003])

The Query pane appears, with our input, as shown in Illustration 8.

Illustration 8: Our Initial Query in the Query Pane ...

The above query sets the stage for "applying distinction" (or, in other words, for the "elimination of duplication"). We have a simple case, within the row axis, from which a redundant tuple is, unsurprisingly, generated.

2.  Execute the query by clicking the Execute button in the toolbar, as depicted in Illustration 9.

Illustration 9: Click Execute to Run the Query...

The Results pane is populated by Analysis Services, and the dataset, partially shown in Illustration 10, appears.

Illustration 10: Results Dataset (Partial View) – Initial "Duplication" Scenario

In the partial view of the returned dataset, duplication appears within the top rows – as we see circled in the illustration above. We could, of course, easily modify the query in other ways to remove the duplicate, but our focus here is the use of Distinct() to accomplish this action.

3.  Select File -> Save MDXQuery1.mdx As ..., name the file MDX044-001-1, and place it in a meaningful location.

4.  Leave the query open for the next step.

Our developer / author colleagues express satisfaction with the contextual backdrop we have established for introducing the Distinct() function. We will undertake using the function in our next steps, first with the foregoing example, and then within a "fresh" query we will construct.

5.  Replace the comment line in query MDX044-001-1 with the following:

-- MDX044-001-2 Using DISTINCT() to Remove Duplication

6.  Select File -> Save MDX044-001-1.mdx As ..., name the file MDX044-001-2.mdx, and place it in the same location as its predecessor, to protect the former query.

7.  Place the cursor to the right of the NON EMPTY keyword on the fifth row of the query.

8.  Press the Enter key twice to create a new line between the line of the query on which we have placed the cursor and the line that currently follows it, namely:

{[Product].[Product Categories].[Category].[Bikes].CHILDREN,

9.  Type the following syntax into the new row:


10.  On what is now the ninth row, place the cursor to the immediate right of the following:

 [Product].[Product Name], SELF_BEFORE_AFTER)}

11.  Add another right parenthesis ( ")" ) to the right of the existing right curly brace, which immediately precedes "ON AXIS (1)" within the row.

The effect, of course, is to enclose the set specified on the rows axis within the newly added Distinct() function. Once we have accomplished this simple modification, the Query pane appears as depicted in Illustration 11.

Illustration 11: "Adjusted" Query in the Query Pane (Modifications Circled)

12.  Execute the query by clicking the Execute button in the toolbar, as before.

The Results pane is populated by Analysis Services, and the dataset partially shown in Illustration 12 appears.

Illustration 12: Results Dataset (Partial View) – Distinct() at Work

We see that Distinct() has had the expected effect: It has eliminated the duplicate tuple (labeled "Mountain Bikes") that we saw in the earlier results dataset. Further, we can also discern, because the first Mountain Bikes row is preserved in the dataset whose query has employed the Distinct() function to remove duplicates, that the function has removed "all except the first instance" of the once duplicated tuple, as we might have predicted based upon our explanation of the function in earlier sections.

13.  Select File -> Save MDX044-001-2.mdx to ensure that that the file is saved.

The client developers and report authors express satisfaction with the results, and confirm their understanding in the operation of the Distinct() function. They then present an additional case where they wish to employ Distinct() to eliminate duplicates from a slightly more elaborate scenario, which they outline as follows: Using the Adventure Works cube as a data source, they have already constructed a query which, like the previous query we examined, produces duplicates within its results dataset. This query employs a group of MDX functions to provide a listing, assuming a given individual within the sales organization, of the organizational hierarchy within which that individual functions. The hierarchy presented displays employees, both above and below the level of the specified employee, within the hierarchy, along with each employee's order count. The hierarchy, they explain, is defined by "sales activity rollup" relationships, meaning that each level within the sales hierarchy will include totals for the members of the level underneath. The query as it is currently written concerns itself with Calendar Year 2004.

The Network for Technology Professionals



Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers