Procedure: Satisfy Business Requirements with MDX
Let's assume, for purposes of our practice example, that we
have received a request from representatives of our client, the Adventure
Works organization. As we have noted in other articles of the series, the
Reporting department, a group of client-facing authors and developers, often
requests assistance such as this. As a part of our relationship with Adventure
Works, as well as with other clients, we provide on-site augmentation for
business requirements gathering and training; we perform workshops, in many
cases, that illustrate approaches to meeting specific needs. These combined development workshops / "train
the trainer" events have worked well in the past for all concerned.
As usual, the authors and developers in the group are aware
that the particular need that they are currently expressing will manifest
itself in recurring situations as they work to meet the daily requirements of
the Adventure Works information consumers. This particular request for
assistance involves scenarios where they feel that the Distinct()
function might be highly useful.
In a brief discussion with members of the Reporting
department, we learn that the need has arisen, in crafting MDX queries for
various analysis and reporting needs, for an understanding of the Distinct()
function. Among the drivers for this requirement is the fact that, in several
recent instances, inexperienced report developers have executed new queries
only to find that, while the results datasets seem to be correct from other
perspectives, the displayed row and / or axis dimension members have contained
duplicates a situation that is undesirable in most cases (for example, when
the MDX query has been created to support a report picklist).
After gaining an understanding of the need, we explain to
the developers the reasons that this might have occurred. We also ascertain
that the requirement, at least at present, does not include a need to perform distinct
counts, for which we state that different remedies exist. We then agree to
demonstrate the use of an MDX set function, Distinct(), to eliminate the
duplicates in the returned datasets.
We convince the authors that they might best become familiar
with the Distinct() function by examining a couple of cases in which we induce,
and thus expect, duplication similar to that they have described in the results
datasets. Once we have created a scenario where duplication is clearly in
evidence, we will exploit Distinct() to eliminate duplication, not only
to demonstrate the application and effectiveness of the function, but also to
illustrate the rule to which Distinct() adheres in selecting the tuples
for elimination. The client representatives with which we are working agree
that, by creating a duplication scenario as an initial step, we can more
effectively demonstrate examples of the straightforward operation of Distinct()
within a meaningful context.
Procedure: Use the Distinct() Function to Eliminate
Duplication in MDX Query Results Datasets
Let's construct a simple query to provide the "starting
point" for our subsequent work with the Distinct() function. Our
intent here, again, is to return a results dataset that contains duplication,
and then to demonstrate the removal of duplicates, using Distinct()
within the original query.
1.
Type (or cut
and paste) the following query into the Query pane:
-- MDX044-001-1 Simple "Duplication" Scenario
SELECT
{[Measures].[Internet Sales Amount], [Measures].[Internet Order Count]}
ON AXIS (0),
NON EMPTY
{[Product].[Product Categories].[Category].[Bikes].CHILDREN,
DESCENDANTS([Product].[Product Categories].[Category].[Bikes],
[Product].[Product Name], SELF_BEFORE_AFTER)} ON AXIS (1)
FROM
[Adventure Works]
WHERE
([Date].[Calendar].[Calendar Year].[CY 2003])
The Query pane appears,
with our input, as shown in Illustration 8.
Illustration 8: Our Initial Query in the Query Pane ...
The above query sets the stage for "applying distinction"
(or, in other words, for the "elimination of duplication"). We have
a simple case, within the row axis, from which a redundant tuple is, unsurprisingly,
generated.
2.
Execute the
query by clicking the Execute button in the toolbar, as depicted in Illustration
9.
Illustration 9: Click Execute to Run the Query...
The Results pane is populated by Analysis
Services, and the dataset, partially shown in Illustration
10, appears.
Illustration 10: Results Dataset (Partial View) Initial "Duplication"
Scenario
In the partial view of the returned dataset, duplication appears
within the top rows as we see circled in the illustration above. We could,
of course, easily modify the query in other ways to remove the duplicate, but
our focus here is the use of Distinct() to accomplish this action.
3.
Select File
-> Save MDXQuery1.mdx As ..., name the file MDX044-001-1,
and place it in a meaningful location.
4.
Leave the
query open for the next step.
Our
developer / author colleagues express satisfaction with the contextual backdrop
we have established for introducing the Distinct() function. We will
undertake using the function in our next steps, first with the foregoing
example, and then within a "fresh" query we will construct.
5.
Replace the
comment line in query MDX044-001-1 with the following:
-- MDX044-001-2 Using DISTINCT() to Remove Duplication
6.
Select File
-> Save MDX044-001-1.mdx As ..., name the file MDX044-001-2.mdx,
and place it in the same location as its predecessor, to protect the former
query.
7.
Place the
cursor to the right of the NON EMPTY keyword on the fifth row of the
query.
8.
Press the
Enter key twice to create a new line between the line of the query on which
we have placed the cursor and the line that currently follows it, namely:
{[Product].[Product Categories].[Category].[Bikes].CHILDREN,
9.
Type the following
syntax into the new row:
DISTINCT(
10.
On what is now
the ninth row, place the cursor to the immediate right of the following:
[Product].[Product Name], SELF_BEFORE_AFTER)}
11.
Add another
right parenthesis ( ")" ) to the right of the existing right
curly brace, which immediately precedes "ON AXIS (1)" within
the row.
The
effect, of course, is to enclose the set specified on the rows axis within the newly
added Distinct() function. Once we have accomplished this simple
modification, the Query pane appears as depicted in Illustration 11.
Illustration 11: "Adjusted" Query in the Query
Pane (Modifications Circled)
12.
Execute the
query by clicking the Execute button in the toolbar, as before.
The Results pane is populated by Analysis
Services, and the dataset partially shown in Illustration 12 appears.
Illustration 12: Results Dataset (Partial View) Distinct()
at Work
We see that Distinct() has had the expected effect:
It has eliminated the duplicate tuple (labeled "Mountain Bikes")
that we saw in the earlier results dataset. Further, we can also discern,
because the first Mountain Bikes row is preserved in the dataset whose
query has employed the Distinct() function to remove duplicates, that
the function has removed "all except the first instance" of the once
duplicated tuple, as we might have predicted based upon our explanation of the
function in earlier sections.
13.
Select File
-> Save MDX044-001-2.mdx to ensure that that the file is
saved.
The
client developers and report authors express satisfaction with the results, and
confirm their understanding in the operation of the Distinct() function.
They then present an additional case where they wish to employ Distinct()
to eliminate duplicates from a slightly more elaborate scenario, which they
outline as follows: Using the Adventure Works cube as a data source,
they have already constructed a query which, like the previous query we
examined, produces duplicates within its results dataset. This query employs a
group of MDX functions to provide a listing, assuming a given individual within
the sales organization, of the organizational hierarchy within which that
individual functions. The hierarchy presented displays employees, both above
and below the level of the specified employee, within the hierarchy, along with
each employee's order count. The hierarchy, they explain, is defined by
"sales activity rollup" relationships, meaning that each level within
the sales hierarchy will include totals for the members of the level underneath.
The query as it is currently written concerns itself with Calendar Year 2004.