
MDX Set Functions: The Distinct() Function

Written By William Pearson

Jun 5, 2006


About the Series …

This article is a member of the series, MDX Essentials. The series is designed to provide hands-on application of the fundamentals of the Multidimensional Expressions (MDX) language, with each session progressively adding features and capabilities designed to meet specific real-world needs.

Virtually all of the MDX we have constructed in earlier articles can now be used in SQL Server Management Studio, SQL Server Business Intelligence Development Studio, and various other areas within the Microsoft integrated Business Intelligence solution. Much of what we construct going forward can also be executed in the Analysis Services 2000 MDX Sample Application (assuming connection to an appropriate Analysis Services data source). MDX as a language continues to evolve and expand: we will focus on many new features in articles to come, while continuing to examine business uses of MDX in general. Using MDX to meet the real-world needs of our business environments will remain my primary concentration within the MDX Essentials series.

For more information about the series in general, as well as the software and systems requirements for getting the most out of its member lessons, please see Set Functions: The DrillDownMember() Function, which details the applications, samples and other components required to complete our practice exercises.

Overview

Microsoft Analysis Services (“Analysis Services”), as most of us know, leads the enterprise business intelligence arena with its rich set of analytical and reporting tools. Within the sphere of analysis and reporting with OLAP data sources, most of these tools rely upon functions based in the MDX query language. MDX is integrated not only within Analysis Services, but also throughout the entire Microsoft integrated Business Intelligence solution. This includes applications such as MSSQL Server and Reporting Services, and extends into Microsoft Office and other applications. This integration gives users of the platform a distinct advantage over those who are limited to the offerings of the expensive, once-dominant enterprise BI solutions (few of which even accommodate direct editing of MDX within their “drag and drop” interfaces). Particularly in the case of numerical and set functions, it also allows for easy, consistent application of built-in logic.

In this article, we will extend our examination of MDX functions to the basic, but useful, Distinct() function. We will discuss the function's straightforward purpose, which is to return a set without duplicates from a set we specify within the function. We will also discuss the manner in which it accomplishes this.

Along with an introduction to the Distinct() function, this lesson will include:

an examination of the syntax comprising the function;
illustrative examples of uses of the function in practice exercises;
a brief discussion of the MDX results obtained within each of the practice examples.

The Distinct() Function

Introduction

According to the Books Online, the Distinct() function “returns a set, removing duplicate tuples from a specified set.” Moreover, the Books Online state that, in cases where the Distinct() function finds duplicate tuples within the specified Set Expression, only the first instance of the duplicate tuple is retained within the returned results dataset.

Although Distinct() eliminates duplicate tuples within the specified Set Expression, the function leaves the order of the original set intact. Distinct() is useful whenever a set is assembled from overlapping expressions, such as the members of a level combined with explicitly named members, and each tuple should appear only once. As is the case with most MDX functions, pairing it with other MDX functions can help us to leverage its power even further.

We will examine the syntax for the Distinct() function in detail after our customary overview in the Discussion section that follows. We will then conduct practice examples within a couple of scenarios, constructed to support hypothetical business needs that illustrate uses for the function. This will afford us an opportunity to explore some of the delivery options that Distinct() can offer the knowledgeable user. Hands-on practice with Distinct(), where we will create queries that employ the function, will help us to reinforce what we have learned in the Discussion and Syntax sections.

Discussion

To restate our initial explanation of its operation, the Distinct() function removes duplicates that occur within a specified Set. If the specified Set contains duplicates, all except the first instance of each duplicated tuple are discarded. In other words, the later occurrences, toward the tail of the Set, are the ones removed. The first instance of each tuple (or its only instance, if it is not duplicated) is returned within a Set that is ordered just as the Set specified within the function. (As we might expect, specifying an empty Set within the Distinct() function returns an empty Set.)
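The keep-first, order-preserving behavior just described can be modeled outside of MDX. The following is a minimal Python sketch of the semantics only. It is not MDX, and it is not how Analysis Services implements Distinct(). It assumes that tuples are represented as Python tuples of member names, and the function name `distinct` and the sample members are hypothetical.

```python
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def distinct(set_expression: Sequence[T]) -> List[T]:
    """Model of MDX Distinct(): keep the first instance of each tuple,
    drop every later duplicate, and preserve the original order."""
    seen = set()          # tuples already emitted
    result: List[T] = []  # output, in original order
    for member_tuple in set_expression:
        if member_tuple not in seen:  # first instance: retain it
            seen.add(member_tuple)
            result.append(member_tuple)
        # any later instance is skipped, so duplicates drop off the tail
    return result


if __name__ == "__main__":
    tuples = [("A",), ("B",), ("A",), ("C",), ("B",)]
    print(distinct(tuples))  # [('A',), ('B',), ('C',)]
    print(distinct([]))      # [] (an empty set in, an empty set out)
```

The hash-based `seen` lookup keeps the model linear in the size of the input set, while the `result` list carries the original ordering. Because whole tuples are compared, multi-member tuples that share only some members are not treated as duplicates.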

Let’s look at syntax specifics to further clarify the operation of Distinct().

Syntax

Syntactically, whenever we employ the Distinct() function to return the distinct tuples of a specified set, we place the Set Expression within the parentheses to the right of the Distinct keyword. The general syntax is shown in the following string:

Distinct(Set_Expression)

Let’s take a look at an illustration. The following snippet employs the Distinct() function:

DISTINCT(
   {[Geography].[Geography].[State-Province].[Georgia].CHILDREN,
    [Geography].[Geography].[City].[Atlanta],
    [Geography].[Geography].[City].[McDonough]}
) ON AXIS(1)

Suppose this rows-axis specification is placed within a query executed against the Adventure Works sample cube that, say, specifies the Reseller Sales Amount measure on the columns (Axis(0)) and contains a Calendar Year 2004 slicer. Such a query would produce a results dataset similar to that partially depicted in Illustration 1.

Illustration 1: Results Dataset – Distinct() Function with Specified Set Containing Duplicates

In the example dataset, we see that the Cities of the State of Georgia appear in the order in which they would have appeared had we simply defined the row axis as [Geography].[Geography].[State-Province].[Georgia].CHILDREN. We intentionally specified duplicates (the Cities of Atlanta and McDonough, which CHILDREN already returns) within our query. This illustrates that the first instance of each duplicated City is retained and the second instance is discarded. Because the explicitly listed copies follow the CHILDREN members in the set, the example also shows how duplicates are removed from the tail of the set within the results dataset.
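To trace what happens in this example, the minimal Python sketch below rebuilds the rows-axis set as a list of City names. It then removes duplicates while keeping first occurrences. This is a model of the behavior only, not MDX and not how Analysis Services evaluates the query. Members are compared here by name, for simplicity. `GEORGIA_CITIES` is a hypothetical, abbreviated stand-in for the members that CHILDREN would return from the cube. It assumes only that Atlanta and McDonough are among them.

```python
# Hypothetical, abbreviated stand-in for
# [Geography].[Geography].[State-Province].[Georgia].CHILDREN;
# the actual Cities come from the Adventure Works cube.
GEORGIA_CITIES = ["Atlanta", "Augusta", "Columbus", "McDonough", "Savannah"]

# The set inside DISTINCT(): the children, followed by the two explicitly
# listed Cities, which duplicate members already returned by CHILDREN.
row_set = GEORGIA_CITIES + ["Atlanta", "McDonough"]

# dict.fromkeys() keeps the first occurrence of each key in insertion order,
# which mirrors Distinct(): the trailing copies are discarded.
distinct_rows = list(dict.fromkeys(row_set))

print(row_set)        # Atlanta and McDonough each appear twice
print(distinct_rows)  # ['Atlanta', 'Augusta', 'Columbus', 'McDonough', 'Savannah']
```

The result matches the plain CHILDREN ordering, which is the same pattern Illustration 1 depicts for the real query.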


William Pearson

Bill has been working with computers since before becoming a "big eight" CPA, after which he carried his growing information systems knowledge into management accounting, internal auditing, and various capacities of controllership. Bill entered the world of databases and financial systems when he became a consultant for CODA-Financials, a U.K.-based software company that hired only CPAs as application consultants to implement and maintain its integrated financial database. In his current assessment, that database remains one of the most conceptually powerful to have emerged. At CODA Bill deployed financial databases and business intelligence systems for many global clients. Working with SQL Server, Oracle, Sybase and Informix, and focusing on MSSQL Server, Bill created Island Technologies Inc. in 1997, and has developed a large and diverse customer base over the years since. Bill's background as a CPA, Internal Auditor and Management Accountant enables him to provide value to clients as a liaison between Accounting / Finance and Information Services. Moreover, Bill is a Certified Information Technology Professional (CITP), a Certified Public Accountant recognized for his or her unique ability to provide business insight by leveraging knowledge of information relationships and supporting technologies. In that capacity, he offers his clients the CPA's perspective and ability to understand the complicated business implications and risks associated with technology. From this perspective, he helps them to effectively manage information while ensuring the data's reliability, security, accessibility and relevance. Bill has implemented enterprise business intelligence systems over the years for many Fortune 500 companies, focusing his practice (since the advent of MSSQL Server 2000) upon the integrated Microsoft business intelligence solution. He leverages his years of experience with other enterprise OLAP and reporting applications (Cognos, Business Objects, Crystal, and others) in regular conversions of these once-dominant applications to the Microsoft BI stack. Bill believes it is easier to teach technical skills to people with non-technical training than vice versa, and he constantly seeks ways to graft new technology into the Accounting and Finance arenas. Bill was awarded Microsoft SQL Server MVP in 2009. Hobbies include advanced literature studies and occasional lectures, with recent concentration upon the works of William Faulkner, Henry James, Marcel Proust, James Joyce, Honoré de Balzac, and Charles Dickens. Other long-time interests have included the exploration of generative music sourced from database architecture.