Fundamental MDX Concepts
The Multidimensional Expression (MDX) language was created with the primary purpose of allowing us to manipulate Multidimensional data in Microsoft SQL Server 2000 Analysis Services. MDX is defined in the OLAP extensions and OLE DB. It is used with Analysis Services as a query language, and is used by various Client components and applications to return data from OLAP cubes. A second important characteristic of MDX is its role as an expression language. Its functions can be used in Analysis Services to define calculated members, as well as to build local cubes and to query cube data using the Pivot Table Service with OLE DB and Microsoft ActiveX Data Objects (ADO). MDX also allows us to create and register flexible user-defined functions for our specific needs that accept arguments and return values in the MDX syntax.
Comparing and Contrasting MDX with SQL
For those of us familiar with the Structured Query Language (SQL), the MDX syntax will appear similar in many facets. It may appear, as we proceed with our introduction to MDX, that many of the actions we can perform with MDX can be handled (though perhaps not as well) with SQL. MDX exists to make this functionality more efficient and intuitive, with specific respect to multidimensional data and objects.
The SELECT clause (representing a data request), the FROM clause (representing a "starting point" for data selection), and the WHILE clause (in essence, a filter) are requirements for both MDX and SQL. MDX uses other keywords to query cubes and return data that we can analyze, as well as providing functions that serve as tools to manipulate returned data. Finally, MDX is further enhanced with the support of user-defined functions that allow us to extend and customize the tool to fit specific business needs that might not lend themselves to "off the shelf" solutions.
MDX also resembles SQL in its provision of Data Definition Language (DDL) syntax for managing our data structures. Commands exist, for example, for creating, modifying and deleting cubes, dimensions, measures, and other structural objects in the OLAP environment.
Most data definition/manipulation languages, including SQL, are constructed to return and manipulate data in two dimensions, commonly seen as a row dimension and a column dimension. A partial picture of a two-dimensional (or relational) database may appear as shown in the following Illustration:
Illustration 1: Database Schema for the Sample Budget Cube
The two-dimensional data represented in each of the tables depicted above contain fields, or single data elements that exist at the intersection of each row and column. As most of us know, a SQL query would specify columns with a SELECT statement, with a starting point defined in a FROM clause. The specific rows to be returned by the query would be specified by a WHERE clause, which acts in the capacity of a filter.
Basic MDX Terminology
The domain of MDX is multidimensional data, which, by its nature, is housed within structures (called cubes
) of greater than two dimensions. A dimension
is an organized hierarchy of categories (levels) that typically describes data in a fact table; I liken dimensions to "perspectives
" or "views
" of data. Instead of fields, the element(s) of data that are stored at the intersection of the cube's dimensions
are called measures
. There can be more than one measure stored at the intersections.
Illustrated below is an example cube I have constructed, based loosely upon the more elaborate "Budget" cube supplied as a sample with the installation of Analysis Services. The cube is composed of three dimensions: Stores, Accounts and Time. It stores two measures, Actual and Budget amounts. Each dimension is broken down into different, hierarchical levels, each of which is broken down further into members. For example, the Stores dimension is divided into State levels (seen here as Georgia and New York), which are composed of City levels (here Atlanta, Augusta, New York City and Albany). The Cities are subdivided into the individual Stores themselves, which are the members of the Stores dimension.
The intersect points of the local Stores, together with the General Ledger Accounts (the members composing the Income Statement levels of the Accounts dimension) and the operating months (the members making up the Quarter levels of the Time dimension), represent the point (cells) at which the measures are collected.
To identify and extract the data, whether it is for a single cell or multiple cells, MDX uses a reference system based upon the concept of tuples. Tuples list dimensions and members to identify individual cells, as well as groups of cells, in the cube. Because each cell is an intersection of all the dimensions of the cube, tuples can uniquely identify every cell in the cube.
Illustration 2: Example Cube, showing various Hierarchical Relationships
Tuples identify sections of the cube, called slices, composed of more than one cell. An ordered collection of tuples constitutes a set. In an MDX query, axis and slicer dimensions are composed of these sets of tuples. In addition, it is possible to create a named set. A named set is a set with an alias, used to make our MDX queries easier to understand and, if especially complex, more efficient to process.
When we use SQL, it is usually necessary to filter the potentially large amounts of data returned from a query on a table or tables. Specifying columns in the SELECT statement, and restricting the rows that are returned based on specific criteria with a WHERE statement, allow us to apply filters.
The concept of filtering is equally important in MDX, where a SELECT statement is used to select the axis dimensions (the dimensions and members to be returned in MDX - expected to return data for multiple members). The WHERE statement is used to restrict the returned data to a slicer dimension (which is composed of specific dimension and member criteria, and is expected to return data for a single member). The terms "axis dimension" and "slicer dimension" distinguish dimensions of the source cube cells of the query, indicated in the FROM clause, from the dimensions of the result cube cells, which can be composed of multiple cube dimensions.
Two more terms to consider in our review of fundamental MDX terminology are Calculated Members and User-defined Functions. Calculated members are derived members; that is, they are members that have no basis in the data itself but are created via evaluated expressions in MDX. They are returned by the same processes as a standard member, and can be created via a rich set of functions that are present in MDX. Calculated members extend our capabilities to manipulate multidimensional data. User-defined functions also extend those capabilities by allowing us to create custom functions to manipulate multidimensional data and to register those functions, giving us the flexibility of calling these functions from within the Calculated Member Builder (which we will introduce later in the session), MDX queries, and data definition language (DDL) statements that support MDX. User-defined functions can be programmed in any language that supports Component Object Model (COM) interfaces and are capable of accepting arguments and returning values in the MDX syntax.
Page 3: Fundamental MDX Syntax
See All Articles by Columnist William E. Pearson, III