The Types of Intervention
As most of us know, from
having worked with MDX in the preceding articles of this series and elsewhere,
MDX syntax allows us to define and manipulate multidimensional objects and
data. As with an SQL query, we have found that an MDX query requires a data
request (the SELECT clause), a point from which to begin (the FROM
clause), and a filter (the WHERE clause). These and other keywords,
which may or may not actually be visible in our queries, (many assume the
values of defaults when left unspecified), provide the tools used to extract
specific portions of data from a cube for analysis. MDX also supplies a robust
set of functions for the manipulation of data, data definition language (DDL)
syntax for managing data structures, and support for extending itself with
user-defined functions. MDX commands abound for creating (and deleting) cubes,
dimensions, measures, and their subordinate objects.
Many approaches exist
in the rich MDX toolset to help us accomplish our business objectives; and, as
we might reasonably expect, some approaches are more efficient in operation than
others are, predominantly in terms of memory and CPU utilization. In this
article, we will discuss some of the types of intervention that
we might perform to achieve optimization of a query, discussing the points at
which adjustments and / or enhancements might give us the performance boosts we
desire. While we will touch upon enhancements that can be made outside
our queries to contribute to performance, our focus in this "triptych"
of articles will be largely upon items that can be considered within the
queries themselves to contribute to overall efficiency.
The types of
intervention that we might investigate as options for the optimization
of the MDX queries we construct consist of the following general groups:
-
Control
of the location of processing;
-
Optimization
of set operations and syntax arrangement;
-
Optimal
uses of caching;
-
Substitution
of external functions;
-
Leveraging
database structure.
We will examine each of
these types of intervention in the following tutorials, and look at examples of
actions we might take within the context of most, to illustrate possible
performance enhancements we might bring about.
Understanding How Queries are Processed
Before we delve into our examination of
MDX query optimization techniques, a general understanding of how Analysis
Services processes our queries will be useful. It is from the perspective of
each of the process points that we can seek to optimize a given query. From a
high level, these steps occur in the following general order:
-
An MDX statement or query is
issued by a client application.
-
The statement or query is
parsed by the PivotTable Service, which rearranges it into an expression tree
to facilitate ease of use in subsequent steps.
-
The metadata identification is
resolved, and the determination made to process the query on the server or on
the client.
-
Final preparations for
processing occur, including possible further optimization procedures.
-
Rows and columns are identified
(the axes are resolved) and, in some cases (isolation mode, predominantly),
cell values are pre-fetched and cached.
-
The combined results are
returned to the application from which the query originated.
Integral to our lesson will be the location
at which the calculations are performed. The location at which the
operations occur is highly relevant to performance optimization, as most of us
can readily see. The location under consideration can be either the Analysis
Server or the client PivotTable service.
The Location of Query Processing
Control of the location
of processing, as we shall see, can be managed from the Large Level
Threshold property setting. While the large level value is a critical
influencer of which location is selected for query processing, we can also
control the location using other means, the chief of which is the Execution
Location parameter. We will examine these settings in more depth as we
proceed, but before we undertake to control the location of processing, we
should understand the benefits and costs associated with the client / server
processing options from which we can select. Some of the more important of
these are summarized in Table 1.
|
|
|
Client vs. Server Processing
|
|
|
|
|
|
|
Costs and Benefits
|
|
|
|
|
|
|
|
|
|
|
|
Processing
|
|
|
|
|
|
|
Location
|
|
Disadvantages
|
|
Advantages
|
|
|
|
|
|
|
|
|
|
Server
|
|
Consumes
more CPU and memory resources
|
|
Results
can be cached on the server, so that requests for the same result sets by
other client applications are quickly available
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Client
|
|
Takes
longer to process
|
|
Placing
processing burden on the client relieves the server of virtually all resource
requirements
|
|
|
|
|
|
|
|
|
|
|
|
Potentially
much greater network traffic, as large amounts of data are being dispatched
from the server.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Client-based
processing means client-based caching. This results in a loss of rapid
fulfillment of recurring requests, by multiple clients, for the same
information
|
|
|
|
Table 1: Advantages and Disadvantages of Processing
Location Options
Let's take a look at
the first of these two prominent settings, Large Level Threshold property,
with which we will begin a hands-on examination of managing processing
location.