The Role of Expression Arrangement in Optimization
As many
of us have learned, both in working with SQL and with MDX, the greatest strides
in performance tuning can often be gained by tuning the expressions
themselves. The simple rearrangement of a query can produce dramatic
performance gains, as we will see in this section. We have discussed the
control of resource use, to some extent, through the direction of a query's
location of execution. We also exposed a scenario in the previous section
where, even though the query might contain level populations that fall below
the threshold for server processing, client processing can be more expensive
than is apparent.
Let's
return to our example above, and see what we can do in the way of arranging
expressions to make our query less resource intensive. To do so, we will
resume where we left off within the Sample Application:
1.
Create a new
query to rephrase the last (saved as MXAS11-3), arranged as follows:
-- MXAS11-4: Optimized Set Operation
SELECT
{[Measures]. [Units Shipped]} ON COLUMNS,
TopCount (
Generate ( [Warehouse].[City].Members,
TopCount ( CrossJoin ({ [Warehouse].CurrentMember},
[Product].[Product Name].Members
), 7, [Measures].[Units Shipped]
)
), 7, [Measures].[Units Shipped]
) ON ROWS
FROM Warehouse
In our
rearrangement of query MXAS11-3, we are exploiting the GENERATE function
to rephrase our query to "determine the top seven tuples that we obtain
from cross-joining the warehouse city and named products, for the current
warehouse city. Then marry the results with the rest of the members of the
warehouse city level."
The
number of combinations that result within this query is 1,721 (1 x 1,560 named
products, plus 7 x 23 cities). This represents a significant reduction over
the number of combinations in the pre-optimized query, and results in a
corresponding reduction of over 95 percent.
As we
will see from the next step, the results will be identical to those in the
pre-optimized version.
2.
Execute the
query using the Run Query button.
The
results dataset appears as shown in Illustration 6.
Illustration 6: Results Dataset, Optimized
CrossJoin() Query
3.
Save the query
as MXAS11-4.
Through
this small exercise, we have illustrated the dramatic performance gains that
can be obtained by simply rephrasing a query. To summarize the approach, we
might keep in mind that we can replace complex CROSSJOIN() statements
with a GENERATE() and CROSSJOIN() combination, as we have
demonstrated above. The following general rule summarizes the approach:
CrossJoin (X,Y) =
GENERATE(X, CrossJoin(X.CurrentMember, Y)
Or
CrossJoin (X,Y) =
GENERATE (CrossJoin(X,Y.CurrentMember), Y)
Let's
take a look at another scenario, where the objective of our query is the
summation of two measures, Units Shipped and Warehouse Sales, for
a given set of products, stores and warehouses (all products, in all non-U.S.
stores and warehouses, in our example). We can approach the query in a couple
of ways, using calculated measures to effect the summations.
Let's
create a straightforward query first, where we create calculated members that
we combine with the measures we seek to obtain the correct calculations.
Now,
let's run the query and note the results.
2.
Execute the query
using the Run Query button.
The
results dataset appears as depicted in Illustration 7.
Illustration 7: Results Dataset, Pre-Optimized
Summation Query
3.
Save the query
as MXAS11-5.
Now
let's consider another approach to obtaining the same objective. This time, we
will place each set of calculation logic into a respective calculated measure.
4.
Create the
following new query:
-- MXAS11-6: Optimized Set Operation # 2
WITH
MEMBER [Measures].[Total Units Shipped] AS
'SUM (
{[Warehouse].[All Warehouses].[Canada],
[Warehouse].[All Warehouses].[Mexico]},
SUM (
{[Store].[All Stores].[Canada],
[Store].[All Stores].[Mexico]},
SUM (
{[Product].[All Products].[Drink],
[Product].[All Products].[Food]},
[Measures].[Units Shipped]
)))'
MEMBER [Measures].[Total Warehouse Sales] AS
'SUM (
{[Warehouse].[All Warehouses].[Canada],
[Warehouse].[All Warehouses].[Mexico]},
SUM (
{[Store].[All Stores].[Canada] ,[Store].[All Stores].[Mexico]},
SUM (
{[Product].[All Products].[Drink],[Product].[All Products].[Food]},
[Measures].[Warehouse Sales]
)))'
SELECT
{ [Measures].[Total Units Shipped],
[Measures].[Total Warehouse Sales] } ON COLUMNS,
{[Time].[1998]} ON ROWS
FROM Warehouse
Again,
we will run the query and observe the results.
5.
Execute the
query using the Run Query button.
The
results dataset appears as shown in Illustration 8.
Illustration 8: Results Dataset, Optimized
Summation Query
6.
Save the query
as MXAS11-6.
We
note that the results are identical for the two approaches to the query (with
the exception of the rounding of the Warehouse Sales amounts and the
measure names, either of which could be adjusted, were they significant
concerns). The second approach, however, performs appreciably better, as would
be evident were the query extracting a much larger amount of data from a
real-world data source. The following general rule summarizes the use of the
commutative nature of the SUM function (and applies equally to the MAX,
MIN, and TopCount() functions):
SUM (CrossJoin (X,Y), b) =
SUM(X, SUM(Y, b))
In
both the examples of performance enhancements we have explored above, and in
any other potential performance enhancement scenario we might contemplate, we
would need to consider the cost / benefit tradeoffs, actual performance metrics
against different sources, and whether the differences in performance are too
negligible to devote much time to attempting to increase performance further.
The point is that tuning of MDX expressions can be accomplished, both in the
manner suggested above or through other approaches, to maximize performance,
and that beginning with the MDX statements themselves is usually the best approach
to any optimization effort. Once the base query is tuned to an optimal state,
we can always consider client / server balancing, hardware upgrades, and
perhaps further measures to increase performance.
Summary and Conclusion ...
In
this lesson, we continued our focus on the use of control of location as
a primary intervention type within the realm of options available to us
for MDX query optimization. We performed a practice exercise to reinforce the
concepts exposed. We then extended our considerations of the available types
of intervention to include the optimization of set operations and syntax
arrangement considerations. Within our exploration of the importance of
the optimization of set operations, we undertook practice examples that
illustrated some ways we can rearrange queries to enhance performance, often
significantly.
In our
next lesson, the final article of the Optimizing MDX mini-series, we
will, expose methods of caching to load a commonly used slice of a cube
into memory, making for faster retrieval in prospective operations. Our
discussion will include various aspects of cache creation, and uses of caching
within MDX. In addition, we will touch upon other performance enhancement
options, including external functions and cube design modifications
and augmentation.
»
See All Articles by Columnist William E. Pearson, III