Introduction
In The CROSSJOIN() Function: Breaking Bottlenecks, we introduced our efforts to enhance the performance of CROSSJOIN() by discussing how the function can degrade processing for queries that rely upon it against medium- to large-sized data sets. We acknowledged the utility of the CROSSJOIN() function in scenarios where we wish to generate a cross-product of the members of two different sets, noting that the capability it gives us to specify "all possible combinations" is convenient - indeed, the most straightforward way to perform such a combination of two sets.
As we noted, however, the indiscriminate use of the CROSSJOIN() function, like that of many other MDX functions, can slow reporting and analysis dramatically. The resulting degradation in processing, we added, often stems from a failure to understand how the function performs set combinations, and how its action can lead to huge result datasets when applied to large cubes.
In The CROSSJOIN() Function: Breaking Bottlenecks, we discussed how, in combining two sets, CROSSJOIN() pairs every member of the first set (all from a single dimension) with every member of the second, creating a "Cartesian" effect. We further noted that the following query illustrates, even on the scale of the small Warehouse sample cube that installs with MSAS, the consequences we face when the number of members in set 1, times the number of members in set 2, times the number of members in set 3, results in a great many combinations.
SELECT
   {[Measures].[Warehouse Profit]} ON COLUMNS,
   {CROSSJOIN([Warehouse].[Warehouse Name].Members,
       CROSSJOIN([Store].[Store Name].Members,
           [Product].[Product Name].Members))} ON ROWS
FROM
   [WAREHOUSE]
We stated that this query would generate 898,560 combinations (24 individual Warehouses times 24 Stores times 1,560 distinct Products). In cubes of the vastly larger sizes common in today's business environments, the number of combinations can obviously be crippling to performance. The problem is aggravated by the fact that the sparsity prevalent in large cubes naturally leads to CROSSJOIN() results with an even higher sparsity factor. (The results dataset produced by the above query yields only a tiny fraction of combinations with non-empty measures.)
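As a quick way to appreciate the sparsity at work, we might run a verification query of our own along the lines of the sketch below. This is purely hypothetical and not part of the original scenario: the member names All Combinations and Populated Combinations are placeholders, and we assume the cube's standard Warehouse Sales measure for the emptiness test. Keep in mind that the sketch itself must assemble the entire cross product, and so is slow by design.

-- Hypothetical sketch: count how many of the 898,560 combinations
-- actually contain data. Slow by design, since the full cross
-- product must still be assembled before it can be counted.
WITH
   MEMBER [Measures].[All Combinations] AS
      'COUNT(CROSSJOIN([Warehouse].[Warehouse Name].Members,
          CROSSJOIN([Store].[Store Name].Members,
              [Product].[Product Name].Members)), INCLUDEEMPTY)'
   MEMBER [Measures].[Populated Combinations] AS
      'COUNT(FILTER(CROSSJOIN([Warehouse].[Warehouse Name].Members,
          CROSSJOIN([Store].[Store Name].Members,
              [Product].[Product Name].Members)),
          NOT ISEMPTY([Measures].[Warehouse Sales])))'
SELECT
   {[Measures].[All Combinations],
    [Measures].[Populated Combinations]} ON COLUMNS
FROM
   [WAREHOUSE]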
We learned that the process of generating all possible combinations, empty or not, lies behind the performance drag, and that the drag becomes even more pronounced when we attempt query operations that must wait for the combinations to be assembled before they can be applied to the resulting dataset. The time consumed in assembling a large number of empty combinations in these scenarios is largely wasted when a large percentage of them are "tossed" in a subsequent step on the march toward the ultimate results. We will see an example of this within this article, where we will begin with a query that suffers from exactly the kind of degradation we are describing.
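Purely to illustrate that "tossing" step, the sketch below adds the NON EMPTY keyword to the ROWS axis of our earlier query. NON EMPTY suppresses rows whose cells are all empty, but it does so only after the axis set has been assembled; it cleans up the result without, in itself, sparing us the cost of generating the combinations, and it is not necessarily the remedy we will pursue here.

-- Illustration only: NON EMPTY discards the empty rows after the full
-- cross product has been generated for the ROWS axis.
SELECT
   {[Measures].[Warehouse Profit]} ON COLUMNS,
NON EMPTY
   {CROSSJOIN([Warehouse].[Warehouse Name].Members,
       CROSSJOIN([Store].[Store Name].Members,
           [Product].[Product Name].Members))} ON ROWS
FROM
   [WAREHOUSE]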
For purposes of our practice procedure, we will assume that we have once again been asked by the management of a hypothetical client to investigate degradation in the performance of a query. The query was originally constructed at the request of a group of information consumers in the Corporate Planning department, shortly after the implementation of MSAS at the FoodMart organization. The creator of the query, who initially wrote the MDX in a way that seemed intuitive, intended to optimize it later. When his position evaporated with the movement of many IT functions to an offshore organization, the query was left in its original, suboptimal state.
Attempts to communicate with the offshore support team were abandoned when it was learned that the building housing the group had been destroyed in a recent regional disaster. While most of the documentation, code, and other collateral for in-process enterprise projects, together with corporate financial information and customer information files, has been reported missing, we are able to locate the query details among several files left by the original author on a development server.
We discuss the requirement with the information consumers, and gain insight into a potentially important factor that might affect our optimization strategy. According to the consumers, the production version of the Warehouse cube under consideration contains an additional calculated member, added to meet a specific reporting requirement. We develop a plan to examine the query, before offering options for improving its performance.
As is often the case,
we decide to work with a copy of the development version of the Warehouse
cube, to allow the original to remain isolated. It is to this copy of the cube
that we will make the modification required to bring it into alignment with the
cube under consideration, adding the calculated member to the clone in
preparation for our optimization exercises.
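For readers who want a reminder of the general shape of a calculated member before we proceed, the fragment below is a purely hypothetical, query-scoped example: the member name Warehouse Margin Pct and its expression are placeholders, not the member the information consumers describe, and the member we actually add will be defined at the cube level within the clone rather than inside a query.

-- Hypothetical example of calculated member syntax only; the real member
-- will be added to the cloned cube itself, not within a query.
WITH
   MEMBER [Measures].[Warehouse Margin Pct] AS
      '[Measures].[Warehouse Profit] / [Measures].[Warehouse Sales]',
      FORMAT_STRING = 'Percent'
SELECT
   {[Measures].[Warehouse Margin Pct]} ON COLUMNS,
   {[Warehouse].[Warehouse Name].Members} ON ROWS
FROM
   [WAREHOUSE]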