The CrossJoin() Function
According to the Analysis Services
Books Online, the CrossJoin()
function "returns the cross product of two sets."
CrossJoin() is particularly handy as it allows us to prepare data for
presentation in a matrix format - the need to return "all possible
combinations" of the members / tuples of two or more sets is common.
CrossJoin() is powerful, as it allows us, in effect, to nest
one dimension inside another, but in a direct manner that enables the avoidance
of a great deal of extra coding that might be required to obtain the same
result through other means. A sample CrossJoin() result set, used
together with the NON EMPTY keyword (discussed later in this article) in
our last lesson as a preview, appears in Illustration 1.
We will examine the syntax for the CrossJoin()
function, mention a means for removing empty rows / columns from the results
dataset it retrieves, and then undertake practice examples built around
hypothetical business needs that illustrate logical uses for the function. We
will extend the examples to perform exclusion of empty combinations in a
query we construct, to afford us a scenario for examining an approach in that
regard.
Discussion
As we have already discovered, the CrossJoin() function
returns a cross product of the members of two different sets. The result of the
use of such a cross product is the placement of two dimensions on the same
axis, be it columns or rows, where CrossJoin() creates a new set made up
of all possible combinations (after the manner of a "Cartesian product")
of the original sets' members.
CrossJoin() handles only two dimensions at a time, so
if we need to perform a cross product across more than two dimensions,
we must nest the functions to realize our aims (we will see an example
of this in our practice exercises). Furthermore, the two sets upon which we
are performing a cross join must exist in separate dimensions. We can suppress
empty columns / rows for the results dataset with the inclusion of NON EMPTY
before the CrossJoin() function, which effectively transforms it to NonEmptyCrossJoin(),
a function in its own right.
Because the CrossJoin() function returns a set that
consists of the matches between every member of the first set and each member
of the second set, it is important to realize that large datasets can be
returned, and that performance can suffer as a result. If, for instance, we
cross join two sets containing 200 members each, our results dataset will
contain 40,000 tuples.
For many cubes, the CrossJoin() function returns
datasets with empty measures associated with a significant number of the resulting
tuples. We can often experience a performance boost with a NonEmptyCrossJoin(),
whose action is to eliminate the tuples with which no data is associated from
the results dataset. NonEmptyCrossJoin() also provides other features
that can facilitate more efficient processing, as we shall see in a subsequent
article that is devoted to NonEmptyCrossJoin().
Let's look at some syntax illustrations to further clarify
the operation of CrossJoin().
Syntax
Syntactically, the sets between
which we wish to return a cross product are specified within the parentheses to
the right of CrossJoin. The syntax is shown in the following string.
CrossJoin(<<Set1>>, <<Set2>>)
Contrary to a couple of texts I have come
across, the order in which the sets are placed in the CrossJoin()
function is significant. Placing the sets in different orders often results in
a materially different results dataset. The order of tuples in the resulting
set depends on the order of +Set1; and +Set2;, and the order of
the members within each set; the order should be considered in the light of
its effects upon the presentation of final results.
An alternative syntax simply
places an asterisk ("*") between the sets, as shown in the following
string:
<<Set1>> * <<Set2>>
The alternate syntax is an extension to
MDX in OLE DB 2.0 and greater, housed within MSSQL Server 2000 Analysis Service.
The following example
expression illustrates a use of the Crossjoin() function. Let's say
that a group of FoodMart information consumers, whose data is housed within the
Warehouse cube, wish to see the total Units Shipped by U.S.
Warehouse-State for each quarter of 1998. Furthermore, the
consumers wish to see the Units Shipped in a single column in the
report, and the Quarter / Warehouse-State information in the row axis of
the presentation.
The basic CrossJoin()
function, which would define the row axis of the result dataset, would be constructed
as follows:
CROSSJOIN(
{[Time].[Year].[1998].Children},
{[Warehouse].[Country].[USA].Children}
This is simply expressing that we wish to display the "Warehouse-State
by Quarter" in the row axis, and, assuming that the Units
Shipped measure was specified for the column axis, our result dataset would
be identical to that shown in Illustration 2.
Illustration 2: Results Dataset, Syntax Sample,
CrossJoin() Function in Row Axis
The results appear to be as expected from a numerical
standpoint, as well. Our expectation would be to see a total of twelve
combinations, every possible combination of 1998 Quarters and the U.S.
Warehouse-States, produced by the cross product of 1998's four quarters
(the first of the sets in the syntax sample above), times the number of Warehouse-States
(three). The twelve combinations are indeed the result, as we see in the
illustration above.
NOTE: For information surrounding the .Children
function, see MDX
Member Functions: The "Family" Functions.
We will practice the use of the CrossJoin() function
in the section that follows. Moreover, we will explore the use of the function
when we need to create multiple cross products (that is, cross joining more
than two sets). Finally, we will practice the use of NON EMPTY to
activate the concept of its use in eliminating empty rows / columns from
our presentation of CrossJoin() results. In each case, we will discuss
briefly the results we obtain to reinforce our overall understanding of the CrossJoin()
function.