The CrossJoin() Function
According to the Analysis Services Books Online, the CrossJoin() function "returns the cross product of two sets." CrossJoin() is particularly handy as it allows us to prepare data for presentation in a matrix format - the need to return "all possible combinations" of the members / tuples of two or more sets is common.
CrossJoin() is powerful, as it allows us, in effect, to nest one dimension inside another, but in a direct manner that enables the avoidance of a great deal of extra coding that might be required to obtain the same result through other means. A sample CrossJoin() result set, used together with the NON EMPTY keyword (discussed later in this article) in our last lesson as a preview, appears in Illustration 1.
We will examine the syntax for the CrossJoin() function, mention a means for removing empty rows / columns from the results dataset it retrieves, and then undertake practice examples built around hypothetical business needs that illustrate logical uses for the function. We will extend the examples to perform exclusion of empty combinations in a query we construct, to afford us a scenario for examining an approach in that regard.
As we have already discovered, the CrossJoin() function returns a cross product of the members of two different sets. The result of the use of such a cross product is the placement of two dimensions on the same axis, be it columns or rows, where CrossJoin() creates a new set made up of all possible combinations (after the manner of a "Cartesian product") of the original sets' members.
CrossJoin() handles only two dimensions at a time, so if we need to perform a cross product across more than two dimensions, we must nest the functions to realize our aims (we will see an example of this in our practice exercises). Furthermore, the two sets upon which we are performing a cross join must exist in separate dimensions. We can suppress empty columns / rows for the results dataset with the inclusion of NON EMPTY before the CrossJoin() function, which effectively transforms it to NonEmptyCrossJoin(), a function in its own right.
Because the CrossJoin() function returns a set that consists of the matches between every member of the first set and each member of the second set, it is important to realize that large datasets can be returned, and that performance can suffer as a result. If, for instance, we cross join two sets containing 200 members each, our results dataset will contain 40,000 tuples.
For many cubes, the CrossJoin() function returns datasets with empty measures associated with a significant number of the resulting tuples. We can often experience a performance boost with a NonEmptyCrossJoin(), whose action is to eliminate the tuples with which no data is associated from the results dataset. NonEmptyCrossJoin() also provides other features that can facilitate more efficient processing, as we shall see in a subsequent article that is devoted to NonEmptyCrossJoin().
Let's look at some syntax illustrations to further clarify the operation of CrossJoin().
Syntactically, the sets between which we wish to return a cross product are specified within the parentheses to the right of CrossJoin. The syntax is shown in the following string.
Contrary to a couple of texts I have come across, the order in which the sets are placed in the CrossJoin() function is significant. Placing the sets in different orders often results in a materially different results dataset. The order of tuples in the resulting set depends on the order of +Set1; and +Set2;, and the order of the members within each set; the order should be considered in the light of its effects upon the presentation of final results.
An alternative syntax simply places an asterisk ("*") between the sets, as shown in the following string:
<<Set1>> * <<Set2>>
The alternate syntax is an extension to MDX in OLE DB 2.0 and greater, housed within MSSQL Server 2000 Analysis Service.
The following example expression illustrates a use of the Crossjoin() function. Let's say that a group of FoodMart information consumers, whose data is housed within the Warehouse cube, wish to see the total Units Shipped by U.S. Warehouse-State for each quarter of 1998. Furthermore, the consumers wish to see the Units Shipped in a single column in the report, and the Quarter / Warehouse-State information in the row axis of the presentation.
The basic CrossJoin() function, which would define the row axis of the result dataset, would be constructed as follows:
This is simply expressing that we wish to display the "Warehouse-State by Quarter" in the row axis, and, assuming that the Units Shipped measure was specified for the column axis, our result dataset would be identical to that shown in Illustration 2.
Illustration 2: Results Dataset, Syntax Sample, CrossJoin() Function in Row Axis
The results appear to be as expected from a numerical standpoint, as well. Our expectation would be to see a total of twelve combinations, every possible combination of 1998 Quarters and the U.S. Warehouse-States, produced by the cross product of 1998's four quarters (the first of the sets in the syntax sample above), times the number of Warehouse-States (three). The twelve combinations are indeed the result, as we see in the illustration above.
NOTE: For information surrounding the .Children function, see MDX Member Functions: The "Family" Functions.
We will practice the use of the CrossJoin() function in the section that follows. Moreover, we will explore the use of the function when we need to create multiple cross products (that is, cross joining more than two sets). Finally, we will practice the use of NON EMPTY to activate the concept of its use in eliminating empty rows / columns from our presentation of CrossJoin() results. In each case, we will discuss briefly the results we obtain to reinforce our overall understanding of the CrossJoin() function.