The GENERATE() Function
Introduction
The GENERATE() function, according
to the Analysis Services Books Online, "applies a set to each
member of another set and joins the resulting sets by union." The Books
Online goes on to say that GENERATE() "alternatively, returns a
concatenated string created by evaluating a string expression over a set."
We will examine the way the function accomplishes these combinations, eliminating
duplicates automatically (while allowing us the option to include them, if
necessary), in the sections that follow.
We will
examine the syntax for the GENERATE() function in general, building to
operations upon sets in practice exercises, within which we will meet a
hypothetical business need. In this way, we
will be able to clearly see that the GENERATE() function does, in fact,
produce the results we might expect. Our objective is to gain a richer
understanding of the capabilities found within the GENERATE() function,
together with a feel for its similarities to the CROSSJOIN() function,
which we have discussed in Basic Set Functions:
The CROSSJOIN() Function, as well as our previous article, The CROSSJOIN() Function: Breaking Bottlenecks.
Discussion
The GENERATE() function comes in two "models:"
The Books Online refer to these as a "set version" and
a "string version." In the set version, the function
generates a results set based upon the application of a specified secondary set
(which often itself contains a function) to a specified primary set. As we
shall see, the function conducts itself much like the CROSSJOIN()
function in many cases - most notably when the secondary set is composed of a
more or less fixed group of members. We will also see that the power of GENERATE()
is leveraged significantly when we go beyond a relatively fixed set of members
in the secondary set, and construct the secondary set via an expression that
specifies the primary set's current member.
The string version provides for the concatenation of
a string expression (substituted in the place of the secondary set appearing in
the set version) with each element appearing in the primary set. A delimiter
can be specified to separate the elements, as well, if this is useful to the
end result (as it proves to be in a practice example we undertake in a later
section).
Let's look at some syntax illustrations to further clarify
the operation of GENERATE().
Syntax
The set version of GENERATE()
resembles closely the string version, with regard to syntax. In the
former, the primary and secondary sets upon which the operation of the function
is to be performed are placed within the parentheses to the right of GENERATE.
The set version applies Set2 to each member of Set1,
performing a union of the resulting sets. We can direct that duplicates in the
results are retained by specifying ALL, but the default behavior is to
eliminate duplicates. The syntax is shown in the following string:
GENERATE( <<Set1>>, <<Set2>>[, ALL] )
The string version of GENERATE()
appears as follows:
GENERATE( <<Set>>, <<String Expression>>[, <<Delimiter>>])
Iterating through each member of the set
specified in <<Set>> above, this version of the function
evaluates the specified <<String Expression>> against the
respective member and returns a concatenation between the two in each case.
The member and the evaluated <<String Expression>> can be
delimited in the return string with the optionally supplied <<Delimiter>>,
should we desire separation of the two components in the concatenated string
that is returned.
The following simple example illustrates conceptually the
operation of the GENERATE() function, set version (by far the
more useful and pervasive version). It also shows that, within the context of
simpler requirements, we can often obtain the same results with a seemingly
less complex approach. The example then illustrates a more elaborate scenario,
where the GENERATE() approach is certainly more efficient.
NOTE: We will be
doing a practice exercise in subsequent sections, but if you want to "test
drive" the below samples, the syntax will work if it is cut and pasted, or
typed, into the MDX Sample Application. I have often found "fragments"
in discussions such as this less than useful, when one is trying to learn new
techniques, and so forth. The fact that it is easier for the author
makes the practice commonplace, but it is one of many aggravating aspects of
technical publishing that I hope to continue to avoid).
Let's say we have a requirement to return the top three
cities in the states of California and Washington with
regard to Units Shipped, one of several measures stored within the
sample FoodMart Warehouse cube that accompanies an Analysis Services
installation. We can achieve our objectives by employing the set version of the
GENERATE() function as follows:
SELECT
{[Measures].[Units Shipped]} ON COLUMNS,
{GENERATE
({[Store].[All Stores].[USA].[CA],
[Store].[All Stores].[USA].[WA]},
TOPCOUNT(DESCENDANTS([Store].Currentmember,
[Store].[Store City]),
3, [Measures].[Units Shipped]))} ON ROWS
FROM
[WAREHOUSE]
The query results would appear as depicted
in Table 1.
|
|
Units Shipped
|
|
Los Angeles
|
24,587
|
|
San Diego
|
23,835
|
|
Beverly Hills
|
10,759
|
|
Tacoma
|
32,411
|
|
Seattle
|
24,110
|
|
Bremerton
|
22,734
|
Table 1: Results of the GENERATE() Function, Selecting Units
Shipped as the Measure
We can obtain identical results with the following query:
SELECT
{[Measures].[Units Shipped]} ON COLUMNS,
{TOPCOUNT(
{[Store].[All Stores].[USA].[CA].Children},
3, [Measures].[Units Shipped]),
TOPCOUNT(
{[Store].[All Stores].[USA].[WA].Children},
3, [Measures].[Units Shipped])} ON ROWS
FROM
[WAREHOUSE]
The second query may seem more intuitive to many of us, and
certainly presents indirect insight into the operation of the GENERATE()
function. Intuitive or not, however, the GENERATE() function can
certainly be the compact alternative in more elaborate uses. Consider the
following query:
SELECT
{[Measures].[Units Shipped]} ON COLUMNS,
{GENERATE
({[Warehouse].[City].Members},
TOPCOUNT(DESCENDANTS([Warehouse].Currentmember,
[Warehouse].[Warehouse Name]),
1, [Measures].[Units Shipped]))} ON ROWS
FROM
[WAREHOUSE]
WHERE
([Time].[1998])
The query results would appear as shown in
Table 2.
|
|
Units Shipped
|
|
Bellmont Distributing
|
22,988
|
|
Rose Food Warehousing
|
10,355
|
|
Freeman And Co.
|
10,707
|
|
Derby and Hunt
|
23,925
|
|
Salka Warehousing
|
24,884
|
|
Focus, Inc.
|
2,189
|
|
Jamison, Inc.
|
21,664
|
|
Bastani and Sons
|
7,304
|
|
Anderson Warehousing
|
23,699
|
|
Worthington Food Products
|
10,045
|
|
Big Quality Warehouse
|
10,115
|
|
Artesia Warehousing, Inc.
|
24,714
|
|
Jorgensen Service Storage
|
19,483
|
|
Food Service Storage, Inc.
|
1,814
|
|
Quality Distribution, Inc.
|
26,569
|
|
Treehouse Distribution
|
32,409
|
|
Foster Products
|
1.949
|
|
Destination, Inc.
|
7,512
|
|
Quality Warehousing and
Trucking
|
29,041
|
|
Jones International
|
5,668
|
|
Jose Garcia, Inc.
|
31,221
|
|
Valdez Warehousing
|
2,353
|
|
Maddock Stored Foods
|
10,097
|
Table 2: Results of the Second Query Example Containing
the GENERATE() Function
The above example, where our query is retrieving the Warehouse
in each individual Warehouse City, together with the largest quantity of
Units Shipped, represents a scenario where the GENERATE() approach
is more concise than alternative approaches. To achieve the same result, we
would be forced to employ TOPCOUNT() for each Warehouse City
present in the Warehouse cube. This would be cumbersome, at best, and
result in a far lengthier query than the compact query we can achieve using GENERATE().
We will activate the concepts involved in the foregoing
discussions by practicing the use of the GENERATE() function in the
section that follows. As part of our practice, we will undertake examples with
each of the set and string versions of the function.