Procedure: Employ the Clusters Discretization Method for Attribute Members in Analysis Services
When we browse a cube, we typically dimension the members
of one attribute hierarchy by the members of another attribute hierarchy. For
example, we might group customer sales by product purchased, by customer
geography (state, province or country), or by customer gender. However, with
certain types of attributes, it is useful to have Analysis Services automatically
create groupings of attribute members, particularly when large numbers of contiguous
values are involved, based upon the distribution of the members within an
attribute hierarchy.
As an example, lets assume that we have been approached
by representatives of our hypothetical client, the Adventure Works
organization. These representatives, members of the Human Resources department
of the business, tell us that one of the existing attributes of the Employee dimension,
Sick Leave Hours, does not serve them well in browses and reports, as the many
different values are simply listed, the members being derived from the unique Sick
Leave Hours values in the SickLeaveHours column of the DimEmployee table that
underlies the Analysis Services layer of the business intelligence solution.
(The developer of the cube, who has suddenly, and without warning, returned to
his home country, is no longer available to assist the HR department in
modifying the Sick Leave Hours presentation into a more useful format.)
We listen to the description of the problem, and then
suggest grouping the Sick Leave Hours information via the pre-defined Clusters
discretization method offered within Analysis Services. As is often the case,
we suggest this as a starter approach, so that our client colleagues can see
how grouping the values in this manner might make the data more meaningful in
browsing / reporting. We tell them that, once we do this, information
consumers who browse the Sick Leave Hours attribute hierarchy will see the
names / values of the groups instead of the members themselves. This limits the
number of levels that are presented to users, which can be less confusing, and
more useful for analysis. (We inform the client representatives, too, that
other methods of discretization can be leveraged if the Clusters method proves
less than ideal we are simply suggesting the Clusters method as a good
starting point to illustrate an approach to meeting the end objective an
approach that we can easily tweak once our colleagues understand the general
concept of discretization, and can then make intelligent choices with regard to
specific nuances that they might deem desirable.
Browse the Existing, Ungrouped Members within the Attribute Hierarchy
We
will begin our practice with the Clusters discretization method within the Sick
Leave Hours attribute hierarchy of the Employee dimension.
1.
Within the Solution
Explorer, right-click the Employee dimension (expand the Dimensions folder as
necessary).
2.
Click Open on
the context menu that appears, as shown in Illustration 3.
Illustration 3: Opening the Dimension via the Dimension Designer ...
The
tabs of the Dimension Designer open. Lets look at the current state of the
attribute under consideration, Sick Leave Hours.
3.
Click the Browser
tab.
4.
Select Sick
Leave Hours within the dropdown Hierarchy selector atop the Browse tab, as
depicted in Illustration
4.
Illustration 4: Select the Sick Leave Hours Hierarchy to Browse ...
5.
Expand the Employees
level by clicking the + sign that appears to its immediate left.
The attributes belonging to the Sick Leave Hours attribute
hierarchy appear as
partially shown in Illustration 5.
Illustration 5: The Member Attributes, Sick Leave Hours Attribute Hierarchy of the Employee Dimension
We
note that many contiguous values appear. It is easy to see how grouping these
values into logical buckets might make the data easier for information
consumers to analyze.
Add Clusters Discretization within the Attribute Hierarchy
We
will enact the Clusters discretization method from the Dimension Structure tab
of the Employee dimension.
1.
Click the Dimension
Structure tab.
2.
Within the Attributes
pane, select Sick Leave Hours (the sixth attribute above the bottom of the
pane).
The Properties window for the attribute appears (by default
in the bottom right corner of the design environment), as partially depicted in Illustration 6.
Illustration 6: The Properties Window for the Sick Leave Hours Attribute
3. In the Properties window, click
the setting box to the right of the DiscretizationMethod property (which
currently contains the word None), to enable the selector on the right edge
of the box.
4. Select Clusters within the
selector, as shown in Illustration 7.
Illustration 7: Changing the DiscretizationMethod Property Setting to Clusters ...
The DiscretizationMethod property specifies the method
used to group the members of the owner attribute. Once we tell Analysis
Services the method of discretization, we can tell it how many buckets to
create for purposes of grouping the attribute members.
5. In the setting box to the right of
the DiscretizationBucketCount property (which currently contains the 0), replace the value for the DiscretizationBucketCount property with
the number 10.
The DiscretizationBucketCount property specifies the
number of buckets in which to discretize members of the attribute to which the
property belongs. The default setting is 0. When we leave the DiscretizationBucketCount property at default, Analysis
Services generates the number of groups independently, after sampling the
underlying data involved.
The affected portion of
the Properties window for the Sick Leave Hours attribute hierarchy appears as
depicted (modifications surrounded by the red box) in Illustration 8.
Illustration 8: Properties Window for the Sick Leave Hours Attribute Hierarchy, with Modifications
We are now ready to process the Analysis Services
database, and then to examine the results of our handiwork with the browser, as
before.