Using the Automatic Attribute Discretization Method
As we learned in Introduction to Attribute Discretization, whenever
we work with attributes, we can expect to encounter two general types of
values, discrete and contiguous. Discrete values stand apart
distinctly, and have clearly defined logical boundaries between themselves. Citing
the Gender attribute, within the Customer dimension of the Adventure Works
sample UDM, wherein the attribute is considered to have only one of two discrete values, female or male,
we noted that possible values are naturally discrete for the lions share of attributes
occurring in the business world.
In contrast to discrete values, we noted that contiguous
values do not stand apart distinctly, but flow along, as if in a continuous
line. Moreover, we discussed the fact that contiguous values, especially within
large populations, can have very large numbers of possible values, and that information
consumers can find it difficult to work effectively and efficiently within such
wide ranges of values. As an example, we cited the Vacation Hours attribute, within
the Employee dimension of the Adventure Works sample UDM, an attribute which
could have a wide range of possible values, depending upon how many employees
are involved, whether there are limits on how many vacation days they can
accumulate, and considerations of this nature. (The member values are based
directly upon the values contained within the VacationHours column of the
DimEmployee table - with many of the values shared among multiple employees.)
The sheer number of values might make working with them cumbersome for
information consumers, if they are simply made available in their existing
state.
As we discussed in Introduction to Attribute Discretization, discretization
can help us to make it easier for information consumers to work with large
numbers of possible attribute member values. As discretization creates a
manageable number of groups of attribute values that are clearly separated by
boundaries, we can thereby group contiguous values into sets of discrete
values, via a system-generated collection of consecutive dimension members
known as member groups.
We also discussed, in Introduction to Attribute
Discretization, that once the discretization process groups the attribute members into
the member groups, the member groups are then housed within a level within the dimensional
hierarchy. (A given level within a dimensional hierarchy can contain either members
or member groups, but not both.) When information consumers browse a level
that contains member groups, they see the names and cell values of the member
groups. The members generated by Analysis Services to support member groups are
called grouping members, and they look like ordinary members.
Analysis Services affords us flexibility in methods of attribute
discretization, based upon algorithms of varying complexity. The different
methods of discretization all have the same function to group contiguous
values into sets of discrete values. They simply manage grouping via different
approaches. Beyond the Dimension Designer, Analysis Services also
supports user-defined discretization, via data definition language (DDL),
should the out of the box approaches not meet the business needs of our
local environments. Moreover, we can alternatively implement custom discretization
via the underlying data warehouse, using views at the relational level; named
calculations in the data source view; calculated members in Analysis Services;
or via other approaches.
Analysis Services supports four DiscretizationMethod
property settings, which include three pre-defined discretization methods:
-
None (default): Analysis Services
performs no grouping, and simply displays the attribute members.
-
Automatic: Analysis Services selects the
method that best represents the data: either the EqualAreas method or the Clusters
method.
-
EqualAreas: Analysis Services attempts to
divide the members in the attribute into groups that contain an equal number of
members.
-
Clusters: Analysis Services attempts to
divide the members in the attribute into groups that contain an equal number of
members. (Per the Books Online, this method is useful because it works
on any distribution curve, but is more expensive in terms of processing time.)
The DiscretizationMethod property determines
whether Analysis Services is to create groupings, and then determines the type of
grouping that is performed. As we learned in Introduction
to Attribute Discretization, Analysis Services does not perform any
groupings by default (the default setting for the DiscretizationMethod property
is None). When we enable Automatic grouping, as we shall see in the practice
session that follows, we direct Analysis Services to automatically determine
the best grouping method based upon the structure of the attribute for which we
are performing discretization. We then process the affected dimension / cube and
Analysis Services creates group ranges, and then distributes the total
population of attribute members appropriately across those groups. As we shall
see, once we specify a grouping method, we next specify the number of groups,
by using the DiscretizationBucketCount property (its default value is
zero).
As we
have noted throughout my MSSQL Server Analysis Services column, as well
as throughout my other Database Journal series, one of the most
important objectives in building a high performance Analysis Services solution
is an efficient and effective dimension design. The identification of
opportunities where we can effectively use attribute discretization, and the
effective design and placement of the member groups that we generate thereby,
can mean the provision of a much more consumer-friendly interface for our
clients and employers. We
will gain hands - on exposure to the Automatic method of attribute discretization in the
practice session that follows. Before we get started working within a sample cube clone, we will
need to prepare the local environment for the practice session. We will take
steps to accomplish this within the section that follows.
Preparation: Locate and Open the Sample
Basic UDM Created Earlier
In Dimensional Model
Components: Dimensions Part I, we created a sample basic Analysis Services database within which to
perform the steps of the practice sessions we set out to undertake in the
various articles of this subseries. Once we had ascertained that the new
practice database appeared to be in place, and once we had renamed it to ANSYS065_Basic
AS DB, we began our examination of dimension properties. We continued with our
examination of attributes within the same practice environment, which we will
now access (as we did within the earlier articles of this subseries)) by taking
the following steps within the SQL Server Business Intelligence Development
Studio.
NOTE: Please access the Analysis
Services database which we prepared in Dimensional Model Components: Dimensions Part I (and have used in subsequent
articles) before proceeding with this article. If you have not completed the
preparation to which I refer, or if you cannot locate / access the Analysis
Services database with which we worked in the referenced previous articles,
please consider taking the preparation steps provided in Dimensional Model Components: Dimensions Part I before continuing, and
prospectively saving the objects with which you work, so as to avoid the need
to repeat the preparation process we have already undertaken for subsequent
related articles within this subseries.
1.
Click Start.
2.
Navigate to,
and click, the SQL
Server Business Intelligence Development Studio, as appropriate.
We
briefly see a splash page that lists the components installed on the PC, and
then Visual Studio .NET 2005 opens at the Start page.
3.
Close the Start
page, if desired.
4.
Select File -> Open from the main menu.
5.
Click Analysis
Services Database ... from the cascading menu, as shown in Illustration 1.
Illustration 1: Opening the Analysis Services Database ...
The Connect
to Database dialog appears.
6.
Ensuring that
the Connect to existing database radio button atop the dialog is
selected, type the Analysis Server name into the Server input box (also near
the top of the dialog).
7.
Using the
selector just beneath, labeled Database, select ANSYS065_Basic AS DB, as
depicted in Illustration
2.
Illustration 2: Selecting the Basic Analysis Services Database ...
8.
Leaving other
settings on the dialog at default, click OK.
SQL
Server Business Intelligence Development Studio briefly reads the database from
the Analysis Server, and then we see the Solution Explorer populated with the
database objects. Having overviewed attribute discretization in this and the
previous article, we will now get some hands-on exposure to the use of Automatic
attribute discretization for the members of a representative dimension attribute
within our practice UDM.