Attribute Discretization: Customize Grouping Names

This article
continues my exploration of attribute discretization, a capability in Analysis
Services that allows us to group members of an attribute into a number of member
groups. Our concentration here will be to go beyond the individual
examinations of the methods themselves that we undertook in other articles, and
to get some exposure to custom naming of the groups created by the discretization
process we choose. The practice we undertake to this end will, once again,
surround a representative dimension attribute within our sample UDM.

Introduction

This article continues the overview of Attribute Discretization
in Analysis Services we began in Introduction
to Attribute Discretization
, and continued in Attribute
Discretization: Using the Automatic Method
, Attribute
Discretization: Using the “Equal Areas” Method
and Attribute
Discretization: Using the Clusters Method
. Both this article and its Discretization-related
predecessors extend the examination of the dimensional model begun in Dimensional
Model Components: Dimensions Parts I
and II, and
continued
through Dimensional Attributes: Introduction and
Overview Parts I
through V, as well as our focus upon the properties underlying attributes,
extending our overview into attribute member Keys, Names, Values and Relationships
within several subsequent articles.

Note: For more information about my Introduction to MSSQL Server
Analysis Services column in general, see the section entitled “About the MSSQL Server
Analysis Services Series” that follows the conclusion of this article.

In Introduction
to Attribute Discretization
, Attribute Discretization: Using the
Automatic Method, Attribute Discretization: Using the “Equal Areas” Method
and
Attribute Discretization: Using the Clusters Method, I summarized preceding articles within the current
subseries, consisting of a general introduction to the dimensional model. I
noted the wide acceptance of the dimensional model as the preferred structure
for presenting quantitative and other organizational data to information
consumers. The articles of the series then undertook an examination of dimensions,
and, subsequently, dimension attributes, the analytical “perspectives” and
structures upon which the dimensional model relies in meeting the primary
objectives of business intelligence, including its capacity to support:

  • the
    presentation of relevant and accurate information representing business
    operations and events;
  • the rapid and
    accurate return of query results;
  • “slice and
    dice” query creation and modification;
  • an environment
    wherein information consumers can pose questions quickly and easily, and obtain
    rapid results datasets.

With Introduction to Attribute
Discretization
, we introduced a capability in Analysis Services – to
which we refer as attribute discretization – that allows us to group members of
an attribute into a number of member groups. We discussed design and other
considerations involved in the discretization of attributes, and touched upon
best practices surrounding the use of this capability.
In Attribute Discretization: Using the
Automatic Method
, we introduced the first of multiple pre-defined discretization
methods supported within the Analysis Services UDM. We discussed the options
that are available, focusing upon the employment of the Automatic discretization
method within the sample cube, to meet the business requirements of a
hypothetical client. We then began our practice session with an inspection of the contiguous members of a
select attribute hierarchy, noting the absence of grouping and discussing shortcomings
of this default arrangement. Next, we enabled the Automatic discretization
method within the dimension attribute Properties pane, and then reprocessed the
sample cube with which we were working to enact the new Automatic
discretization of the select attribute members. Finally, we performed further
inspections of the members of the attribute hierarchy involved in the request
for assistance by our hypothetical client, noting the new, more intuitive
grouping established by the newly enacted Automatic discretization method.

Next, in Attribute
Discretization: Using the Equal Areas Method
, we introduced the
second of the pre-defined discretization methods supported within the Analysis
Services UDM. We discussed the options that are available with this particular
approach, as we did in the article previous for the Automatic method, focusing
upon the employment of the Equal Areas discretization method, again within the
sample cube, to meet the business requirements of a hypothetical client. We
then began our practice session with an inspection, via the browser in the Dimension
Designer, of the contiguous members of another select attribute hierarchy,
noting the absence of grouping and discussing shortcomings of this default
arrangement. Next, we enabled the Equal Areas discretization method within the
dimension attribute Properties pane, and again reprocessed the sample cube with
which we were working to enact the new Equal Areas discretization of the select
attribute members. Finally, we performed additional inspections, via the Dimension
Designer and Cube Designer browsers, of the members of the attribute hierarchy involved
in the request for assistance by our hypothetical client, noting the new, more
intuitive grouping established by the newly enacted Equal Areas discretization method.

In last month’s article, Attribute
Discretization: Using the Clusters Method
, we introduced the third
of the pre-defined discretization methods supported within the Analysis
Services UDM. We again discussed the options that are available with this
particular approach, as we did in the previous articles for the Automatic and Equal
Areas methods, focusing this time upon the employment of the Clusters
discretization method, to meet the business requirements of a hypothetical
client within the sample cube. Our practice session again followed with an
inspection of the “pre-discretized”
contiguous members of another select attribute hierarchy, where we again noted
the absence of grouping and discussed shortcomings of this default
arrangement. We then enabled the Clusters discretization method within the
dimension attribute Properties pane, and after reprocessing the sample cube
with which we were working, we performed another inspection, again via the Dimension
Designer and Cube Designer browsers, of the members of the affected attribute
hierarchy, noting the more intuitive grouping established by the newly enacted Clusters
discretization method.

In this article, we will gain some hands-on exposure to going
beyond the First Group Member – Last Group Member default that Analysis
Services uses in creating group labels within the various discretization
methods, and enacting the generation of custom labels for our groups. Our examination will include:

  • A brief review
    (for those joining our “discretization subseries” for the first time) of attribute
    discretization in Analysis Services, potential benefits that accrue from discretization
    in our UDMs, and how the process can help us to meet the primary objectives of
    business intelligence.
  • Examination,
    via the browser in the Dimension Designer, of the pre-existing members of a
    select attribute hierarchy, noting the absence of grouping and discussing
    shortcomings of this default arrangement.
  • A discussion
    of customized discretization group labeling within Analysis Services.
  • Enablement of
    the Automatic discretization method within the dimension attribute Properties
    pane.
  • Reprocessing
    the cube to enact the new Automatic discretization of the select attribute
    members.
  • Another
    examination, via the browsers in both the Dimension Designer and the Cube
    Designer, of the members of our attribute hierarchy selection, noting the new,
    more intuitive grouping established by the newly enacted Automatic
    discretization method, together with the default labels provided by Analysis
    Services.
  • Modification
    of a copy of the naming template supplied with Analysis Services
  • Reprocessing
    the cube to enact the modified naming template we have supplied, within the
    context of Automatic discretization of the selected attribute members.
  • A final
    examination, via the browsers in both the Dimension Designer and the Cube
    Designer, of the members of a select attribute hierarchy, noting the new, more
    intuitive grouping, together with more user-friendly labels, established by the
    newly enacted Automatic discretization method.

Attribute Discretization with Custom Group Names

In this article, we will perform attribute discretization
once again, but in this practice session, we will extend hands-on exposure with
discretization to include the addition of custom group labels.

For those first joining my subset of articles surrounding discretization,
let’s do a brief overview of Analysis Services discretization in general. (Those
who have “been along for the ride” can, of course, skip directly to the
sections below.) As we learned in Introduction to
Attribute Discretization
, whenever we work with attributes, we can
expect to encounter two general types of values, discrete and contiguous. Discrete
values stand apart distinctly, and have clearly defined logical “boundaries”
between themselves. Citing the Gender attribute, within the Customer dimension
of the Adventure Works sample UDM, wherein the attribute is considered to have
only one of two discrete values,
female or male, we noted that possible values are naturally discrete for the
lion’s share of attributes occurring in the business world.

In contrast to discrete values, we noted that contiguous
values do not stand apart distinctly, but flow along, as if in a continuous
line. Moreover, we discussed the fact that contiguous values, especially within
large populations, can have very large numbers of possible values, and that
information consumers can find it difficult to work effectively and efficiently
within such wide ranges of values. As an example, we cited the Vacation Hours
attribute, within the Employee dimension of the Adventure Works sample UDM, an
attribute which could have a wide range of possible values, depending upon how
many employees are involved, whether there are limits on how many vacation days
they can accumulate, and considerations of this nature. (The member values are
based directly upon the values contained within the VacationHours column of the
DimEmployee table – with many of the values shared among multiple employees.)
The sheer number of values might make working with them cumbersome for
information consumers, if they are simply made available in their existing
state.

As we discussed in Introduction
to Attribute Discretization
, discretization can help us to make it
easier for information consumers to work with large numbers of possible attribute
member values. As discretization creates a manageable number of groups of attribute
values that are clearly separated by boundaries, we can thereby group contiguous
values into sets of discrete values, via a system-generated collection
of consecutive dimension members known as member groups.

We also discussed, in Introduction
to Attribute Discretization
, that once the discretization process groups the attribute
members into the member groups, the member groups are then housed within a
level within the dimensional hierarchy. (A given level within a dimensional
hierarchy can contain either members or member groups, but not both.) When information
consumers browse a level that contains member groups, they see the names and
cell values of the member groups. The members generated by Analysis Services to
support member groups are called grouping members, and they look like ordinary
members.

Analysis Services affords us flexibility in methods of attribute
discretization, based upon algorithms of varying complexity. The different
methods of discretization all have the same function – to group contiguous
values into sets of discrete values. They simply manage grouping via different
approaches. Beyond the Dimension Designer, Analysis Services also supports
user-defined discretization, via data definition language (DDL), should the
“out – of – the – box” approaches not meet the business needs of our local
environments. Moreover, we can alternatively implement custom discretization
via the underlying data warehouse, using views at the relational level; named
calculations in the data source view; calculated members in Analysis Services;
or via other approaches.

Analysis Services supports four DiscretizationMethod
property settings, which include three pre-defined discretization methods:

  • None (default):
    Analysis Services performs no grouping, and simply displays the attribute
    members.
  • Automatic:
    Analysis Services selects the method that best represents the data: either the EqualAreas
    method or the Clusters method.
  • EqualAreas:
    Analysis Services attempts to divide the attribute members into groups that
    contain an equal number of members.
  • Clusters:
    Analysis Services attempts to divide the members of the attribute into groups
    that contain an equal number of members. (Per the Books Online, this
    method is useful because it works on any distribution curve, but is more
    expensive in terms of processing time.)

The DiscretizationMethod property determines whether Analysis
Services is to create groupings, and then determines the type of grouping that
is performed. As we learned in Introduction to
Attribute Discretization
, Analysis Services
does not perform any groupings by default (the default setting for the DiscretizationMethod
property is “None”). When we enable Automatic grouping, as we shall see in the
practice session that follows, we direct Analysis Services to automatically
determine the best grouping method based upon the structure of the attribute
for which we are performing discretization. We then process the affected
dimension / cube and Analysis Services creates group ranges, and then
distributes the total population of attribute members appropriately across
those groups. As we shall see, once we specify a grouping method, we next
specify the number of groups, by using the DiscretizationBucketCount property
(its default value is zero).

As we
have noted throughout my MSSQL Server Analysis
Services
column, as well as throughout
my other Database Journal series’, one of the most important objectives
in building a high performance Analysis Services solution is an efficient and
effective dimension design. The identification of opportunities where we can
effectively use attribute discretization, and the effective design, placement
and labeling of the member groups that we generate thereby, can mean the
provision of a much more consumer-friendly interface for our clients and
employers. We will
gain hands – on exposure to the Automatic method of attribute discretization in the practice
session that follows, primarily as a means of supporting the focus of our
article, the custom naming of groups, versus the simple acceptance of the
default names provided by Analysis Services, anytime we discretize attributes. We will first enable Automatic
discretization, and then process the cube, which we will examine for default
naming of the attribute groups created. We will then focus upon the setup for customized
group labeling, reprocess the cube, and then examine the changes in the naming
of our attribute groups.

Before we
get started working within a sample cube clone, we will need to prepare the
local environment for the practice session. We will take steps to accomplish
this within the section that follows.

Preparation: Locate and Open the Sample Basic UDM Created Earlier

In Dimensional Model Components: Dimensions Part I, we created a sample basic Analysis Services database within which to
perform the steps of the practice sessions we set out to undertake in the
various articles of this subseries. Once we had ascertained that the new
practice database appeared to be in place, and once we had renamed it to ANSYS065_Basic
AS DB, we began our examination of dimension properties. We continued with our
examination of attributes within the same practice environment, which we will
now access (as we did within the earlier articles of this subseries)) by taking
the following steps within the SQL Server Business Intelligence Development
Studio.

NOTE: Please access the Analysis
Services database which we prepared in Dimensional
Model Components: Dimensions Part I
(and have used in subsequent
articles) before proceeding with this article. If you have not completed the
preparation to which I refer, or if you cannot locate / access the Analysis
Services database with which we worked in the referenced previous articles,
please consider taking the preparation steps provided in Dimensional
Model Components: Dimensions Part I
before continuing, and prospectively
saving the objects with which you work, so as to avoid the need to repeat the
preparation process we have already undertaken for subsequent related articles
within this subseries.

1. 
Click Start.

2. 
Navigate to,
and click, the SQL
Server Business Intelligence Development Studio, as appropriate.

We
briefly see a splash page that lists the components installed on the PC, and
then Visual Studio .NET 2005 opens at the Start page.

3. 
Close the Start
page, if desired.

4. 
Select File -> Open from the main menu.

5. 
Click Analysis
Services Database … from the cascading menu, as shown in Illustration 1.

Opening the Analysis Services Database

Illustration 1: Opening the Analysis Services Database …

The Connect
to Database dialog appears.

6. 
Ensuring that
the Connect to existing database radio button atop the dialog is selected, type
the Analysis Server name into the Server input box (also near the top of the
dialog).

7. 
Using the
selector just beneath, labeled Database, select ANSYS065_Basic AS DB, as
depicted in Illustration
2.

Selecting the Basic Analysis Services Database

Illustration 2: Selecting the Basic Analysis Services Database …

8. 
Leaving other
settings on the dialog at default, click OK.

SQL
Server Business Intelligence Development Studio briefly reads the database from
the Analysis Server, and then we see the Solution Explorer populated with the constituent
database objects. Having overviewed attribute discretization in this and the
previous article, we will now get some hands-on exposure to the use of Automatic
attribute discretization for the members of a representative dimension attribute
within our practice UDM. Our objective will be to then examine the default
labeling of the newly created attribute groups, before enabling the custom
labeling of the same attribute groups through the use of the naming template
supplied by Analysis Services, reprocessing the cube, and re-examining the
attribute groups for the custom naming we have enacted.

William Pearson
William Pearson
Bill has been working with computers since before becoming a "big eight" CPA, after which he carried his growing information systems knowledge into management accounting, internal auditing, and various capacities of controllership. Bill entered the world of databases and financial systems when he became a consultant for CODA-Financials, a U.K. - based software company that hired only CPA's as application consultants to implement and maintain its integrated financial database - one of the most conceptually powerful, even in his current assessment, to have emerged. At CODA Bill deployed financial databases and business intelligence systems for many global clients. Working with SQL Server, Oracle, Sybase and Informix, and focusing on MSSQL Server, Bill created Island Technologies Inc. in 1997, and has developed a large and diverse customer base over the years since. Bill's background as a CPA, Internal Auditor and Management Accountant enable him to provide value to clients as a liaison between Accounting / Finance and Information Services. Moreover, as a Certified Information Technology Professional (CITP) - a Certified Public Accountant recognized for his or her unique ability to provide business insight by leveraging knowledge of information relationships and supporting technologies - Bill offers his clients the CPA's perspective and ability to understand the complicated business implications and risks associated with technology. From this perspective, he helps them to effectively manage information while ensuring the data's reliability, security, accessibility and relevance. Bill has implemented enterprise business intelligence systems over the years for many Fortune 500 companies, focusing his practice (since the advent of MSSQL Server 2000) upon the integrated Microsoft business intelligence solution. He leverages his years of experience with other enterprise OLAP and reporting applications (Cognos, Business Objects, Crystal, and others) in regular conversions of these once-dominant applications to the Microsoft BI stack. Bill believes it is easier to teach technical skills to people with non-technical training than vice-versa, and he constantly seeks ways to graft new technology into the Accounting and Finance arenas. Bill was awarded Microsoft SQL Server MVP in 2009. Hobbies include advanced literature studies and occasional lectures, with recent concentration upon the works of William Faulkner, Henry James, Marcel Proust, James Joyce, Honoré de Balzac, and Charles Dickens. Other long-time interests have included the exploration of generative music sourced from database architecture.

Latest Articles