Attribute Discretization: Using the “Clusters” Method

This article continues my exploration of attribute discretization, a capability
in Analysis Services that allows us to group members of an attribute into a
number of member groups. Our focus here is to gain some exposure to the
pre-defined “Clusters” discretization method, one of three such pre-defined
methods supported by Analysis Services, through hands-on application of the
method to a representative dimension attribute within our sample UDM.

This article continues the overview of Attribute Discretization in Analysis
Services begun in Introduction to Attribute Discretization, and continued in
Attribute Discretization: Using the Automatic Method and Attribute
Discretization: Using the “Equal Areas” Method. Both this article and its
predecessors extend the examination of the dimensional model that we began in
Dimensional Model Components: Dimensions Parts I and II. After taking up
various additional components of the dimensional model in subsequent articles,
we performed hands-on exploration of the general characteristics and purposes
of attributes in Dimensional Attributes: Introduction and Overview Parts I
through V. We then fixed our focus upon the properties underlying attributes,
extending our overview into attribute member Keys, Names, Values and
Relationships within several subsequent articles.

Note: For more information about my Introduction to
MSSQL Server Analysis Services column in general, see the section
entitled “About the MSSQL Server Analysis Services Series” that follows the
conclusion of this article.

Introduction

In Introduction to Attribute Discretization, Attribute Discretization: Using the Automatic Method,
and Attribute Discretization: Using the Equal Areas Method, I summarized the
preceding articles within the current subseries, which began with a general
introduction to the dimensional model. I noted the wide acceptance of the dimensional
model as the preferred structure for presenting quantitative and other
organizational data to information consumers. The articles of the series then
undertook an examination of dimensions, the analytical “perspectives” upon
which the dimensional model relies in meeting the primary objectives of
business intelligence, including its capacity to support:

  • the presentation of relevant and accurate information representing
    business operations and events;
  • the rapid and accurate return of query results;
  • “slice and dice” query creation and modification;
  • an environment wherein information consumers can pose questions quickly
    and easily, and obtain rapid results datasets.

We extended our examination of dimensions into a couple of detailed articles.
These articles, Dimensional Model Components: Dimensions Parts I and
II, emphasized that dimensions, which represent the perspectives
of a business or other operation, and reflect the intuitive ways that
information consumers need to query and view data, form the foundation of the dimensional
model. We noted that each dimension within our model contains one or more hierarchies.
(As we learn in other articles of this series, two types of hierarchies exist
within Analysis Services: attribute hierarchies and user – sometimes called “multi-level”
– hierarchies.)

We next introduced dimension attributes within the subseries, and conducted an
extensive overview of their nature, properties, and detailed settings in
Dimensional Attributes: Introduction and Overview Parts I through V. We noted
that attributes help us to define with specificity what dimensions cannot
define by themselves. Moreover, we learned that attributes are collected within
a database dimension, where we can access them to help us to specify the
coordinates required to define cube space.

Throughout the current subseries, I have emphasized that dimensions and dimension attributes
should support the way that management and information consumers of a given
organization describe the events and results of the business operations of the
entity. Because we maintain dimension and related attribute information within
the database underlying our Analysis Services implementation, we can support
business intelligence for our clients and employers even when these details are
not captured within the system where transaction processing takes place.
Within the analysis and reporting capabilities we supply in this manner, dimensions
and attributes are useful for aggregation, filtering, labeling, and other
purposes.

Having covered the general characteristics and purposes of attributes in
Dimensional Attributes: Introduction and Overview Parts I through V, we fixed
our focus upon the properties underlying them, based upon the examination of
representative attributes within our sample cube. We then extended our
examination of attributes to yet another important component we had touched
upon earlier, the attribute member Key, with which we gained some hands-on
exposure in practice sessions that followed our coverage of the concepts. In
Attribute Member Keys – Pt I: Introduction and Simple Keys and Attribute Member
Keys – Pt II: Composite Keys, we explored the concepts of simple and composite
keys, narrowing our examination in Part I to the former, where we reviewed the
properties associated with a simple key, based upon the examination of a
representative dimension attribute within our sample UDM. In Part II, we
revisited the differences between simple and composite keys, and explained in
more detail why composite keys are sometimes required to uniquely identify
attribute members. We then reviewed the properties associated with a composite
key, based upon the examination of another representative dimension attribute
within our sample UDM.

In Attribute Member Names, we examined the attribute member Name property,
which we had briefly introduced in Dimensional Attributes: Introduction and
Overview Part V. We shed some light on how attribute member Name might most
appropriately be used without degrading system performance or creating other
unexpected or undesirable results. We then examined the “sister” attribute
member Value property (which we introduced along with attribute member Name in
Dimensional Attributes: Introduction and Overview Part V) in Attribute Member
Values in Analysis Services. As we did in our overview of attribute member
Name, we examined the details of Value. Our concentration was, similarly, upon
its appropriate use in providing support for the selection and delivery of
enterprise data in a more focused and consumer-friendly manner, without the
system performance degradation, and other unexpected or undesirable results,
that can accompany the uninformed use of the property.

In Introduction to Attribute Relationships in MSSQL Server Analysis Services,
we examined yet another part of the conceptual model, Attribute Relationships.
In this introduction, we discussed several best practices, along with design
and other considerations involved in their use, with a focus upon the general
exploitation of attribute relationships in providing support, once again, for
the selection and delivery of enterprise data. In the subsequent two related
articles, Attribute Relationships: Settings and Properties and More Exposure to
Settings and Properties in Analysis Services Attribute Relationships, we
examined attribute relationships in a manner similar to previous articles
within this subseries, concentrating in detail upon the properties that
underlie them.

With the next article, Introduction to Attribute Discretization, we introduced
a capability in Analysis Services, to which we refer as attribute
discretization, that allows us to group members of an attribute into a number
of member groups. We discussed design and other considerations involved in the
discretization of attributes, and touched upon best practices surrounding the
use of this capability.

In Attribute Discretization: Using the Automatic Method, we introduced the first
of multiple pre-defined discretization methods supported within the Analysis
Services UDM. We discussed the options that are available, focusing upon the
employment of the Automatic discretization method within the sample cube, to
meet the business requirements of a hypothetical client. We then began our
practice session with an inspection of the contiguous members of a select attribute hierarchy,
noting the absence of grouping and discussing shortcomings of this default
arrangement. Next, we enabled the Automatic discretization method within the
dimension attribute Properties pane, and then reprocessed the sample cube with
which we were working to enact the new Automatic discretization of the select attribute
members. Finally, we performed further inspections of the members of the attribute
hierarchy involved in the request for assistance by our hypothetical client,
noting the new, more intuitive grouping established by the newly enacted Automatic
discretization method.

Finally, in last month’s article, Attribute Discretization: Using the Equal Areas Method,
we introduced the second of the pre-defined discretization methods supported
within the Analysis Services UDM. We discussed the options that are available
with this particular approach, as we did in the previous article for the
Automatic method, focusing upon the employment of the Equal Areas
discretization method, again within the sample cube, to meet the business
requirements of a hypothetical client. We then began our practice session with
an inspection, via the browser in the Dimension Designer, of the contiguous
members of another
select attribute hierarchy, noting the absence of grouping and discussing
shortcomings of this default arrangement. Next, we enabled the Equal Areas
discretization method within the dimension attribute Properties pane, and again
reprocessed the sample cube with which we were working to enact the new Equal
Areas discretization of the select attribute members. Finally, we performed
another inspection, via the Dimension Designer and Cube Designer browsers, of
the members of the attribute hierarchy involved in the request for assistance
by our hypothetical client, noting the new, more intuitive grouping established
by the newly enacted Equal Areas discretization method.
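
To make the idea behind that grouping concrete, here is a minimal Python sketch. It assumes that Equal Areas aims at member groups holding roughly equal numbers of members; the function names, the sample values, and the range-style labels are hypothetical illustrations, not the code Analysis Services runs or the member names it generates.

```python
def equal_areas_buckets(values, bucket_count):
    """Split values into bucket_count contiguous groups of near-equal size.

    Assumption: this mimics only the *intent* of an equal-population
    grouping; it is not the engine's actual implementation.
    """
    ordered = sorted(values)
    if not ordered:
        return []
    # Never ask for more groups than there are members, nor fewer than one.
    bucket_count = max(1, min(bucket_count, len(ordered)))
    base, extra = divmod(len(ordered), bucket_count)
    buckets = []
    start = 0
    for index in range(bucket_count):
        # The first `extra` groups absorb one leftover member each.
        size = base + (1 if index < extra else 0)
        buckets.append(ordered[start:start + size])
        start += size
    return buckets


def range_label(bucket):
    """Build a hypothetical 'low - high' caption for a member group."""
    return f"{bucket[0]} - {bucket[-1]}"


if __name__ == "__main__":
    sample_members = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    for group in equal_areas_buckets(sample_members, 3):
        print(range_label(group), group)
```

The requested group count plays a role loosely comparable to the bucket count we set on the attribute; the real engine chooses its own boundaries when the dimension is processed.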

In this article, we will gain some hands-on exposure to
setting up yet another of the discretization methods supported by Analysis
Services. We will first briefly review the options that are available
(referencing their coverage in other articles, where applicable), and then work
with Clusters discretization in the sample cube. (In individual articles designed
specifically for the purpose, we will examine the setup of other discretization
options, in a manner similar to previous articles within this subseries,
gaining hands-on exposure to the use of those options in individual practice
scenarios.)
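
As a rough intuition for what a cluster-based grouping does, the following Python sketch groups numeric members by proximity rather than by count. The clustering algorithm the engine actually uses is not described in this article, so the sketch substitutes a plain one-dimensional k-means as a stand-in; the function name, the way starting centers are chosen, and the sample values are all assumptions made for illustration.

```python
def cluster_buckets(values, bucket_count, max_iterations=100):
    """Group numeric values with a simple one-dimensional k-means.

    Assumption: a stand-in for cluster-style discretization, not the
    algorithm Analysis Services applies during processing.
    """
    ordered = sorted(values)
    if not ordered:
        return []
    # Cannot form more groups than there are distinct values.
    k = max(1, min(bucket_count, len(set(ordered))))
    # Deterministic starting centers: evenly spaced picks from the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(max_iterations):
        groups = [[] for _ in range(k)]
        for value in ordered:
            # Assign each value to its nearest center (ties go to the lower one).
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            sum(group) / len(group) if group else centers[i]
            for i, group in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Drop empty groups and return the rest in ascending order.
    return sorted((group for group in groups if group), key=lambda g: g[0])


if __name__ == "__main__":
    sample_members = [1, 2, 3, 50, 51, 52, 200, 201]
    print(cluster_buckets(sample_members, 3))
    # prints [[1, 2, 3], [50, 51, 52], [200, 201]]
```

Unlike an equal-count split, the groups here follow natural gaps in the values, so their sizes can differ. K-means can also settle on a locally good grouping rather than the best possible one, which is one reason to treat this only as an intuition aid.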

Our examination will include:

  • A brief review of attribute discretization in Analysis Services, potential
    benefits that accrue from discretization in our UDMs, and how the process
    can help us to meet the primary objectives of business intelligence.
  • A brief overview of the multiple pre-defined discretization processes
    supported within the Analysis Services UDM.
  • Examination, via the browser in the Dimension Designer, of the
    pre-existing members of a select attribute hierarchy, noting the absence of
    grouping and discussing shortcomings of this default arrangement.
  • Enablement of the Clusters discretization method within the dimension
    attribute Properties pane.
  • Reprocessing the cube to enact the new Clusters discretization of the
    select attribute members.
  • Another examination, via the browsers in both the Dimension Designer and
    the Cube Designer, of the members of a select attribute hierarchy, noting
    the new, more intuitive grouping established by the newly enacted Clusters
    discretization method.
  • Backward- and forward-looking references to previous and subsequent
    articles, respectively, within our series, wherein we perform detailed
    examinations surrounding other details of discretization, as supported
    within the Analysis Services UDM.
William Pearson
Bill has been working with computers since before becoming a "big eight" CPA, after which he carried his growing information systems knowledge into management accounting, internal auditing, and various capacities of controllership. Bill entered the world of databases and financial systems when he became a consultant for CODA-Financials, a U.K.-based software company that hired only CPAs as application consultants to implement and maintain its integrated financial database, one of the most conceptually powerful, even in his current assessment, to have emerged. At CODA Bill deployed financial databases and business intelligence systems for many global clients. Working with SQL Server, Oracle, Sybase and Informix, and focusing on MSSQL Server, Bill created Island Technologies Inc. in 1997, and has developed a large and diverse customer base over the years since. Bill's background as a CPA, Internal Auditor and Management Accountant enables him to provide value to clients as a liaison between Accounting / Finance and Information Services. Moreover, as a Certified Information Technology Professional (CITP), a Certified Public Accountant recognized for his or her unique ability to provide business insight by leveraging knowledge of information relationships and supporting technologies, Bill offers his clients the CPA's perspective and ability to understand the complicated business implications and risks associated with technology. From this perspective, he helps them to effectively manage information while ensuring the data's reliability, security, accessibility and relevance. Bill has implemented enterprise business intelligence systems over the years for many Fortune 500 companies, focusing his practice (since the advent of MSSQL Server 2000) upon the integrated Microsoft business intelligence solution.
He leverages his years of experience with other enterprise OLAP and reporting applications (Cognos, Business Objects, Crystal, and others) in regular conversions of these once-dominant applications to the Microsoft BI stack. Bill believes it is easier to teach technical skills to people with non-technical training than vice-versa, and he constantly seeks ways to graft new technology into the Accounting and Finance arenas. Bill was awarded Microsoft SQL Server MVP in 2009. Hobbies include advanced literature studies and occasional lectures, with recent concentration upon the works of William Faulkner, Henry James, Marcel Proust, James Joyce, Honoré de Balzac, and Charles Dickens. Other long-time interests have included the exploration of generative music sourced from database architecture.
