Attribute Discretization: Using the “Equal Areas” Method

This article continues my exploration of attribute discretization, a capability in
Analysis Services that allows us to group members of an attribute into a number of
member groups. Our concentration here will be upon gaining exposure to the pre-defined
“Equal Areas” discretization method, one of three such pre-defined methods supported
by Analysis Services, through hands-on application of the method to a representative
dimension attribute within our sample UDM.

This article continues the overview of Attribute Discretization
in Analysis Services begun in Introduction to Attribute Discretization,
and continued in Attribute Discretization: Using the Automatic Method.
Both this article and its predecessor extend the examination of the
dimensional model that we began in Dimensional Model Components:
Dimensions Parts I and II. After taking
up various additional components of the dimensional model in subsequent
articles, we performed hands-on exploration of the general characteristics and
purposes of attributes in Dimensional Attributes: Introduction and Overview Parts I through V. We then fixed our focus upon
the properties underlying attributes, extending our overview into attribute
member Keys, Names, Values and Relationships within several subsequent articles.

Note: For more information about my Introduction to MSSQL Server
Analysis Services column in general, see the section entitled “About the MSSQL Server Analysis
Services Series” that follows the conclusion of this article.

Introduction

In Introduction to Attribute Discretization and Attribute Discretization: Using the Automatic Method, I summarized preceding articles within the
current subseries, consisting of a general introduction to the dimensional
model. I noted the wide acceptance of the dimensional model as the
preferred structure for presenting quantitative and other organizational data
to information consumers. The articles of the series then undertook an examination
of dimensions, the analytical “perspectives” upon which the dimensional model
relies in meeting the primary objectives of business intelligence, including
its capacity to support:

  • the presentation of relevant and accurate information representing
    business operations and events;
  • the rapid and accurate return of query results;
  • “slice and dice” query creation and modification;
  • an environment wherein information consumers can pose questions quickly
    and easily, and obtain rapid results datasets.

We extended our examination of dimensions into a couple of detailed articles.
These articles, Dimensional Model Components: Dimensions Parts I and II,
emphasized that dimensions, which represent the perspectives of a
business or other operation, and reflect the intuitive ways that information
consumers need to query and view data, form the foundation of the dimensional
model. We noted that each dimension within our model contains one or more hierarchies.
(As we learn in other articles of this series, two types of hierarchies exist
within Analysis Services: attribute hierarchies and user – sometimes called “multi-level”
– hierarchies.)

We next introduced dimension attributes within the subseries, and conducted an
extensive overview of their nature, properties, and detailed settings in Dimensional
Attributes: Introduction and Overview Parts I through V. We noted that attributes help
us to define with specificity what dimensions cannot define by themselves. Moreover,
we learned that attributes are collected within a database dimension, where we
can access them to help us to specify the coordinates required to define cube
space.

Throughout the current subseries, I have emphasized that dimensions and dimension attributes
should support the way that management and information consumers of a given
organization describe the events and results of the business operations of the
entity. Because we maintain dimension and related attribute information within
the database underlying our Analysis Services implementation, we can support
business intelligence for our clients and employers even when these details are
not captured within the system where transaction processing takes place.
Within the analysis and reporting capabilities we supply in this manner, dimensions
and attributes are useful for aggregation, filtering, labeling, and other
purposes.

Having covered the general characteristics and purposes of attributes
in Dimensional Attributes: Introduction and Overview Parts I through V,
we fixed our focus upon the properties
underlying them, based upon the examination of representative attributes within
our sample cube. We then continued our extended examination of attributes to
yet another important component we had touched upon earlier, the attribute
member Key, with which we gained some hands-on exposure in practice sessions
that followed our coverage of the concepts. In Attribute Member Keys – Pt I:
Introduction and Simple Keys and Attribute Member Keys – Pt II: Composite Keys,
we explored the concepts of simple and composite
keys, narrowing our examination in Part I to the former, where we reviewed
the Properties associated with a simple key, based upon the examination of a
representative dimension attribute within our sample UDM. In Part II,
we revisited the differences between simple and composite keys, and explained
in more detail why composite keys are sometimes required to uniquely identify attribute
members. We then reviewed the properties associated with a composite key,
based upon the examination of another representative dimension attribute within
our sample UDM.

In Attribute Member Names, we examined the attribute member Name property,
which we had briefly introduced in Dimensional Attributes: Introduction and
Overview Part V. We shed some light on how attribute member Name might most
appropriately be used without degrading system performance or creating other
unexpected or undesirable results. We then examined the “sister” attribute
member Value property (which we introduced along with attribute member Name in
Dimensional Attributes: Introduction and Overview Part V) in Attribute Member
Values in Analysis Services. As we did in our overview of attribute member
Name, we examined the details of Value. Our concentration was, similarly, upon
its appropriate use in providing support for the selection and delivery of
enterprise data in a more focused and consumer-friendly manner, without the
system performance degradation, and other unexpected or undesirable results,
that can accompany the uninformed use of the property.

In Introduction to Attribute Relationships in MSSQL Server Analysis Services,
we examined yet another part of the conceptual model, Attribute Relationships.
In this introduction, we discussed several best practices, along with design
and other considerations involved in their use, with a focus upon the general
exploitation of attribute relationships in providing support, once again, for
the selection and delivery of enterprise data. In the two subsequent related
articles, Attribute Relationships: Settings and Properties and More Exposure to
Settings and Properties in Analysis Services Attribute Relationships, we
examined attribute relationships in a manner similar to previous articles
within this subseries, concentrating in detail upon the properties that
underlie them.

With the next article, Introduction to Attribute Discretization, we introduced
a capability in Analysis Services, to which we refer as attribute
discretization, that allows us to group members of an attribute into a number
of member groups. We discussed design and other considerations involved in the
discretization of attributes, and touched upon best practices surrounding the
use of this capability.

Finally, in Attribute Discretization: Using the Automatic Method, we introduced the first
of multiple pre-defined discretization methods supported within the Analysis
Services UDM. We first discussed the options that are available, focusing upon
the employment of the Automatic discretization method within the sample cube,
to meet the business requirements of a hypothetical client. We then began our
practice session with an inspection of the contiguous members of a select attribute hierarchy,
noting the absence of grouping and discussing shortcomings of this default
arrangement. Next, we enabled the Automatic discretization method within the
dimension attribute Properties pane, and then reprocessed the sample cube with
which we were working to enact the new Automatic discretization of the select attribute
members. Finally, we performed further inspections of the members of the attribute
hierarchy involved in the request for assistance by our hypothetical client,
noting the new, more intuitive grouping established by the newly enacted Automatic
discretization method.

In this article, we will gain some hands-on exposure to setting up another of
the discretization methods supported by Analysis Services. We will first briefly
review the options that are available (referencing their coverage in other
articles, where applicable), and then work with Equal Areas discretization in the
sample cube. (In individual articles designed specifically for the purpose, we
will examine the setup of other discretization options, in a manner similar to
previous articles within this subseries, gaining hands-on exposure to the use of
those options in individual practice scenarios.)
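
To make the idea behind Equal Areas discretization a little more concrete before we begin, the following is a minimal Python sketch of the general technique: sorting the members of an attribute and dividing them into groups that contain roughly equal numbers of members. This is an illustration only, not the algorithm that Analysis Services runs internally. The function name equal_areas_buckets, the bucket_count parameter, and the sample values are all hypothetical, and the sketch makes no attempt to keep duplicate values together within a single group.

```python
from typing import List, Sequence, Tuple


def equal_areas_buckets(
    values: Sequence[float], bucket_count: int
) -> List[Tuple[float, float, int]]:
    """Divide values into groups holding roughly equal numbers of members.

    Returns one (lowest member, highest member, member count) tuple per group.
    Hypothetical helper for illustration; not an Analysis Services API.
    """
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")

    ordered = sorted(values)
    total = len(ordered)
    buckets: List[Tuple[float, float, int]] = []
    start = 0
    for i in range(bucket_count):
        # Place each boundary by position in the sorted list, so that every
        # group receives about total / bucket_count members.
        end = (i + 1) * total // bucket_count
        if end > start:  # skip empty groups when members are scarce
            group = ordered[start:end]
            buckets.append((group[0], group[-1], len(group)))
        start = end
    return buckets


# Hypothetical attribute members, heavily skewed toward small values.
sample_members = [10, 12, 15, 20, 22, 30, 45, 60, 90, 200, 500, 1000]
for low, high, count in equal_areas_buckets(sample_members, 4):
    print(f"{low} - {high}: {count} members")
```

Note that the groups are balanced by member count, not by the width of the value range: in the skewed sample above, the last group spans a far wider range of values than the first, yet each group holds the same number of members.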

Our examination will include:

  • A brief review of attribute discretization in Analysis Services, potential
    benefits that accrue from discretization in our UDMs, and how the process
    can help us to meet the primary objectives of business intelligence.
  • A brief overview of the multiple pre-defined discretization processes
    supported within the Analysis Services UDM.
  • Examination, via the browser in the Dimension Designer, of the pre-existing
    members of a select attribute hierarchy, noting the absence of grouping and
    discussing shortcomings of this default arrangement.
  • Enablement of the Equal Areas discretization method within the dimension
    attribute Properties pane.
  • Reprocessing the cube to enact the new Equal Areas discretization of the
    select attribute members.
  • Another examination, via the browsers in both the Dimension Designer and the
    Cube Designer, of the members of a select attribute hierarchy, noting the
    new, more intuitive grouping established by the newly enacted Equal Areas
    discretization method.
  • Backward- and forward-looking references to previous and subsequent
    articles, respectively, within our series, wherein we perform detailed
    examinations surrounding other discretization methods supported within the
    Analysis Services UDM.
William Pearson
Bill has been working with computers since before becoming a "big eight" CPA, after which he carried his growing information systems knowledge into management accounting, internal auditing, and various capacities of controllership. Bill entered the world of databases and financial systems when he became a consultant for CODA-Financials, a U.K.-based software company that hired only CPAs as application consultants to implement and maintain its integrated financial database, one of the most conceptually powerful, even in his current assessment, to have emerged. At CODA Bill deployed financial databases and business intelligence systems for many global clients. Working with SQL Server, Oracle, Sybase and Informix, and focusing on MSSQL Server, Bill created Island Technologies Inc. in 1997, and has developed a large and diverse customer base over the years since. Bill's background as a CPA, Internal Auditor and Management Accountant enables him to provide value to clients as a liaison between Accounting / Finance and Information Services. Moreover, as a Certified Information Technology Professional (CITP), a Certified Public Accountant recognized for his or her unique ability to provide business insight by leveraging knowledge of information relationships and supporting technologies, Bill offers his clients the CPA's perspective and ability to understand the complicated business implications and risks associated with technology. From this perspective, he helps them to effectively manage information while ensuring the data's reliability, security, accessibility and relevance. Bill has implemented enterprise business intelligence systems over the years for many Fortune 500 companies, focusing his practice (since the advent of MSSQL Server 2000) upon the integrated Microsoft business intelligence solution.
He leverages his years of experience with other enterprise OLAP and reporting applications (Cognos, Business Objects, Crystal, and others) in regular conversions of these once-dominant applications to the Microsoft BI stack. Bill believes it is easier to teach technical skills to people with non-technical training than vice-versa, and he constantly seeks ways to graft new technology into the Accounting and Finance arenas. Bill was awarded Microsoft SQL Server MVP in 2009. Hobbies include advanced literature studies and occasional lectures, with recent concentration upon the works of William Faulkner, Henry James, Marcel Proust, James Joyce, Honoré de Balzac, and Charles Dickens. Other long-time interests have included the exploration of generative music sourced from database architecture.
