Introduction to Attribute Discretization

This article introduces attribute
discretization
in Analysis
Services
, and extends the examination of the dimensional model that
we began in Dimensional Model Components: Dimensions Parts I and
II. Here, we
will introduce attribute discretization, focusing upon the purpose and
benefits of this capability, which, as we shall see, affords us a means of
creating a manageable number of groups of attribute values that
are separated by distinct boundaries. In this way, discretization (or
“bucketization”) allows us to group contiguous values into sets of discrete
values.

Note: For more
information about my MSSQL
Server Analysis Services column in general, see the section entitled “About the MSSQL Server Analysis Services Series” that follows the conclusion of
this article.

Introduction

In Dimensional Model Components: Dimensions Parts I and II, we undertook a general
introduction to the dimensional model, noting its wide acceptance as the
preferred structure for presenting quantitative and other organizational data
to information consumers. We then began our examination of dimensions, the
analytical “perspectives” upon which the dimensional model relies in
meeting the primary objectives of business intelligence, including its
capacity to support:

  • the presentation
    of relevant and accurate information representing business operations and
    events;
  • the rapid and
    accurate return of query results;
  • “slice and
    dice” query creation and modification;
  • an environment
    wherein information consumers can pose questions quickly and easily, and achieve
    rapid results datasets.

To extend
the metaphor we used earlier in describing dimensions as nouns
and measures as verbs, we might consider attributes as
somewhat similar to adjectives. That is, attributes help us to
define with specificity what dimensions cannot define by themselves. Dimensions
alone are like lines in geometry: they don’t define “area” within
multidimensional space, nor do they themselves even define the hierarchies
that they contain. A database dimension is a collection of related
objects called attributes, which we use to specify the coordinates
required to define cube space.

Dimensions and dimension attributes
should support the way that management and information consumers of a given
organization describe the events and results of the business operations of the
entity. Because we maintain dimension and related attribute
information within the database underlying our Analysis Services implementation,
we can support business intelligence for our clients and employers even when
these details are not captured within the system where transaction processing
takes place. Within the analysis and reporting capabilities we supply in this
manner, dimensions and attributes are useful for aggregation,
filtering, labeling, and other purposes.

Having covered the general characteristics and purposes of attributes
in Dimensional
Attributes: Introduction
and Overview Parts I
through V, we fixed our focus upon the properties
underlying them, based upon the examination of a representative attribute within
our sample cube. We then continued our extended examination of attributes
to yet another important component we had touched upon earlier, the attribute
member Key, with which we gained some hands-on exposure in practice
sessions that followed our coverage of the concepts. In Attribute
Member Keys – Pt I: Introduction and Simple Keys
and Attribute
Member Keys – Pt II: Composite Keys
, we introduced attribute member Keys in detail, continuing our
recent group of articles focusing upon dimensional model components,
with an objective of discussing the associated concepts, and of providing
hands-on exposure to the properties supporting them.

In Attribute
Member Names
, we examined the attribute member Name
property, which we had briefly introduced in Dimensional
Attributes: Introduction and Overview Part V
. We examined
the details of the attribute member Name, and shed some light on
how they might most appropriately be used without degrading system performance
or creating other unexpected or undesirable results. Next, we examined the
“sister” attribute member Value property (which we introduced
along with attribute member Name in Dimensional Attributes: Introduction and Overview Part V)
in Attribute
Member Values in Analysis Services
. As we did in our overview of attribute
member Name, we examined the details of Value. Our concentration
was also similarly upon its appropriate use in providing support for the
selection and delivery of enterprise data in a more focused and
consumer-friendly manner, without the unwanted effects of system performance
degradation, and other unexpected or undesirable results, that can accompany
the uninformed use of the property.

Finally, in Introduction
to Attribute Relationships in MSSQL Server Analysis Services
, we examined another part of the
conceptual model, Attribute Relationships. In this introduction, we
discussed several best practices and design, and other considerations involved
in their use, with a focus upon the general exploitation of attribute
relationships
in providing support for the selection and delivery of
enterprise data. In the subsequent two related articles, Attribute
Relationships: Settings and Properties
and More
Exposure to Settings and Properties in Analysis Services Attribute
Relationships
, we examined attribute relationships in
a manner similar to previous articles within this subseries, concentrating in
detail upon the properties that underlay them.

In this article, we will introduce a capability in Analysis
Services
– to which we refer as attribute discretization – that
allows us to group members of an attribute into a number of member
groups
. We will discuss design, and other, considerations involved in the discretization
of attributes, and touch upon best practices surrounding the use of
this capability. Our focus will be upon the general exploitation of discretization
in providing support for the selection and delivery of enterprise data. (In other
articles designed specifically for the purpose, we will examine attribute discretization
in a manner similar to previous articles within this subseries, gaining hand-on
exposure to the use of discretization in a practice scenario.)

Our examination will include:

  • An overview of
    attribute discretization in Analysis Services, potential benefits
    that accrue from discretization in our UDMs, and how the
    process can help us to meet the primary objectives of business intelligence.
  • A discussion of general
    considerations and best practices surrounding attribute discretization.
  • A look forward
    to subsequent articles that within our series, where we will perform detailed examinations
    of the properties underlying attribute discretization, along with
    a review of the respective settings associated with each property, based
    upon a representative dimension attribute within our sample UDM.

Attribute Discretization

As we
have learned, attributes serve as the foundation for our dimensions
and cubes. Whenever we work with attributes,
we can expect to encounter two general types of values:

  • Discrete attributes: Discrete
    values stand apart distinctly, and have clearly defined logical “boundaries”
    between themselves. The possible values are naturally discrete for the
    lion’s share of attributes occurring in the business world.

    Example: The Gender attribute, within
    the Customer dimension of the Adventure Works sample UDM, is
    (at least for purposes of the sample cube) considered to have only one of two discrete
    values, female or male.

  • Contiguous attributes: Contiguous values do not stand apart
    distinctly, but flow along, as if in a continuous line. Contiguous
    values, especially within large populations, can have very large numbers of
    possible values. Information consumers can find it difficult to work
    effectively and efficiently within such wide ranges of values.

    Example: the Vacation Hours
    attribute, within the Employee dimension of the Adventure Works
    sample UDM, can have a wide range of possible values, depending upon how
    many employees are involved, whether there are limits on how many vacation days
    they can accumulate, and considerations of this nature. The member
    values are based directly upon the unique values contained within the VacationHours
    column of the DimEmployee table (with many of the unique values
    shared among multiple employees). The sheer number of values might make
    working with them cumbersome for information consumers, if they are simply made
    available in their existing state.

Overview

Discretization can help us to make it easier for information consumers
to work with large numbers of possible attribute member values. As we
have learned, discretization is the process of creating a manageable
number of groups of attribute values that are clearly separated by
boundaries. We can thereby use discretization as a means of group contiguous
values into sets of discrete values, via a system-generated collection
of consecutive dimension members known as member groups.

The discretization process
groups the attribute members into the member groups, once they
are generated; the member groups are then housed within a level
within the dimensional hierarchy. (A given level within a dimensional
hierarchy
can contain either members or member groups, but
not both.) When information consumers browse a level that contains member
groups
, they see the names and cell values of the member groups. The
members generated by Analysis Services to support member
groups
are called grouping members, and they look like ordinary
members.

Analysis Services affords us several variations of attribute discretization,
based upon algorithms of varying complexity. The different methods of discretization
all have the same function – to group contiguous values into sets of discrete
values. They simply manage grouping via different approaches. Beyond the Dimension
Editor
, Analysis Services also supports user-defined discretization,
via data definition language (DDL), should the “out – of – the –
box” approaches not meet the business needs of our local environments. Moreover,
we can alternatively implement custom discretization via the underlying
data warehouse, using views at the relational level; named
calculations
in the data source view; calculated members in Analysis
Services
; or via other approaches.

As we
have noted throughout my MSSQL Server Analysis Services series, as well as throughout my other Database
Journal
series’, one of the most important objectives in building a high
performance Analysis Services solution is an efficient and effective dimension
design. The identification of opportunities where we can effectively use attribute discretization,
and the effective design and placement of the member groups that we
generate thereby, can mean the provision of a much more consumer-friendly
interface for our clients and employers.

Best Practices and Other Considerations Surrounding Attribute Discretization

Best practices dictate that, in creating
attribute / member groups, we give those groups names that are intuitive
for information consumers – names that best represent the semantics of
the business. Analysis Services provides a naming template that
we can use – or modify and use as a custom template or templates
– to make this easier for us. Member group names are generated
automatically, via the template, when the member groups are
created. Unless we specify a naming template, the default naming
template
is used. (We can change this method of naming by specifying a naming
template
in the Format option for the NameColumn property of
an attribute). Different naming templates can be defined for
every language specified in the Translations collection of the column
binding that has been used for the NameColumn property of the attribute.

Another best practice is to make the sort
order
of members meet the requirements of information consumers. Sort
order
is controlled via the OrderBy
property of the attribute. Based on this sort order, the members
in a member group are ordered consecutively. Finally, another best
practice, from the perspective of “consumer-friendly” member group
generation, is the consideration of intuitive, logical drill down
paths. For example, one common use for member groups is to support drill
down
from a level with few members to a level with
many members. (We will get some hands-on exposure to providing this sort
of support in subsequent articles of this series).

We need to keep in mind that, when
we process a dimension within Analysis Services, a discretized
attribute
is rediscretized only with a full update (ProcessFull).
To rediscretize an attribute, we must perform a full update of
the dimension – processing via an incremental update (ProcessAdd),
will not rediscretize a discretized attribute, meaning that the names
and children within the new “buckets” remain the same.

NOTE: We will examine in detail the properties (and their settings)
that support attribute discretization within the sample Adventure
Works
cube in the hands-on practice sections of subsequent articles in this
series.

Other significant considerations in working with attribute
discretization
include usage limitations. First, we need to be
aware that member groups (and, therefore, attribute discretization) are not
supported for dimensions that use the ROLAP storage mode.
Moreover, we cannot create member groups in the topmost or bottommost level
of a hierarchy. (We can approach the need to accomplish this by simply
adding a level in such a way that the level in which we want to
create member groups is no longer the top or bottom level. We can
hide the added level by setting its Visible property to False, as appropriate). An additional consideration
is that we cannot create member groups
in two consecutive levels of a hierarchy.

Finally, we need to keep in mind that, if the dimension
table
of a dimension that contains member groups is updated,
and the dimension is subsequently fully processed, a new set
of member groups
is generated. Under such a set of circumstances, the names
and children of the new member groups may be different from the
old member groups.

We will examine the many properties, and their settings,
that we use in performing and maintaining attribute discretization in
subsequent articles of this column, where we will gain hands-on exposure to
these in a working environment.

Conclusion

In this
article we introduced attribute discretization in Analysis
Services
, extending the examination of the dimensional model that
we began in Dimensional Model Components: Dimensions Parts I and II. In a manner similar to previous
articles within this subseries, we overviewed the general concepts involved and
looked ahead to subsequent articles, where we will perform hands-on, detailed
examination of the properties that support attribute
discretization
within a working sample environment. Our focus, as we stated, was upon
the appropriate use of attribute discretization in providing support for the
selection and delivery of enterprise data in a more focused and
consumer-friendly manner.

Our introduction included an overview of attribute
discretization
, a capability in Analysis
Services
that allows us to group members of an attribute into
a number of member groups. We noted potential benefits that accrue from discretization
in our UDMs, and how the process can help us to meet the primary
objectives of business intelligence. We discussed design, and other,
considerations involved in the discretization of attributes, and
touched upon best practices surrounding the use of this capability, in
providing support for the selection and delivery of enterprise data. Finally,
we looked ahead to subsequent articles of this series, where we will examine
the individual properties underlying attribute discretization,
and conduct a review of the respective settings associated with each property,
through hands-on exposure to these in a working environment.

About the Series …

This
article is a member of the series Introduction
to MSSQL Server Analysis Services
. The monthly column is designed to provide hands-on
application of the fundamentals of MS SQL Server Analysis Services (“Analysis
Services
”), with each installment progressively presenting features and
techniques designed to meet specific real-world needs. For more information on
the series, please see my initial article, Creating Our First Cube. For the software components, samples and tools
needed to complete the hands-on portions of this article, see Usage-Based Optimization in Analysis Services 2005, another article within this
series.

»


See All Articles by Columnist
William E. Pearson, III

William Pearson
William Pearson
Bill has been working with computers since before becoming a "big eight" CPA, after which he carried his growing information systems knowledge into management accounting, internal auditing, and various capacities of controllership. Bill entered the world of databases and financial systems when he became a consultant for CODA-Financials, a U.K. - based software company that hired only CPA's as application consultants to implement and maintain its integrated financial database - one of the most conceptually powerful, even in his current assessment, to have emerged. At CODA Bill deployed financial databases and business intelligence systems for many global clients. Working with SQL Server, Oracle, Sybase and Informix, and focusing on MSSQL Server, Bill created Island Technologies Inc. in 1997, and has developed a large and diverse customer base over the years since. Bill's background as a CPA, Internal Auditor and Management Accountant enable him to provide value to clients as a liaison between Accounting / Finance and Information Services. Moreover, as a Certified Information Technology Professional (CITP) - a Certified Public Accountant recognized for his or her unique ability to provide business insight by leveraging knowledge of information relationships and supporting technologies - Bill offers his clients the CPA's perspective and ability to understand the complicated business implications and risks associated with technology. From this perspective, he helps them to effectively manage information while ensuring the data's reliability, security, accessibility and relevance. Bill has implemented enterprise business intelligence systems over the years for many Fortune 500 companies, focusing his practice (since the advent of MSSQL Server 2000) upon the integrated Microsoft business intelligence solution. He leverages his years of experience with other enterprise OLAP and reporting applications (Cognos, Business Objects, Crystal, and others) in regular conversions of these once-dominant applications to the Microsoft BI stack. Bill believes it is easier to teach technical skills to people with non-technical training than vice-versa, and he constantly seeks ways to graft new technology into the Accounting and Finance arenas. Bill was awarded Microsoft SQL Server MVP in 2009. Hobbies include advanced literature studies and occasional lectures, with recent concentration upon the works of William Faulkner, Henry James, Marcel Proust, James Joyce, Honoré de Balzac, and Charles Dickens. Other long-time interests have included the exploration of generative music sourced from database architecture.

Latest Articles