Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Mar 13, 2009

Introduction to Attribute Discretization

By William Pearson

This article introduces attribute discretization in Analysis Services, and extends the examination of the dimensional model that we began in Dimensional Model Components: Dimensions Parts I and II. Here, we will introduce attribute discretization, focusing upon the purpose and benefits of this capability, which, as we shall see, affords us a means of creating a manageable number of groups of attribute values that are separated by distinct boundaries. In this way, discretization (or “bucketization”) allows us to group contiguous values into sets of discrete values.

Note: For more information about my MSSQL Server Analysis Services column in general, see the section entitled “About the MSSQL Server Analysis Services Series” that follows the conclusion of this article.

Introduction

In Dimensional Model Components: Dimensions Parts I and II, we undertook a general introduction to the dimensional model, noting its wide acceptance as the preferred structure for presenting quantitative and other organizational data to information consumers. We then began our examination of dimensions, the analytical “perspectives” upon which the dimensional model relies in meeting the primary objectives of business intelligence, including its capacity to support:

  • the presentation of relevant and accurate information representing business operations and events;
  • the rapid and accurate return of query results;
  • “slice and dice” query creation and modification;
  • an environment wherein information consumers can pose questions quickly and easily, and achieve rapid results datasets.

To extend the metaphor we used earlier in describing dimensions as nouns and measures as verbs, we might consider attributes as somewhat similar to adjectives. That is, attributes help us to define with specificity what dimensions cannot define by themselves. Dimensions alone are like lines in geometry: they don't define “area” within multidimensional space, nor do they themselves even define the hierarchies that they contain. A database dimension is a collection of related objects called attributes, which we use to specify the coordinates required to define cube space.

Dimensions and dimension attributes should support the way that management and information consumers of a given organization describe the events and results of the business operations of the entity. Because we maintain dimension and related attribute information within the database underlying our Analysis Services implementation, we can support business intelligence for our clients and employers even when these details are not captured within the system where transaction processing takes place. Within the analysis and reporting capabilities we supply in this manner, dimensions and attributes are useful for aggregation, filtering, labeling, and other purposes.

Having covered the general characteristics and purposes of attributes in Dimensional Attributes: Introduction and Overview Parts I through V, we fixed our focus upon the properties underlying them, based upon the examination of a representative attribute within our sample cube. We then continued our extended examination of attributes to yet another important component we had touched upon earlier, the attribute member Key, with which we gained some hands-on exposure in practice sessions that followed our coverage of the concepts. In Attribute Member Keys – Pt I: Introduction and Simple Keys and Attribute Member Keys – Pt II: Composite Keys, we introduced attribute member Keys in detail, continuing our recent group of articles focusing upon dimensional model components, with an objective of discussing the associated concepts, and of providing hands-on exposure to the properties supporting them.

In Attribute Member Names, we examined the attribute member Name property, which we had briefly introduced in Dimensional Attributes: Introduction and Overview Part V. We examined the details of the attribute member Name, and shed some light on how they might most appropriately be used without degrading system performance or creating other unexpected or undesirable results. Next, we examined the “sister” attribute member Value property (which we introduced along with attribute member Name in Dimensional Attributes: Introduction and Overview Part V) in Attribute Member Values in Analysis Services. As we did in our overview of attribute member Name, we examined the details of Value. Our concentration was also similarly upon its appropriate use in providing support for the selection and delivery of enterprise data in a more focused and consumer-friendly manner, without the unwanted effects of system performance degradation, and other unexpected or undesirable results, that can accompany the uninformed use of the property.

Finally, in Introduction to Attribute Relationships in MSSQL Server Analysis Services, we examined another part of the conceptual model, Attribute Relationships. In this introduction, we discussed several best practices and design, and other considerations involved in their use, with a focus upon the general exploitation of attribute relationships in providing support for the selection and delivery of enterprise data. In the subsequent two related articles, Attribute Relationships: Settings and Properties and More Exposure to Settings and Properties in Analysis Services Attribute Relationships, we examined attribute relationships in a manner similar to previous articles within this subseries, concentrating in detail upon the properties that underlay them.

In this article, we will introduce a capability in Analysis Services – to which we refer as attribute discretization - that allows us to group members of an attribute into a number of member groups. We will discuss design, and other, considerations involved in the discretization of attributes, and touch upon best practices surrounding the use of this capability. Our focus will be upon the general exploitation of discretization in providing support for the selection and delivery of enterprise data. (In other articles designed specifically for the purpose, we will examine attribute discretization in a manner similar to previous articles within this subseries, gaining hand-on exposure to the use of discretization in a practice scenario.)

Our examination will include:

  • An overview of attribute discretization in Analysis Services, potential benefits that accrue from discretization in our UDMs, and how the process can help us to meet the primary objectives of business intelligence.
  • A discussion of general considerations and best practices surrounding attribute discretization.
  • A look forward to subsequent articles that within our series, where we will perform detailed examinations of the properties underlying attribute discretization, along with a review of the respective settings associated with each property, based upon a representative dimension attribute within our sample UDM.

Attribute Discretization

As we have learned, attributes serve as the foundation for our dimensions and cubes. Whenever we work with attributes, we can expect to encounter two general types of values:

  • Discrete attributes: Discrete values stand apart distinctly, and have clearly defined logical “boundaries” between themselves. The possible values are naturally discrete for the lion’s share of attributes occurring in the business world.

    Example: The Gender attribute, within the Customer dimension of the Adventure Works sample UDM, is (at least for purposes of the sample cube) considered to have only one of two discrete values, female or male.

  • Contiguous attributes: Contiguous values do not stand apart distinctly, but flow along, as if in a continuous line. Contiguous values, especially within large populations, can have very large numbers of possible values. Information consumers can find it difficult to work effectively and efficiently within such wide ranges of values.

    Example: the Vacation Hours attribute, within the Employee dimension of the Adventure Works sample UDM, can have a wide range of possible values, depending upon how many employees are involved, whether there are limits on how many vacation days they can accumulate, and considerations of this nature. The member values are based directly upon the unique values contained within the VacationHours column of the DimEmployee table (with many of the unique values shared among multiple employees). The sheer number of values might make working with them cumbersome for information consumers, if they are simply made available in their existing state.

Overview

Discretization can help us to make it easier for information consumers to work with large numbers of possible attribute member values. As we have learned, discretization is the process of creating a manageable number of groups of attribute values that are clearly separated by boundaries. We can thereby use discretization as a means of group contiguous values into sets of discrete values, via a system-generated collection of consecutive dimension members known as member groups.

The discretization process groups the attribute members into the member groups, once they are generated; the member groups are then housed within a level within the dimensional hierarchy. (A given level within a dimensional hierarchy can contain either members or member groups, but not both.) When information consumers browse a level that contains member groups, they see the names and cell values of the member groups. The members generated by Analysis Services to support member groups are called grouping members, and they look like ordinary members.

Analysis Services affords us several variations of attribute discretization, based upon algorithms of varying complexity. The different methods of discretization all have the same function – to group contiguous values into sets of discrete values. They simply manage grouping via different approaches. Beyond the Dimension Editor, Analysis Services also supports user-defined discretization, via data definition language (DDL), should the “out – of – the – box” approaches not meet the business needs of our local environments. Moreover, we can alternatively implement custom discretization via the underlying data warehouse, using views at the relational level; named calculations in the data source view; calculated members in Analysis Services; or via other approaches.

As we have noted throughout my MSSQL Server Analysis Services series, as well as throughout my other Database Journal series’, one of the most important objectives in building a high performance Analysis Services solution is an efficient and effective dimension design. The identification of opportunities where we can effectively use attribute discretization, and the effective design and placement of the member groups that we generate thereby, can mean the provision of a much more consumer-friendly interface for our clients and employers.

Best Practices and Other Considerations Surrounding Attribute Discretization

Best practices dictate that, in creating attribute / member groups, we give those groups names that are intuitive for information consumers – names that best represent the semantics of the business. Analysis Services provides a naming template that we can use – or modify and use as a custom template or templates - to make this easier for us. Member group names are generated automatically, via the template, when the member groups are created. Unless we specify a naming template, the default naming template is used. (We can change this method of naming by specifying a naming template in the Format option for the NameColumn property of an attribute). Different naming templates can be defined for every language specified in the Translations collection of the column binding that has been used for the NameColumn property of the attribute.

Another best practice is to make the sort order of members meet the requirements of information consumers. Sort order is controlled via the OrderBy property of the attribute. Based on this sort order, the members in a member group are ordered consecutively. Finally, another best practice, from the perspective of “consumer-friendly” member group generation, is the consideration of intuitive, logical drill down paths. For example, one common use for member groups is to support drill down from a level with few members to a level with many members. (We will get some hands-on exposure to providing this sort of support in subsequent articles of this series).

We need to keep in mind that, when we process a dimension within Analysis Services, a discretized attribute is rediscretized only with a full update (ProcessFull). To rediscretize an attribute, we must perform a full update of the dimension – processing via an incremental update (ProcessAdd), will not rediscretize a discretized attribute, meaning that the names and children within the new “buckets” remain the same.

NOTE: We will examine in detail the properties (and their settings) that support attribute discretization within the sample Adventure Works cube in the hands-on practice sections of subsequent articles in this series.

Other significant considerations in working with attribute discretization include usage limitations. First, we need to be aware that member groups (and, therefore, attribute discretization) are not supported for dimensions that use the ROLAP storage mode. Moreover, we cannot create member groups in the topmost or bottommost level of a hierarchy. (We can approach the need to accomplish this by simply adding a level in such a way that the level in which we want to create member groups is no longer the top or bottom level. We can hide the added level by setting its Visible property to False, as appropriate). An additional consideration is that we cannot create member groups in two consecutive levels of a hierarchy.

Finally, we need to keep in mind that, if the dimension table of a dimension that contains member groups is updated, and the dimension is subsequently fully processed, a new set of member groups is generated. Under such a set of circumstances, the names and children of the new member groups may be different from the old member groups.

We will examine the many properties, and their settings, that we use in performing and maintaining attribute discretization in subsequent articles of this column, where we will gain hands-on exposure to these in a working environment.

Conclusion

In this article we introduced attribute discretization in Analysis Services, extending the examination of the dimensional model that we began in Dimensional Model Components: Dimensions Parts I and II. In a manner similar to previous articles within this subseries, we overviewed the general concepts involved and looked ahead to subsequent articles, where we will perform hands-on, detailed examination of the properties that support attribute discretization within a working sample environment. Our focus, as we stated, was upon the appropriate use of attribute discretization in providing support for the selection and delivery of enterprise data in a more focused and consumer-friendly manner.

Our introduction included an overview of attribute discretization, a capability in Analysis Services that allows us to group members of an attribute into a number of member groups. We noted potential benefits that accrue from discretization in our UDMs, and how the process can help us to meet the primary objectives of business intelligence. We discussed design, and other, considerations involved in the discretization of attributes, and touched upon best practices surrounding the use of this capability, in providing support for the selection and delivery of enterprise data. Finally, we looked ahead to subsequent articles of this series, where we will examine the individual properties underlying attribute discretization, and conduct a review of the respective settings associated with each property, through hands-on exposure to these in a working environment.

About the Series ...

This article is a member of the series Introduction to MSSQL Server Analysis Services. The monthly column is designed to provide hands-on application of the fundamentals of MS SQL Server Analysis Services (“Analysis Services”), with each installment progressively presenting features and techniques designed to meet specific real-world needs. For more information on the series, please see my initial article, Creating Our First Cube. For the software components, samples and tools needed to complete the hands-on portions of this article, see Usage-Based Optimization in Analysis Services 2005, another article within this series.

» See All Articles by Columnist William E. Pearson, III

Introduction to MSSQL Server Analysis Services Series
Introduction to Security in Analysis Services
Cube Storage: Planning Partitions from a SQL Server Management Studio Perspective
Cube Storage: Planning Partitions (Business Intelligence Development Studio Perspective)
Cube Storage: Introduction to Partitions
Introduction to Cube Storage
Attribute Discretization: Customize Grouping Names
Attribute Discretization: Using the "Clusters" Method
Attribute Discretization: Using the "Equal Areas" Method
Attribute Discretization: Using the Automatic Method
Introduction to Attribute Discretization
More Exposure to Settings and Properties in Analysis Services Attribute Relationships
Attribute Relationships: Settings and Properties
Introduction to Attribute Relationships in MSSQL Server Analysis Services
Attribute Member Values in Analysis Services
MSSQL Analysis Services - Attribute Member Names
Attribute Member Keys - Pt II: Composite Keys
Attribute Member Keys - Pt 1: Introduction and Simple Keys
Dimension Attributes: Introduction and Overview, Part V
Dimension Attributes: Introduction and Overview, Part IV
Dimension Attributes: Introduction and Overview, Part III
Dimension Attributes: Introduction and Overview, Part II
Dimension Attributes: Introduction and Overview, Part I
Dimensional Model Components: Dimensions Part II
Dimensional Model Components: Dimensions Part I
Manage Unknown Members in Analysis Services 2005, Part II
Manage Unknown Members in Analysis Services 2005, Part I
Alternatively Sorting Attribute Members in Analysis Services 2005
Introduction to Linked Objects in Analysis Services 2005
Distinct Counts in Analysis Services 2005
Positing the Intelligence: Conditional Formatting in the Analysis Services Layer
Administration and Optimization: SQL Server Profiler for Analysis Services Queries
Mastering Enterprise BI: Time Intelligence Pt. II
Mastering Enterprise BI: Time Intelligence Pt. I
Design and Documentation: Introducing the Visio 2007 PivotDiagram
Actions in Analysis Services 2005: The URL Action
Actions in Analysis Services 2005: The Drillthrough Action
Mastering Enterprise BI: Introducing Actions in Analysis Services 2005
Mastering Enterprise BI: Introduction to Translations
Mastering Enterprise BI: Introduction to Perspectives
Introduction to the Analysis Services 2005 Query Log
Mastering Enterprise BI: Working with Measure Groups
Mastering Enterprise BI: Introduction to Key Performance Indicators
Mastering Enterprise BI: Extend the Data Source with Named Calculations, Pt. II
Mastering Enterprise BI: Extend the Data Source with Named Calculations, Pt. I
Process Analysis Services Objects with Integration Services
Usage-Based Optimization in Analysis Services 2005
Introduction to MSSQL Server Analysis Services: Named Sets Revisited
Introduction to MSSQL Server Analysis Services: Migrating an Analysis Services 2000 Database to Analysis Services 2005
Introduction to MSSQL Server Analysis Services: Introducing Data Source Views
Introduction to MSSQL Server Analysis Services: Reporting Options for Analysis Services Cubes: MS Excel 2003 and More ...
Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Create Aging "Buckets" in a Cube
Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Relative Time Periods in an Analysis Services Cube, Part II
Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Relative Time Periods in an Analysis Services Cube
Introduction to MSSQL Server Analysis Services: Process Analysis Services Cubes with DTS
Introduction to MSSQL Server Analysis Services: Presentation Nuances: CrossTab View - Same Dimension
Introduction to MSSQL Server Analysis Services: Point-and-Click Cube Schema Simplification
Introduction to MSSQL Server 2000 Analysis Services: Manage Distinct Count with a Virtual Cube
Introduction to MSSQL Server 2000 Analysis Services: Distinct Count Basics: Two Perspectives
Introduction to MSSQL Server 2000 Analysis Services: Semi-Additive Measures and Periodic Balances
Introduction to MSSQL Server 2000 Analysis Services: Performing Incremental Cube Updates - An Introduction
Introduction to MSSQL Server 2000 Analysis Services: Partitioning a Cube in Analysis Services - An Introduction
Introduction to MSSQL Server 2000 Analysis Services: Basic Storage Design
Introduction to MSSQL Server 2000 Analysis Services: Derived Measures vs. Calculated Measures
Introduction to MSSQL Server 2000 Analysis Services: Creating a Dynamic Default Member
Introduction to MSSQL Server 2000 Analysis Services: Another Approach to Local Cube Design and Creation
Introduction to MSSQL Server 2000 Analysis Services: Introduction to Local Cubes
Introduction to MSSQL Server 2000 Analysis Services: Actions in Virtual Cubes
Introduction to MSSQL Server 2000 Analysis Services: Putting Actions to Work in Regular Cubes
Introduction to MSSQL Server 2000 Analysis Services: Reporting Options for Analysis Services Cubes: ProClarity Part II
Introduction to MSSQL Server 2000 Analysis Services: Reporting Options for Analysis Services Cubes: ProClarity Professional, Part I
Introduction to MSSQL Server 2000 Analysis Services: Using Calculated Cells in Analysis Services , Part II
Introduction to MSSQL Server 2000 Analysis Services: Using Calculated Cells in Analysis Services, Part I
Introduction to MSSQL Server 2000 Analysis Services: MSAS Administration and Optimization: Toward More Sophisticated Analysis
Introduction to MSSQL Server 2000 Analysis Services: MSAS Administration and Optimization: Simple Cube Usage Analysis
Introduction to MSSQL Server 2000 Analysis Services: Build a Web Site Traffic Analysis Cube: Part II
Build a Web Site Traffic Analysis Cube: Part I
Reporting Options for Analysis Services Cubes: Cognos PowerPlay
Reporting Options for Analysis Services Cubes: MS FrontPage 2002
Reporting Options for Analysis Services Cubes: MS Excel 2002
Introduction to MSSQL Server 2000 Analysis Services: Drilling Through to Details: From Two Perspectives
Introduction to MSSQL Server 2000 Analysis Services: Custom Cubes: Financial Reporting - Part II
Introduction to MSSQL Server 2000 Analysis Services Custom Cubes: Financial Reporting (Part I)
Introduction to SQL Server 2000 Analysis Services: Exploring Virtual Cubes
Introduction to SQL Server 2000 Analysis Services: Working with the Cube Editor
Introduction to SQL Server 2000 Analysis Services: Parent-Child Dimensions
Introduction to SQL Server 2000 Analysis Services: Handling Time Dimensions
Introduction to SQL Server 2000 Analysis Services: Working with Dimensions
Introduction to SQL Server 2000 Analysis Services: Creating Our First Cube



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM