Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted May 14, 2009

Attribute Discretization: Using the "Equal Areas" Method - Page 3

By William Pearson

Procedure: Employ the Equal Areas Discretization Method for Attribute Members in Analysis Services

When we browse a cube, we typically dimension the members of one attribute hierarchy by the members of another attribute hierarchy. For example, we might group customer sales by product purchased, by customer geography (state, province or country), or by customer gender. However, with certain types of attributes, it is useful to have Analysis Services automatically create groupings of attribute members, particularly when large numbers of contiguous values are involved, based upon the distribution of the members within an attribute hierarchy.

As an example, let’s assume that we have been approached by representatives of our hypothetical client, the Adventure Works organization. These representatives, members of the Sales department of the business, tell us that one of the existing attributes of the Customer dimension, Birth Date, does not serve them well in browses and reports, as the many different dates are simply listed, the members being derived from the date values in the BirthDate column of the DimCustomer table that underlies the Analysis Services layer of the business intelligence solution. (The developer of the cube, having returned to his home country, is no longer available to assist the Sales department in modifying the Birth Date presentation into a more useful format.)

We listen to the description of the problem, and then suggest grouping the Birth Date information via the pre-defined Equal Areas discretization method offered within Analysis Services. We suggest this as a starter approach, so that our client colleagues can see how grouping the values in this manner might make the data more meaningful in browsing / reporting. We tell them that, once we do this, information consumers who browse the Birth Dates attribute hierarchy will see the names / values of the groups instead of the members themselves – as if the Birth Dates are placed into folders that are automatically named for the date ranges they contain, as we shall see. This limits the number of levels that are presented to users, which can be less confusing, and more useful for analysis. (We inform the client representatives, too, that other methods of discretization can be leveraged if the Equal Areas method proves less than ideal – we are simply suggesting the Equal Areas method as a good starting point to illustrate an approach to meeting the end objective – an approach that we can easily “tweak” once our colleagues understand the general concept of discretization, and can then make intelligent choices with regard to specific nuances that they might deem desirable.

Browse the Existing, Ungrouped Members within the Attribute Hierarchy

We will begin our practice with the Equal Areas discretization method within the Birth Date attribute hierarchy of the Customer dimension.

1.  Within the Solution Explorer, right-click the Customer dimension (expand the Dimensions folder as necessary).

2.  Click Open on the context menu that appears, as shown in Illustration 3.

Opening the Dimension via the Dimension Designer
Illustration 3: Opening the Dimension via the Dimension Designer ...

The tabs of the Dimension Designer open. Let’s look at the current state of the attribute under consideration, Birth Date.

3.  Click the Browser tab.

4.  Select Birth Date within the dropdown Hierarchy selector atop the Browse tab, as depicted in Illustration 4.

Select the Birth Date Hierarchy to Browse
Illustration 4: Select the Birth Date Hierarchy to Browse ...

The attributes belonging to the Birth Date attribute hierarchy appear as partially shown in Illustration 5.

The Member Attributes, Birth Date Attribute Hierarchy of the Customer Dimension
Illustration 5: The Member Attributes, Birth Date Attribute Hierarchy of the Customer Dimension (Partial View)

We note that many contiguous values appear. It is easy to see how grouping these values into logical “buckets” might make the data easier for information consumers to analyze.

Add Equal Areas Discretization within the Attribute Hierarchy

We will enact the Equal Areas discretization method from the Dimension Structure tab of the Customer dimension.

1.  Click the Dimension Structure tab.

2.  Within the Attributes pane, select Birth Date (the top attribute hierarchy in the pane).

The Properties window for the attribute appears (by default in the bottom right corner of the design environment), as depicted in Illustration 6.

The Properties Window for the Birth Date Attribute
Illustration 6: The Properties Window for the Birth Date Attribute

3.  In the Properties window, click the setting box to the right of the DiscretizationMethod property (which currently contains the word “None”), to enable the selector on the right edge of the box.

4.  Select Equal Areas within the selector, as shown in Illustration 7.

Changing the DiscretizationMethod Property Setting to “EqualAreas”
Illustration 7: Changing the DiscretizationMethod Property Setting to “EqualAreas” ...

The DiscretizationMethod property specifies the method used to group the members of the owner attribute. Once we tell Analysis Services the method of discretization, we can tell it how many “buckets” to create for purposes of grouping the attribute members.

5.  In the setting box to the right of the DiscretizationBucketCount property (which currently contains the “0”), replace the value for the DiscretizationBucketCount property with the number 12.

The DiscretizationBucketCount property specifies the number of buckets in which to discretize members of the attribute to which the property belongs. The default setting is “0”. When we leave the DiscretizationBucketCount property at default, Analysis Services generates the number of groups independently, after sampling the underlying data involved.

The affected portion of the Properties window for the Birth Date attribute hierarchy appears as depicted (modifications surrounded by the red box) in Illustration 8.

Properties Window for the Birth Date Attribute Hierarchy, with Modifications
Illustration 8: Properties Window for the Birth Date Attribute Hierarchy, with Modifications

We are now ready to process the Analysis Services database, and then to examine the results of our handiwork with the browser, as before.



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM