Attribute Discretization: Customize Grouping Names
July 17, 2009
This article continues my exploration of attribute discretization, a capability in Analysis Services that allows us to group members of an attribute into a number of member groups. Our concentration here will be to go beyond the individual examinations of the methods themselves that we undertook in other articles, and to get some exposure to custom naming of the groups created by the discretization process we choose. The practice we undertake to this end will, once again, surround a representative dimension attribute within our sample UDM.
This article continues the overview of Attribute Discretization in Analysis Services we began in Introduction to Attribute Discretization, and continued in Attribute Discretization: Using the Automatic Method, Attribute Discretization: Using the Equal Areas Method and Attribute Discretization: Using the Clusters Method. Both this article and its Discretization-related predecessors extend the examination of the dimensional model begun in Dimensional Model Components: Dimensions Parts I and II, and continued through Dimensional Attributes: Introduction and Overview Parts I through V, as well as our focus upon the properties underlying attributes, extending our overview into attribute member Keys, Names, Values and Relationships within several subsequent articles.
Note: For more information about my Introduction to MSSQL Server Analysis Services column in general, see the section entitled About the MSSQL Server Analysis Services Series that follows the conclusion of this article.
In Introduction to Attribute Discretization, Attribute Discretization: Using the Automatic Method, Attribute Discretization: Using the Equal Areas Method and Attribute Discretization: Using the Clusters Method, I summarized preceding articles within the current subseries, consisting of a general introduction to the dimensional model. I noted the wide acceptance of the dimensional model as the preferred structure for presenting quantitative and other organizational data to information consumers. The articles of the series then undertook an examination of dimensions, and, subsequently, dimension attributes, the analytical perspectives and structures upon which the dimensional model relies in meeting the primary objectives of business intelligence, including its capacity to support:
With Introduction to Attribute Discretization, we introduced a capability in Analysis Services to which we refer as attribute discretization - that allows us to group members of an attribute into a number of member groups. We discussed design and other considerations involved in the discretization of attributes, and touched upon best practices surrounding the use of this capability. In Attribute Discretization: Using the Automatic Method, we introduced the first of multiple pre-defined discretization methods supported within the Analysis Services UDM. We discussed the options that are available, focusing upon the employment of the Automatic discretization method within the sample cube, to meet the business requirements of a hypothetical client. We then began our practice session with an inspection of the contiguous members of a select attribute hierarchy, noting the absence of grouping and discussing shortcomings of this default arrangement. Next, we enabled the Automatic discretization method within the dimension attribute Properties pane, and then reprocessed the sample cube with which we were working to enact the new Automatic discretization of the select attribute members. Finally, we performed further inspections of the members of the attribute hierarchy involved in the request for assistance by our hypothetical client, noting the new, more intuitive grouping established by the newly enacted Automatic discretization method.
Next, in Attribute Discretization: Using the Equal Areas Method, we introduced the second of the pre-defined discretization methods supported within the Analysis Services UDM. We discussed the options that are available with this particular approach, as we did in the article previous for the Automatic method, focusing upon the employment of the Equal Areas discretization method, again within the sample cube, to meet the business requirements of a hypothetical client. We then began our practice session with an inspection, via the browser in the Dimension Designer, of the contiguous members of another select attribute hierarchy, noting the absence of grouping and discussing shortcomings of this default arrangement. Next, we enabled the Equal Areas discretization method within the dimension attribute Properties pane, and again reprocessed the sample cube with which we were working to enact the new Equal Areas discretization of the select attribute members. Finally, we performed additional inspections, via the Dimension Designer and Cube Designer browsers, of the members of the attribute hierarchy involved in the request for assistance by our hypothetical client, noting the new, more intuitive grouping established by the newly enacted Equal Areas discretization method.
In last months article, Attribute Discretization: Using the Clusters Method, we introduced the third of the pre-defined discretization methods supported within the Analysis Services UDM. We again discussed the options that are available with this particular approach, as we did in the previous articles for the Automatic and Equal Areas methods, focusing this time upon the employment of the Clusters discretization method, to meet the business requirements of a hypothetical client within the sample cube. Our practice session again followed with an inspection of the pre-discretized contiguous members of another select attribute hierarchy, where we again noted the absence of grouping and discussed shortcomings of this default arrangement. We then enabled the Clusters discretization method within the dimension attribute Properties pane, and after reprocessing the sample cube with which we were working, we performed another inspection, again via the Dimension Designer and Cube Designer browsers, of the members of the affected attribute hierarchy, noting the more intuitive grouping established by the newly enacted Clusters discretization method.
In this article, we will gain some hands-on exposure to going beyond the First Group Member Last Group Member default that Analysis Services uses in creating group labels within the various discretization methods, and enacting the generation of custom labels for our groups. Our examination will include:
Attribute Discretization with Custom Group Names
In this article, we will perform attribute discretization once again, but in this practice session, we will extend hands-on exposure with discretization to include the addition of custom group labels.
For those first joining my subset of articles surrounding discretization, lets do a brief overview of Analysis Services discretization in general. (Those who have been along for the ride can, of course, skip directly to the sections below.) As we learned in Introduction to Attribute Discretization, whenever we work with attributes, we can expect to encounter two general types of values, discrete and contiguous. Discrete values stand apart distinctly, and have clearly defined logical boundaries between themselves. Citing the Gender attribute, within the Customer dimension of the Adventure Works sample UDM, wherein the attribute is considered to have only one of two discrete values, female or male, we noted that possible values are naturally discrete for the lions share of attributes occurring in the business world.
In contrast to discrete values, we noted that contiguous values do not stand apart distinctly, but flow along, as if in a continuous line. Moreover, we discussed the fact that contiguous values, especially within large populations, can have very large numbers of possible values, and that information consumers can find it difficult to work effectively and efficiently within such wide ranges of values. As an example, we cited the Vacation Hours attribute, within the Employee dimension of the Adventure Works sample UDM, an attribute which could have a wide range of possible values, depending upon how many employees are involved, whether there are limits on how many vacation days they can accumulate, and considerations of this nature. (The member values are based directly upon the values contained within the VacationHours column of the DimEmployee table - with many of the values shared among multiple employees.) The sheer number of values might make working with them cumbersome for information consumers, if they are simply made available in their existing state.
As we discussed in Introduction to Attribute Discretization, discretization can help us to make it easier for information consumers to work with large numbers of possible attribute member values. As discretization creates a manageable number of groups of attribute values that are clearly separated by boundaries, we can thereby group contiguous values into sets of discrete values, via a system-generated collection of consecutive dimension members known as member groups.
We also discussed, in Introduction to Attribute Discretization, that once the discretization process groups the attribute members into the member groups, the member groups are then housed within a level within the dimensional hierarchy. (A given level within a dimensional hierarchy can contain either members or member groups, but not both.) When information consumers browse a level that contains member groups, they see the names and cell values of the member groups. The members generated by Analysis Services to support member groups are called grouping members, and they look like ordinary members.
Analysis Services affords us flexibility in methods of attribute discretization, based upon algorithms of varying complexity. The different methods of discretization all have the same function to group contiguous values into sets of discrete values. They simply manage grouping via different approaches. Beyond the Dimension Designer, Analysis Services also supports user-defined discretization, via data definition language (DDL), should the out of the box approaches not meet the business needs of our local environments. Moreover, we can alternatively implement custom discretization via the underlying data warehouse, using views at the relational level; named calculations in the data source view; calculated members in Analysis Services; or via other approaches.
Analysis Services supports four DiscretizationMethod property settings, which include three pre-defined discretization methods:
The DiscretizationMethod property determines whether Analysis Services is to create groupings, and then determines the type of grouping that is performed. As we learned in Introduction to Attribute Discretization, Analysis Services does not perform any groupings by default (the default setting for the DiscretizationMethod property is None). When we enable Automatic grouping, as we shall see in the practice session that follows, we direct Analysis Services to automatically determine the best grouping method based upon the structure of the attribute for which we are performing discretization. We then process the affected dimension / cube and Analysis Services creates group ranges, and then distributes the total population of attribute members appropriately across those groups. As we shall see, once we specify a grouping method, we next specify the number of groups, by using the DiscretizationBucketCount property (its default value is zero).
As we have noted throughout my MSSQL Server Analysis Services column, as well as throughout my other Database Journal series, one of the most important objectives in building a high performance Analysis Services solution is an efficient and effective dimension design. The identification of opportunities where we can effectively use attribute discretization, and the effective design, placement and labeling of the member groups that we generate thereby, can mean the provision of a much more consumer-friendly interface for our clients and employers. We will gain hands - on exposure to the Automatic method of attribute discretization in the practice session that follows, primarily as a means of supporting the focus of our article, the custom naming of groups, versus the simple acceptance of the default names provided by Analysis Services, anytime we discretize attributes. We will first enable Automatic discretization, and then process the cube, which we will examine for default naming of the attribute groups created. We will then focus upon the setup for customized group labeling, reprocess the cube, and then examine the changes in the naming of our attribute groups.
Before we get started working within a sample cube clone, we will need to prepare the local environment for the practice session. We will take steps to accomplish this within the section that follows.
Preparation: Locate and Open the Sample Basic UDM Created Earlier
In Dimensional Model Components: Dimensions Part I, we created a sample basic Analysis Services database within which to perform the steps of the practice sessions we set out to undertake in the various articles of this subseries. Once we had ascertained that the new practice database appeared to be in place, and once we had renamed it to ANSYS065_Basic AS DB, we began our examination of dimension properties. We continued with our examination of attributes within the same practice environment, which we will now access (as we did within the earlier articles of this subseries)) by taking the following steps within the SQL Server Business Intelligence Development Studio.
NOTE: Please access the Analysis Services database which we prepared in Dimensional Model Components: Dimensions Part I (and have used in subsequent articles) before proceeding with this article. If you have not completed the preparation to which I refer, or if you cannot locate / access the Analysis Services database with which we worked in the referenced previous articles, please consider taking the preparation steps provided in Dimensional Model Components: Dimensions Part I before continuing, and prospectively saving the objects with which you work, so as to avoid the need to repeat the preparation process we have already undertaken for subsequent related articles within this subseries.
1. Click Start.
2. Navigate to, and click, the SQL Server Business Intelligence Development Studio, as appropriate.
We briefly see a splash page that lists the components installed on the PC, and then Visual Studio .NET 2005 opens at the Start page.
3. Close the Start page, if desired.
4. Select File -> Open from the main menu.
5. Click Analysis Services Database ... from the cascading menu, as shown in Illustration 1.
The Connect to Database dialog appears.
6. Ensuring that the Connect to existing database radio button atop the dialog is selected, type the Analysis Server name into the Server input box (also near the top of the dialog).
7. Using the selector just beneath, labeled Database, select ANSYS065_Basic AS DB, as depicted in Illustration 2.
8. Leaving other settings on the dialog at default, click OK.
SQL Server Business Intelligence Development Studio briefly reads the database from the Analysis Server, and then we see the Solution Explorer populated with the constituent database objects. Having overviewed attribute discretization in this and the previous article, we will now get some hands-on exposure to the use of Automatic attribute discretization for the members of a representative dimension attribute within our practice UDM. Our objective will be to then examine the default labeling of the newly created attribute groups, before enabling the custom labeling of the same attribute groups through the use of the naming template supplied by Analysis Services, reprocessing the cube, and re-examining the attribute groups for the custom naming we have enacted.