Introduction to Attribute Relationships in MSSQL Server Analysis Services - Page 2
December 23, 2008
As we have learned, attributes serve as the foundation for our dimensions and cubes. Each dimension contains a set of attributes, the organization of which is based upon attribute relationships. We create attribute relationships as a part of creating a given dimension. For each table that we include within a dimensional structure, an attribute relationship is established to associate the key attribute to other attributes from the table.
Multiple advantages accrue from the knowledgeable use of attribute relationships, including:
One of the most important objectives in building a high performance Analysis Services solution is an efficient and effective dimension design. The identification of attribute relationships, and the effective use of hierarchies (which we examine in other articles of this subseries) represent the two most important considerations underlying the optimization of dimension design.
An Analysis Services dimension typically has, as an underlying source, a relational data warehouse dimension table. And within relational data warehouses, each dimension usually contains a primary key, attributes, and, in some instances, foreign key relationships to other tables. The attributes provide descriptive context to the primary key and, from a relational perspective, each attribute has either a many-to-one relationship with the primary key (this is, by far, most frequently the case) or a one-to-one relationship with the primary key. The attribute relationships existing within the Product dimension of the Adventure Works DW sample database (available to anyone installing Analysis Services) appear as depicted in Illustration 1.
Some attributes also have relationships with other attributes. For example, in the Product dimension we have cited, the Subcategory attribute has a many-to-one relationship with the Category attribute, as shown in Illustration 2.
A grasp of the concept of relationships within Analysis Services is as important to the OLAP developer / administrator as an understanding of the dependencies among fields in relational databases are to their own respective developers and administrators. To allow it to create useful aggregations, to accurately and completely aggregate data, and to efficiently store and retrieve information, the relationships between attributes must make sense to Analysis Services. This is the purpose of attribute relationships: to define and describe the relationship between a given pair of attributes. To optimize our dimension designs, we must first understand how our attributes are related to each other, and then we must take steps to let Analysis Services know, via attribute relationships, what those relationships are.
When we initially create a dimension and add attributes, Analysis Services automatically defines the relationship between the primary key attribute and all other attributes as many - to one. While this default scenario is not inaccurate, on the surface, it is likely to be suboptimal from the perspective of Analysis Services performance, as it requires the server to perform extra work to determine that some combinations do not exist, among other activities. With the default scenario in place, anytime a query to the data source includes an attribute within the dimension, Analysis Services will summarize data from the primary key and then group it by the attribute under consideration. This means that if we choose to present our data by a given attribute at one point in time, individual member keys are grouped on the fly by the same attribute. If we next request summarized data for another attribute, then the entire on-the-fly grouping is performed once again for the attribute requested. Unsurprisingly, inefficiencies can mount in cases where the totals of a given attribute might be easily derived from the totals of another attribute.
An example would be a case where we request a total for a couple of attributes (such as City and State-Province within the Customer dimension of the Adventure Works sample cube) where City rolls up into the State-Province attribute of the same dimension: leaving the attribute relationships at default of many - to one would mean that Analysis Services would summarize data from the primary key (CustomerKey in the underlying DimCustomer table), and then perform an onthefly grouping by the City attribute. To deliver the second total for State-Province, Analysis Services will perform a separate, onthefly grouping for the State-Province attribute. The processing inefficiencies are obvious, since each State-Provinces total can be derived (more quickly) from the totals of the Cities underlying it. (The same scenario applies to the rollup relationship between the State-Province and Country attributes.)
We can see the attribute relationships within the Customer dimension that we have discussed in our geographical attribute examples, appearing as depicted in Illustration 3.
The default design has other inherent inefficiencies. Since Analysis Services cannot identify which attribute combinations naturally exist in the dimension, the server must use the fact data to identify meaningful member combinations, as we intimated earlier. As an illustration, at query time, if an information consumer requests data by Country and State-Province, Analysis Services will be forced to perform additional processing to ascertain, for example, that the combination of Country: France and State-Province: Arizona does not exist.
We can turn to the browser for the Customer dimension within the Adventure Works cube to see the structure underlying the aforementioned example attributes in our Customer geography example, as shown in Illustration 4.
Here we can see that the Analysis Server has used the attribute relationships depicted in Illustration 3 to align the appropriate State-Provinces to their respective Countries, and therefore to make the example we have offered, Country: France and State-Province: Arizona, a physically invalid combination.
Even though, in the example above, Customer Key is no longer directly related to Country and State-Province, it remains indirectly related to these attributes through chained attribute relationships. More specifically, Customer Key is related to Country using the chain of attribute relationships that link Customer Key to Postal Code, Postal Code to City, City to State-Province, and State-Province to Country. Such a chain of attribute relationships is also known as a cascading attribute relationship.
Cascading attribute relationships enable better performance decisions within Analysis Services with regard to several factors, including (as we have intimated earlier) aggregation design, data storage, data retrieval, and MDX calculations. Moreover, attribute relationships also assist us in enforcing dimension security and in associating measure group data with non-primary key granularity attributes. (For example, if we have a measure group that contains Internet and other - Sales data by Customer Key and forecast data by State-Province, the forecast measure group will only know how to roll up data from State-Province to Country if attribute relationships exist only between State-Province and Country.)
As we have noted in Attribute Member Keys Pt I: Introduction and Simple Keys and Attribute Member Keys Pt II: Composite Keys, among other articles of this series, the KeyColumns property consists of a source column or combination of source columns (known as a collection) that uniquely identifies the members for a given attribute. Once we define relationships among our attributes, the importance of the KeyColumns property becomes even more prominent. We must ensure that the KeyColumns property of each attribute in the dimension uniquely identifies the respective attribute member. If the KeyColumns property does not uniquely identify each member, duplicates encountered during processing (which are ignored by default), result in incorrect data rollups.
It is important to be aware, also, that if the attribute relationship has a default type of Flexible, Analysis Services will not notify us when it encounters duplicates, and will assign all of the duplicate members incorrectly. (We will examine the relationship types further within the hands-on practice section that appears in our next article.)
As we define new attribute relationships, one important consideration we should keep in mind, from the tandem perspectives of performance and data correctness, is the removal of any redundant relationships. As an example, when the relationships shown in Illustration 3 were established, the Customer Key no longer required the (default) direct relationship to City or State-Province. These attribute relationships would have best been removed at this point.
NOTE: To help us to identify redundant attribute relationships, the Business Intelligence Development Studio provides a warning message, underneath the three panes of the Dimension Structure tab of the dimension, to alert us to any existing redundancies. The Studio, however, does not require us to eliminate the redundancy. It is a best practice to always manually remove each redundant relationship. (Once we have removed the redundancy, the warning disappears.)