Attribute Relationships
As we
have learned, attributes serve as the foundation for our dimensions
and cubes. Each dimension contains a set of attributes,
the organization of which is based upon attribute relationships.
We create attribute relationships as a part of creating a given dimension.
For each table that we include within a dimensional structure, an attribute
relationship is established to associate the key attribute to other attributes
from the table.
Multiple
advantages accrue from the knowledgeable use of attribute relationships,
including:
-
Enhancements
in dimension, partition, and query processing, based upon
a reduction in the total amount of memory required for dimension processing.
-
Enhanced query
performance due to more rapid storage access, as well as more
optimized execution plans within Analysis Services.
-
Enhanced
performance because of the selection of more effective aggregates via
the Analysis Server aggregation design algorithms (assuming that user-defined
hierarchies have been defined along relationship paths).
One of
the most important objectives in building a high performance Analysis
Services solution is an efficient and effective dimension design.
The identification of attribute relationships, and the effective use of hierarchies
(which we examine in other articles of this subseries) represent the two most
important considerations underlying the optimization of dimension
design.
Overview
An Analysis Services dimension typically
has, as an underlying source, a relational data warehouse dimension table. And
within relational data warehouses, each dimension usually contains a primary
key, attributes, and, in some instances, foreign key
relationships to other tables. The attributes provide descriptive
context to the primary key and, from a relational perspective, each attribute
has either a many-to-one relationship with the primary key (this
is, by far, most frequently the case) or a one-to-one relationship with the
primary key. The attribute relationships existing within the
Product dimension of the Adventure Works DW sample database
(available to anyone installing Analysis Services) appear as depicted in
Illustration 1.
Illustration 1:
Attribute Relationships: Product Dimension Primary Key
Some attributes also have relationships
with other attributes. For example, in the Product dimension
we have cited, the Subcategory attribute has a many-to-one relationship
with the Category attribute, as shown in Illustration 2.
Illustration 2:
Attribute Relationships for Subcategory in Adventure Works DW
A grasp of the concept of relationships
within Analysis Services is as important to the OLAP developer /
administrator as an understanding of the dependencies among fields in
relational databases are to their own respective developers and
administrators. To allow it to create useful aggregations, to
accurately and completely aggregate data, and to efficiently store and retrieve
information, the relationships between attributes must make sense to Analysis
Services. This is the purpose of attribute relationships: to
define and describe the relationship between a given pair of attributes.
To optimize our dimension designs, we must first understand how our attributes
are related to each other, and then we must take steps to let Analysis
Services know, via attribute relationships, what those relationships
are.
When we initially create a dimension
and add attributes, Analysis Services automatically defines
the relationship between the primary key attribute and all other attributes
as many - to one. While this default scenario is not inaccurate, on
the surface, it is likely to be suboptimal from the perspective of Analysis
Services performance, as it requires the server to perform extra work to
determine that some combinations do not exist, among other activities. With
the default scenario in place, anytime a query to the data source includes an attribute
within the dimension, Analysis Services will summarize data from
the primary key and then group it by the attribute under
consideration. This means that if we choose to present our data by a given attribute
at one point in time, individual member keys are grouped on the fly by the
same attribute. If we next request summarized data for another attribute,
then the entire on-the-fly grouping is performed once again for the attribute
requested. Unsurprisingly, inefficiencies can mount in cases where the totals
of a given attribute might be easily derived from the totals of another attribute.
An example would be a case where we
request a total for a couple of attributes (such as City and State-Province
within the Customer dimension of the Adventure Works sample cube)
where City rolls up into the State-Province attribute of the
same dimension: leaving the attribute relationships at default of many
- to one would mean that Analysis Services would summarize data from
the primary key (CustomerKey in the underlying DimCustomer
table), and then perform an onthefly grouping by the City
attribute. To deliver the second total for State-Province, Analysis
Services will perform a separate, onthefly grouping for the State-Province
attribute. The processing inefficiencies are obvious, since each State-Provinces total can be derived (more quickly) from the totals of
the Cities underlying it. (The same scenario applies to the rollup
relationship between the State-Province and Country attributes.)
We can see the attribute
relationships within the Customer dimension that we have
discussed in our geographical attribute examples, appearing as depicted in Illustration
3.
Illustration 3: Some Attribute
Relationships for Geographical Attributes within the Customer Dimension in
Adventure Works DW
The default design has other
inherent inefficiencies. Since Analysis Services cannot identify which attribute
combinations naturally exist in the dimension, the server must use the fact
data to identify meaningful member combinations, as we intimated earlier.
As an illustration, at query time, if an information consumer requests data by Country
and State-Province, Analysis Services will be forced to
perform additional processing to ascertain, for example, that the combination
of Country: France and State-Province: Arizona does not exist.
We can turn to the browser for the
Customer dimension within the Adventure Works cube to see the
structure underlying the aforementioned example attributes in our Customer
geography example, as shown in Illustration 4.
Illustration 4: Using
the Browser for the Customer Dimension to See the Results of the Underlying
Attribute Relationships
Here we can see that the Analysis
Server has used the attribute relationships depicted in Illustration
3 to align the appropriate State-Provinces to their
respective Countries, and therefore to make the example we have offered,
Country: France and State-Province: Arizona, a physically invalid
combination.
Even though, in the example above,
Customer Key is no longer directly related to Country and State-Province,
it remains indirectly related to these attributes through chained
attribute relationships. More specifically, Customer Key is
related to Country using the chain of attribute relationships that
link Customer Key to Postal Code, Postal Code to City,
City to State-Province, and State-Province to Country.
Such a chain of attribute relationships is also known as a cascading
attribute relationship.
Cascading attribute
relationships enable better
performance decisions within Analysis Services with regard to several
factors, including (as we have intimated earlier) aggregation design, data
storage, data retrieval, and MDX calculations. Moreover,
attribute relationships also assist us in enforcing dimension
security and in associating measure group data with non-primary key
granularity attributes. (For example, if we have a measure group
that contains Internet and other - Sales data by Customer
Key and forecast data by State-Province, the forecast measure
group will only know how to roll up data from State-Province to Country
if attribute relationships exist only between State-Province and Country.)
As we have noted in Attribute Member Keys Pt I: Introduction and
Simple Keys and Attribute Member Keys Pt II: Composite Keys,
among other articles of this series, the KeyColumns
property consists of a source column or combination of source columns
(known as a collection) that uniquely identifies the members for a given
attribute. Once we define relationships among our
attributes, the importance of the KeyColumns property becomes even more
prominent. We must ensure that the KeyColumns property of each attribute
in the dimension uniquely identifies the respective attribute
member. If the KeyColumns property does not uniquely
identify each member, duplicates encountered during processing (which
are ignored by default), result in incorrect data rollups.
It is important to be
aware, also, that if the attribute relationship has a default type
of Flexible, Analysis Services will not notify us when it
encounters duplicates, and will assign all of the duplicate members
incorrectly. (We will examine the relationship types further within the
hands-on practice section that appears in our next article.)
As we define new attribute relationships,
one important consideration we should keep in mind, from the tandem
perspectives of performance and data correctness, is the removal of any redundant
relationships. As an example, when the relationships shown in Illustration
3 were established, the Customer Key no longer required the
(default) direct relationship to City or State-Province. These attribute
relationships would have best been removed at this point.
NOTE: To help us to identify redundant attribute
relationships, the Business Intelligence Development Studio provides
a warning message, underneath the three panes of the Dimension Structure
tab of the dimension, to alert us to any existing redundancies. The Studio,
however, does not require us to eliminate the redundancy. It is a best
practice to always manually remove each redundant relationship. (Once we
have removed the redundancy, the warning disappears.)