Simplify Schemas for Faster Cube Processing
Overview and Discussion
Cube
processing performance is often a high visibility subject to our clients and
employees. There are many means and considerations in optimizing build times,
but few are as straightforward as the use of Optimize Schema, which
quickly and "automatically" makes modifications that can provide
palpable improvements, in many cases. Manual intervention can become necessary,
in some instances, to achieve the intended objectives of the option for all
dimensions, but the concepts behind the steps are identical.
The
way this all works is, simply, that the Member Key Column property,
aligned with the lowest level of a shared dimension in the cube, is
pointed to a column in the fact table instead of a column in a dimension
table. When this can be accomplished, the need for joins is eliminated in
the SQL query that MSAS executes against the relational data source to populate
the cube. As we might expect, such a circumstance imparts multiple benefits,
including reductions in query complexity, reductions in the amount of data
accessed in the relational data source, and reductions in network traffic
between the Analysis Server and the relational data source.
Several
conditions must exist for schema optimization (via Optimize Schema, or its
manual equivalent) to be effective. For any dimension under consideration, these conditions are as follows:
-
The dimension
must be a shared dimension;
-
The dimension
must have been processed at least once prior to selecting the Optimize
Schema option;
-
The lowest
dimension level must be physically represented in the cube (it can
be hidden, but not disabled);
-
The Member
Key Column property for the lowest dimension level must contain the key
that relates the fact table to the dimension table;
-
The Member
Key Column property for the lowest dimension level must be the only key that
is needed to relate the fact and dimension tables;
-
The Member
Key Column property for the lowest dimension level must be unique.
Multiple considerations can arise in altering the schema of a
cube through the Optimize Schema option. These include scenarios where
our cube design is dependent upon joins between the dimension and fact tables,
as a mechanism for forcing the exclusion of rows of the fact table from the
cube that is to be generated. As is no doubt apparent to most of us, removal
of all joins through Optimize Schema, or any other process will mean a
corresponding removal of the filtering mechanism, and the resulting selection
of all rows in the fact table.
Moreover, it is important to keep in mind that the Optimize
Schema process, in its elimination of existing joins, can alter our
selection of available tables in subsequent attempts to specify drillthrough
options. This, like all the other considerations, needs to be evaluated as
a part of planning to use the Optimize Schema option, or to pursue other
steps to replicate its action, to enhance cube processing performance.
NOTE: When manually optimizing a
cube schema, it is important to remember that no warning or error dialogs will
appear if the incorrect column is selected in the respective Member Key
Column property. If incorrect choices are made, incorrect values may be
returned, or cube processing may fail altogether. If the optimized dimension
is used as a slicer within a cube partitioning strategy, processing failures
are likely, as well.
Within the "manual"
scenario, consideration needs to be given to examining several variables before
attempting to perform schema optimization. Consult the Microsoft
Developer Network (MSDN), the Books Online, and other
reliable documentation before attempting this on any but development systems -
and with a backup in place before beginning in every case.
To confirm our
understanding of the steps we will undertake, together with the impact of these
steps, we will perform "before and after" examinations of the cube of
a hypothetical client, particularly focusing on the settings of the Member Key Column property at the lowest level that
we have discussed. The changes that occur to this property as a result of our Optimize
Schema efforts will make the process more transparent. For purposes of our practice session, we will create
a copy of the targeted cube, as we have in various articles within our Database
Journal series'. We will then process the clone cube to "register"
it with Analysis Services, before beginning our examination of Optimize
Schema within our practice exercise.
Considerations and Comments
For purposes of this
exercise, we will be working with the Warehouse cube, within the FoodMart
2000 MSAS database; these working samples accompany a typical installation
of MSAS. If the samples are not installed in, or have been removed from, your
environment, they can be obtained from the installation CD, from the Analysis
Services section of the Microsoft website, and perhaps elsewhere. If you
prefer not to alter the structure of your sample cubes, as they currently
exist, make copies of the cube we reference in the article before beginning the
practice exercises. For instructions on copying cubes, see the Preparation
section of Introduction
to MSSQL Server 2000 Analysis Services: Semi-Additive Measures and Periodic
Balances.