This article introduces partition
planning for Analysis Services. Here, we will follow upon our
introduction to partitions in Analysis Services, and discuss the importance of planning
in partition design.
Note: For more information about my MSSQL Server Analysis
Services
column in general, see the section entitled “About the MSSQL Server Analysis
Services Series” that follows the conclusion of this article.
Introduction
In Dimensional Model Components: Dimensions Parts I and II, we undertook a general
introduction to the dimensional model, noting its wide acceptance as the
preferred structure for presenting quantitative and other organizational data
to information consumers. As a part of our extended examination of dimensions,
we discussed the primary objectives of business intelligence, including its
capacity to support:
-
the presentation
of relevant and accurate information representing business operations and
events; -
the rapid and
accurate return of query results; -
“slice and
dice” query creation and modification; -
an environment
wherein information consumers can pose questions quickly and easily, and achieve
rapid results datasets.
We noted
in Cube
Storage: Introduction that the
second objective above, the capacity of business intelligence to support “the
rapid and accurate return of query results”, translates to minimal querying
time. We discussed that storage design plays a key role in enhancing query performance
across our cubes, and learned, in Cube
Storage: Introduction to Partitions, that partitions play a
significant role in the way that Analysis Services manages and stores data and
aggregations for a measure group in a cube.
In this
article, we will continue the general exploration of cube storage that we began
in Cube Storage: Introduction, this
time focusing upon partitions, which we initially introduced in Cube Storage: Introduction to Partitions.
This article is one of several others in this subseries that explore various
concepts surrounding partitions in Analysis Services. Here we will discuss partition
planning considerations and how they are important to partition design within
our business intelligence solutions, touching upon points that include the
following:
-
A discussion
of partition planning in general in Analysis Services; -
A discussion
of each of several individual resource assignment and settings considerations
in planning Partition design within Analysis Services; -
A hands-on
examination, from within the Cube Designer of the Business Intelligence
Development Studio, where applicable, of a respective example resource assignment
/ setting from the Adventure Works sample cube, for each of the resource assignment
and settings considerations we discuss.
Planning Partitions in Analysis Services
As we
noted in Cube Storage: Introduction to
Partitions,
as well as within other articles of our series, a partition is a physical file
on a hard disk that contains a subset of the data included in an Analysis
Services database. Analysis Services uses partitions to manage and store data
and aggregations for a measure group in a cube. A measure group can have one
or more partitions, and, when large data volumes are concerned, partitions
afford us a means of scaling the Unified Dimension Model (UDM) we have designed
to support our Analysis Services database.
When
designing partitions, it is important to keep in mind that the whole objective
is to support good query and processing performance for large Analysis Services
measure groups. Partitioning can be vital when we anticipate large measure
groups in our design environment, because the boosted query performance they
support can make the difference in a mediocre solution and a solution that
makes robust and satisfying analysis and reporting available to the information
consumers of our clients and employers. The boost in performance is largely
possible because the Analysis Server is capable of selecting and querying among
the partitions that are available: that is, Analysis Services can select a
single partition to meet the demands of a given query, and return results more
rapidly because a (sometimes dramatically) smaller dataset is targeted and
physically accessed.
We
mentioned other ways partitioning can help us to manage large Analysis Services
databases in other articles of this subseries, as well. Among these, we noted
that small volumes of fact data (say that of a day or a week) can be added to a
small portion of a cube (a partition) faster than we can incrementally add the
same, small amount of data into a single, much larger partition that might
contain years of history. Planning for the presence of a relatively small partition
to contain current data can give us more options for supporting the delivery of
real-time data, in many cases.
Recall
that the primary reason for partitioning in the first place is to minimize the
amount of time required to process a cube; partitioning makes this possible, as
individual partitions can be processed independently of each other. Because we
can process partitions independently, we can process our current data, which is
subject to ongoing change, more frequently (say, to add small time frames
incrementally), while we are afforded the option to process seldom changing
historical data (say data belonging to years before the prior year) less
frequently. Moreover, parallel processing of individual partitions can mean added
efficiencies, for obvious reasons.
We will get some hands-on exposure to various partition settings
in the practice session below. Before we get started discussing various partition planning criteria,
and then taking a brief look at some sample resource assignments / settings
involved with each, we will need to prepare the local environment for the
practice session. We will take steps to accomplish this within the section
that follows.