Cube Storage: Planning Partitions (Business Intelligence Development Studio Perspective)
November 18, 2009
This article introduces partition planning for Analysis Services. Here, we will follow upon our introduction to partitions in Analysis Services, and discuss the importance of planning in partition design.
In Dimensional Model Components: Dimensions Parts I and II, we undertook a general introduction to the dimensional model, noting its wide acceptance as the preferred structure for presenting quantitative and other organizational data to information consumers. As a part of our extended examination of dimensions, we discussed the primary objectives of business intelligence, including its capacity to support:
We noted in Cube Storage: Introduction that the second objective above, the capacity of business intelligence to support the rapid and accurate return of query results, translates to minimal querying time. We discussed that storage design plays a key role in enhancing query performance across our cubes, and learned, in Cube Storage: Introduction to Partitions, that partitions play a significant role in the way that Analysis Services manages and stores data and aggregations for a measure group in a cube.
In this article, we will continue the general exploration of cube storage that we began in Cube Storage: Introduction, this time focusing upon partitions, which we initially introduced in Cube Storage: Introduction to Partitions. This article is one of several others in this subseries that explore various concepts surrounding partitions in Analysis Services. Here we will discuss partition planning considerations and how they are important to partition design within our business intelligence solutions, touching upon points that include the following:
Planning Partitions in Analysis Services
As we noted in Cube Storage: Introduction to Partitions, as well as within other articles of our series, a partition is a physical file on a hard disk that contains a subset of the data included in an Analysis Services database. Analysis Services uses partitions to manage and store data and aggregations for a measure group in a cube. A measure group can have one or more partitions, and, when large data volumes are concerned, partitions afford us a means of scaling the Unified Dimension Model (UDM) we have designed to support our Analysis Services database.
When designing partitions, it is important to keep in mind that the whole objective is to support good query and processing performance for large Analysis Services measure groups. Partitioning can be vital when we anticipate large measure groups in our design environment, because the boosted query performance they support can make the difference in a mediocre solution and a solution that makes robust and satisfying analysis and reporting available to the information consumers of our clients and employers. The boost in performance is largely possible because the Analysis Server is capable of selecting and querying among the partitions that are available: that is, Analysis Services can select a single partition to meet the demands of a given query, and return results more rapidly because a (sometimes dramatically) smaller dataset is targeted and physically accessed.
We mentioned other ways partitioning can help us to manage large Analysis Services databases in other articles of this subseries, as well. Among these, we noted that small volumes of fact data (say that of a day or a week) can be added to a small portion of a cube (a partition) faster than we can incrementally add the same, small amount of data into a single, much larger partition that might contain years of history. Planning for the presence of a relatively small partition to contain current data can give us more options for supporting the delivery of real-time data, in many cases.
Recall that the primary reason for partitioning in the first place is to minimize the amount of time required to process a cube; partitioning makes this possible, as individual partitions can be processed independently of each other. Because we can process partitions independently, we can process our current data, which is subject to ongoing change, more frequently (say, to add small time frames incrementally), while we are afforded the option to process seldom changing historical data (say data belonging to years before the prior year) less frequently. Moreover, parallel processing of individual partitions can mean added efficiencies, for obvious reasons.
We will get some hands-on exposure to various partition settings in the practice session below. Before we get started discussing various partition planning criteria, and then taking a brief look at some sample resource assignments / settings involved with each, we will need to prepare the local environment for the practice session. We will take steps to accomplish this within the section that follows.