Database Journal
Posted Oct 15, 2009

Cube Storage: Introduction to Partitions

By William Pearson

This article introduces partitions in Analysis Services. We discuss their characteristics, the considerations surrounding their use, and their impact on data storage and cube processing.

Note: For more information about my MSSQL Server Analysis Services column in general, see the section entitled “About the MSSQL Server Analysis Services Series” that follows the conclusion of this article.

Introduction

In Dimensional Model Components: Dimensions Parts I and II, we undertook a general introduction to the dimensional model, noting its wide acceptance as the preferred structure for presenting quantitative and other organizational data to information consumers. As a part of our extended examination of dimensions, we discussed the primary objectives of business intelligence, including its capacity to support:

  • the presentation of relevant and accurate information representing business operations and events;
  • the rapid and accurate return of query results;
  • “slice and dice” query creation and modification;
  • an environment wherein information consumers can pose questions quickly and easily, and obtain results rapidly.

We noted in Cube Storage: Introduction that the second objective above, the capacity of business intelligence to support “the rapid and accurate return of query results”, translates to minimal querying time. We also saw that storage design plays a key role in enhancing query performance across our cubes. As we will learn in this article, partitions play a significant role in the way that Analysis Services manages and stores data and aggregations for a measure group in a cube.

In this article, we will continue the general exploration of cube storage that we began in Cube Storage: Introduction, this time focusing upon partitions. Our introduction to partitions here will lead to more detailed exploration of various concepts surrounding partitions in subsequent articles that examine partition planning, as well as hands-on sessions focused upon various tasks surrounding partitions, including:

  • the creation of local partitions within the Business Intelligence Development Studio;
  • the creation of multiple partitions for a single measure group, based upon views in the underlying relational database;
  • the creation of multiple partitions for a single measure group, based upon named queries in Analysis Services;
  • the creation of remote partitions within the Business Intelligence Development Studio;
  • the creation of partitions within SQL Server Management Studio;
  • filtering partitions;
  • merging partitions.

Introducing Partitions in Analysis Services

Because the data sources underlying our cubes, and even the cubes themselves, can become very large in physical size, storage becomes a significant consideration within cube design strategy. A partition is a physical storage unit, kept as a set of files on a hard disk, that contains a subset of the data included in an Analysis Services database. Analysis Services uses partitions to manage and store data and aggregations for a measure group in a cube.

Partitions make it possible for us to spread data over multiple hard disks, or even multiple servers, should data growth or other factors make doing so necessary or convenient. This spreading of data can include local partitions (which are stored on the server where the cube is defined), remote partitions (which are stored on other servers running Analysis Services), or a combination of the two types. Partitions rely on storage settings to define the format and processing schedule for the database, and they use writeback settings to enable “what-if” analysis.

Local and Remote Partitions

For each measure group we create within a given cube, a single partition, containing all data and metadata within and about the measure group, is created to support that measure group. We begin cube creation with a single partition for each measure group, but we begin to experience performance deterioration as that partition grows larger. We can typically reduce the time required to process and query the cube by dividing a single large partition into smaller partitions. We do this through the addition of explicitly created partitions to the existing partition – partitions across which the existing data can be spread. (When we create a new partition for a measure group, the new partition is added to the set of partitions that already exist for the measure group.)

A given measure group reflects the combined data that is contained in all its partitions. When establishing multiple partitions, therefore, we must ensure that the data for any partition in a measure group excludes the data for any other partition in the measure group, to ensure that data is not “double counted” in the measure group. The original partition, created when we create a given measure group, is based on a single fact table in the data source view of the cube. When multiple partitions support a measure group, each partition can reference a different table in either the data source view or in the underlying relational data source for the cube. While more than one partition in a measure group can reference the same table, each partition must be restricted, through filtering or another method, to different rows in the table to prevent the double counting we have mentioned.
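The non-overlap requirement described above can be checked mechanically before partitions are processed. The following is a minimal sketch in Python, assuming each partition is restricted to a non-empty date range; the `PartitionRange` type, the partition names, and the dates are hypothetical and serve only as illustration.

```python
from datetime import date
from typing import NamedTuple


class PartitionRange(NamedTuple):
    """Hypothetical description of the rows a partition is filtered to."""
    name: str
    start: date  # inclusive
    end: date    # exclusive


def find_overlaps(partitions):
    """Return pairs of partition names whose date ranges share any rows."""
    ordered = sorted(partitions, key=lambda p: p.start)
    overlaps = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            if right.start >= left.end:
                break  # sorted by start, so no later range can overlap left
            overlaps.append((left.name, right.name))
    return overlaps


# Illustrative (made-up) partitions: the last one overlaps January 2009.
partitions = [
    PartitionRange("Sales_2008", date(2008, 1, 1), date(2009, 1, 1)),
    PartitionRange("Sales_2009_Jan", date(2009, 1, 1), date(2009, 2, 1)),
    PartitionRange("Sales_2009_Q1", date(2009, 1, 1), date(2009, 4, 1)),
]
print(find_overlaps(partitions))
```

Any pair reported by a check of this kind would contribute the same fact rows twice to the measure group, which is exactly the double counting we need to avoid.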

NOTE: We explore filtering, and other means of restricting the data that is stored in a partition, in other articles of this subseries.

When we spread the data across multiple drives on the same server, we refer to the resulting partitions as local partitions. When we spread the data over multiple machines, we are establishing remote partitions.

The Benefits of Partitioning Measure Groups

The processing time required for large measure groups can be reduced when we partition those groups, because processing can then be undertaken in parallel across the partitions. (Parallel processing means faster execution, primarily because the processing of one partition does not have to finish before the processing of another can start; more than one processing job can run at the same time, typically utilizing processor capacity more efficiently.) And when we distribute the data over multiple machines with remote partitions, we not only provide more physical room for large volumes of data, but we make it possible for multiple computers to process the data in parallel.
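The idea of running several processing jobs at once can be illustrated outside Analysis Services. The sketch below uses Python's standard thread pool; `process_partition` is a hypothetical stand-in that only returns a message, and no actual Analysis Services command is issued.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(name):
    """Hypothetical stand-in for issuing a processing command for one partition."""
    # A real implementation would send the command to the server here.
    return f"{name}: processed"


def process_all(partition_names, max_workers=4):
    """Process partitions concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_partition, partition_names))


results = process_all(["Sales_2009_01", "Sales_2009_02", "Sales_2009_03"])
print(results)
```

Because no job waits for another to finish, the elapsed time in a setup like this tends toward that of the slowest partition rather than the sum of all of them, subject to the capacity of the server.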

It is easy to see how partitions afford us a powerful and flexible means of managing large cubes. For example, a cube that contains financial information can contain a partition for the data of each past year, together with partitions for each month of the current year. In general, only the current monthly partition would require processing when current information is added to the cube. Because we would be processing a significantly smaller amount of data, processing performance would be enhanced, perhaps dramatically, by the decreased time required. At the end of the year the twelve monthly partitions could be merged into a single partition for the year to which they belong, and a new partition could be created for the first month of the new year. (We gain hands-on exposure to merging partitions in an independent article of this subseries.) Moreover, this new partition creation process could be automated as part of our data warehouse loading and cube processing procedures.
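As a rough illustration of the yearly and monthly scheme just described, the following Python sketch lists the partitions such a cube might carry on a given date, and identifies the one that a routine load would process. The `Finance_` naming convention and the function names are assumptions made for this example only.

```python
from datetime import date


def plan_partitions(first_year, today):
    """List partition names: one per past year, one per elapsed month of the current year."""
    yearly = [f"Finance_{y}" for y in range(first_year, today.year)]
    monthly = [f"Finance_{today.year}_{m:02d}" for m in range(1, today.month + 1)]
    return yearly + monthly


def partition_to_process(today):
    """Only the current month's partition needs processing on a routine load."""
    return f"Finance_{today.year}_{today.month:02d}"


today = date(2009, 3, 15)  # illustrative date
plan = plan_partitions(2007, today)
print(plan)
print(partition_to_process(today))
```

Logic of this kind could be one piece of the automated partition creation mentioned above, run as part of the data warehouse load.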

Although partitions are not visible to business users of a cube, administrators can easily configure, add, or drop partitions. Each partition is physically stored in a separate set of files. The aggregate data of each partition can be stored on the instance of Analysis Services where the partition is defined, on another instance of Analysis Services, or in the data source that is used to supply the partition's source data. As we have noted, partitions allow the source data and aggregate data of a cube to be distributed across multiple hard drives and among multiple server computers. For a cube of moderate to large size, partitions can greatly improve query performance, load performance, and ease of cube maintenance.

The storage mode of each partition can be configured independently of other partitions in the measure group. Partitions can be stored by using various combinations of options for source data location, storage mode, proactive caching, and aggregation design. Options for real-time OLAP and proactive caching allow us to balance query speed against latency when we design a partition. Storage options can also be applied to related dimensions and to facts in a measure group. This flexibility enables us to design cube storage strategies appropriate to the needs of our environments.

NOTE: For more information on storage modes, see Cube Storage: Introduction, the initial article of this subseries of my monthly Introduction to MSSQL Server Analysis Services series here at Database Journal.

The Content and Structure of Partitions

Partitions are physical “containers” housing a subset of the data of a measure group. Partitions are not visible to MDX queries, and are not apparent in cube browsers or reporting applications. Regardless of the number of partitions that are defined for a given measure group, these tools reflect the whole content of the measure group.

A simple partition is composed of:

  • Basic Information – including the partition name, its storage mode, the processing mode, and other information.
  • Slicing Definition – an MDX expression specifying a tuple or a set. The slicing definition is subject to the same restrictions as the StrToSet() MDX function when it is used with the CONSTRAINED flag: it can employ dimension, hierarchy, level and member names, keys, unique names, or other named objects in the cube, but cannot use MDX functions.
  • Aggregation Design – a collection of aggregation definitions that can be shared across multiple partitions. (The default is taken from the parent cube's aggregation design.)

The structure of a given partition must match the structure of the measure group that it supports, which means that the measures that define the measure group must also be defined in the partition, along with all related dimensions. It is for this reason that, when a partition is created, it automatically inherits the same set of measures and related dimensions that are defined for the measure group (whose creation triggers the partition’s creation).

Each partition in a measure group can have a different fact table, and these fact tables can exist within different data sources. When different partitions in a measure group have different fact tables, the tables must be sufficiently similar to maintain the structure of the measure group (which means that the processing query returns the same columns and data types for all fact tables for all partitions). When fact tables for different partitions are from different data sources, the source tables for any related dimensions, and also any intermediate fact tables, must also be present in all data sources and must have the same structure in all the databases. Also, all dimension table columns that are used to define attributes for cube dimensions related to the measure group must be present in all of the data sources. There is no need to define all the joins between the source table of a partition and a related dimension table if the partition source table has the same structure as the source table for the measure group.

Columns that are not used to define measures in the measure group can be present in some fact tables but absent in others. Similarly, columns that are not used to define attributes in related dimension tables can be present in some databases but absent in others. Tables that are not used for either fact tables or related dimension tables can be present in some databases but absent in others.
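The structural rule above (the columns the measure group uses must match in name and data type, while unused columns may differ) can be expressed as a simple comparison. This is a minimal Python sketch; the column names and type names are hypothetical, and in practice the column metadata would have to be read from each data source.

```python
def missing_or_mismatched(required, fact_table_columns):
    """Compare one fact table's columns against the measure group's required columns.

    Both arguments map column name -> data type name. Extra columns in the
    fact table are ignored, since only the columns the measure group uses must match.
    Returns {column: (expected_type, actual_type)}; actual_type is None if absent.
    """
    problems = {}
    for column, expected_type in required.items():
        actual_type = fact_table_columns.get(column)
        if actual_type != expected_type:
            problems[column] = (expected_type, actual_type)
    return problems


# Hypothetical column metadata for illustration.
required = {"OrderDateKey": "int", "SalesAmount": "money"}
fact_2008 = {"OrderDateKey": "int", "SalesAmount": "money", "LegacyFlag": "bit"}
fact_2009 = {"OrderDateKey": "int", "SalesAmount": "float"}

print(missing_or_mismatched(required, fact_2008))  # extra column is tolerated
print(missing_or_mismatched(required, fact_2009))  # data type differs
```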

Data Sources and Partition Storage

A partition is based either on a table or view in a data source, or on a table or named query in a data source view. (We examine the setup of each within independent articles of this subseries.) The location where partition data is stored is defined by the data source binding. Typically, we can partition a measure group horizontally or vertically:

  • In a horizontally partitioned measure group, each partition in a measure group is based on a separate table. This approach to partitioning is appropriate when data is separated into multiple tables. As an illustration, some relational databases have a separate table for each month's data.
  • In a vertically partitioned measure group, a measure group is based on a single table, and each partition is based on a source system query that filters the data for the partition. For example, if a single table contains several months’ data, the measure group could still be partitioned by month by applying a Transact-SQL WHERE clause that returns a separate month's data for each partition.
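To make the single-table case concrete, the sketch below builds one filtered source query per month. It is written in Python and simply assembles query text. The table name `dbo.FactSales` and the column `OrderDate` are hypothetical, and the half-open date range (on or after the first day of the month, before the first day of the next) is one possible way to keep the monthly filters from overlapping.

```python
from datetime import date


def month_bounds(year, month):
    """Return the first day of the month and the first day of the following month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def partition_query(table, date_column, year, month):
    """Build a source query restricted to one month, using a half-open date range."""
    start, end = month_bounds(year, month)
    # Identifiers are interpolated directly; they are assumed to be trusted names.
    return (
        f"SELECT * FROM {table} "
        f"WHERE {date_column} >= '{start:%Y%m%d}' AND {date_column} < '{end:%Y%m%d}'"
    )


# Hypothetical table and column names.
december = partition_query("dbo.FactSales", "OrderDate", 2009, 12)
print(december)
```

Because each month's upper bound is the next month's lower bound, every row falls into exactly one partition, provided that `OrderDate` is never null.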

As we mentioned earlier, each partition has storage settings that determine whether the data and aggregations for the partition are stored in the local instance of Analysis Services or in a remote partition using another instance of Analysis Services. The storage settings can also specify the storage mode and whether proactive caching is used to control latency for a partition.

Precautions with Partitions

Anytime we create and manage multiple-partition measure groups, we have to take precautions to guarantee that cube data is accurate. Although these precautions do not usually apply to single-partition measure groups, they do apply when we incrementally update partitions. It is important to understand that, when we incrementally update a partition, a new temporary partition is created that has a structure identical to that of the source partition – this partition contains the incremental, or “delta,” data. The temporary partition is processed and then merged with the source partition. Therefore, we must ensure that the processing query that populates the temporary partition does not duplicate any data already present in an existing partition. (We explore the concepts surrounding, and get some hands-on exposure to performing, both filtering and partition merging in other articles of this subseries.)
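The incremental update sequence described above can be simulated in a few lines to show why the processing query must exclude rows that are already loaded. The Python sketch below is a simulation only: it assumes, hypothetically, that each fact row carries an ever-increasing `id` that can serve as a high-water mark, and it does not call Analysis Services.

```python
def select_delta(source_rows, last_loaded_id):
    """Return only rows newer than the high-water mark, so nothing is loaded twice."""
    return [row for row in source_rows if row["id"] > last_loaded_id]


def incremental_update(partition_rows, source_rows):
    """Simulate building a temporary delta partition and merging it into the source partition."""
    last_loaded_id = max((row["id"] for row in partition_rows), default=0)
    delta = select_delta(source_rows, last_loaded_id)  # the temporary partition
    return partition_rows + delta                      # the merge step


# Made-up rows: ids 1 and 2 are already in the partition, id 3 is new.
existing = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
source = existing + [{"id": 3, "amount": 5}]

merged = incremental_update(existing, source)
print(sum(row["amount"] for row in merged))
```

Had the delta been populated from the whole source table without a filter, the first two rows would have been merged a second time and the total overstated.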

We will examine many of the properties, and the associated settings, that we use in creating and maintaining partitions in Analysis Services in subsequent articles of this monthly column, where we will gain hands-on exposure to these in a working environment.

Conclusion

In this article, we continued the general exploration of cube storage that we began in Cube Storage: Introduction, this time focusing upon partitions. Our introduction to partitions is intended to serve as a lead-in to more detailed exploration of various concepts surrounding partitions in subsequent, independent articles that examine partition planning, as well as hands-on sessions focused upon various tasks surrounding partitions.

We explored the concepts of local and remote partitions, and then discussed the benefits we can expect to accrue when we partition the measure groups of our cubes. We next focused upon the content and structure of partitions, and then examined considerations surrounding data sources and partition storage. Finally, we touched upon precautions that we need to keep in mind when we create and manage multiple-partition measure groups, especially when we perform incremental updates.

Throughout our introduction to partitions in Analysis Services, we looked forward to subsequent partition-related articles, where we will gain hands-on exposure to various tasks involved in the creation and maintenance of partitions, including:

  • the creation of local partitions within the Business Intelligence Development Studio;
  • the creation of multiple partitions for a single measure group, based upon views in the underlying relational database;
  • the creation of multiple partitions for a single measure group, based upon named queries in Analysis Services;
  • the creation of remote partitions within the Business Intelligence Development Studio;
  • the creation of partitions within SQL Server Management Studio;
  • filtering partitions; and
  • merging partitions.

About the Series ...

This article is a member of the series Introduction to MSSQL Server Analysis Services. The monthly column is designed to provide hands-on application of the fundamentals of MS SQL Server Analysis Services (“Analysis Services”), with each installment progressively presenting features and techniques designed to meet specific real-world needs. For more information on the series, please see my initial article, Creating Our First Cube.

» See All Articles by Columnist William E. Pearson, III

Introduction to MSSQL Server Analysis Services Series
Introduction to Security in Analysis Services
Cube Storage: Planning Partitions from a SQL Server Management Studio Perspective
Cube Storage: Planning Partitions (Business Intelligence Development Studio Perspective)
Cube Storage: Introduction to Partitions
Introduction to Cube Storage
Attribute Discretization: Customize Grouping Names
Attribute Discretization: Using the "Clusters" Method
Attribute Discretization: Using the "Equal Areas" Method
Attribute Discretization: Using the Automatic Method
Introduction to Attribute Discretization
More Exposure to Settings and Properties in Analysis Services Attribute Relationships
Attribute Relationships: Settings and Properties
Introduction to Attribute Relationships in MSSQL Server Analysis Services
Attribute Member Values in Analysis Services
MSSQL Analysis Services - Attribute Member Names
Attribute Member Keys - Pt II: Composite Keys
Attribute Member Keys - Pt 1: Introduction and Simple Keys
Dimension Attributes: Introduction and Overview, Part V
Dimension Attributes: Introduction and Overview, Part IV
Dimension Attributes: Introduction and Overview, Part III
Dimension Attributes: Introduction and Overview, Part II
Dimension Attributes: Introduction and Overview, Part I
Dimensional Model Components: Dimensions Part II
Dimensional Model Components: Dimensions Part I
Manage Unknown Members in Analysis Services 2005, Part II
Manage Unknown Members in Analysis Services 2005, Part I
Alternatively Sorting Attribute Members in Analysis Services 2005
Introduction to Linked Objects in Analysis Services 2005
Distinct Counts in Analysis Services 2005
Positing the Intelligence: Conditional Formatting in the Analysis Services Layer
Administration and Optimization: SQL Server Profiler for Analysis Services Queries
Mastering Enterprise BI: Time Intelligence Pt. II
Mastering Enterprise BI: Time Intelligence Pt. I
Design and Documentation: Introducing the Visio 2007 PivotDiagram
Actions in Analysis Services 2005: The URL Action
Actions in Analysis Services 2005: The Drillthrough Action
Mastering Enterprise BI: Introducing Actions in Analysis Services 2005
Mastering Enterprise BI: Introduction to Translations
Mastering Enterprise BI: Introduction to Perspectives
Introduction to the Analysis Services 2005 Query Log
Mastering Enterprise BI: Working with Measure Groups
Mastering Enterprise BI: Introduction to Key Performance Indicators
Mastering Enterprise BI: Extend the Data Source with Named Calculations, Pt. II
Mastering Enterprise BI: Extend the Data Source with Named Calculations, Pt. I
Process Analysis Services Objects with Integration Services
Usage-Based Optimization in Analysis Services 2005
Introduction to MSSQL Server Analysis Services: Named Sets Revisited
Introduction to MSSQL Server Analysis Services: Migrating an Analysis Services 2000 Database to Analysis Services 2005
Introduction to MSSQL Server Analysis Services: Introducing Data Source Views
Introduction to MSSQL Server Analysis Services: Reporting Options for Analysis Services Cubes: MS Excel 2003 and More ...
Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Create Aging "Buckets" in a Cube
Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Relative Time Periods in an Analysis Services Cube, Part II
Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Relative Time Periods in an Analysis Services Cube
Introduction to MSSQL Server Analysis Services: Process Analysis Services Cubes with DTS
Introduction to MSSQL Server Analysis Services: Presentation Nuances: CrossTab View - Same Dimension
Introduction to MSSQL Server Analysis Services: Point-and-Click Cube Schema Simplification
Introduction to MSSQL Server 2000 Analysis Services: Manage Distinct Count with a Virtual Cube
Introduction to MSSQL Server 2000 Analysis Services: Distinct Count Basics: Two Perspectives
Introduction to MSSQL Server 2000 Analysis Services: Semi-Additive Measures and Periodic Balances
Introduction to MSSQL Server 2000 Analysis Services: Performing Incremental Cube Updates - An Introduction
Introduction to MSSQL Server 2000 Analysis Services: Partitioning a Cube in Analysis Services - An Introduction
Introduction to MSSQL Server 2000 Analysis Services: Basic Storage Design
Introduction to MSSQL Server 2000 Analysis Services: Derived Measures vs. Calculated Measures
Introduction to MSSQL Server 2000 Analysis Services: Creating a Dynamic Default Member
Introduction to MSSQL Server 2000 Analysis Services: Another Approach to Local Cube Design and Creation
Introduction to MSSQL Server 2000 Analysis Services: Introduction to Local Cubes
Introduction to MSSQL Server 2000 Analysis Services: Actions in Virtual Cubes
Introduction to MSSQL Server 2000 Analysis Services: Putting Actions to Work in Regular Cubes
Introduction to MSSQL Server 2000 Analysis Services: Reporting Options for Analysis Services Cubes: ProClarity Part II
Introduction to MSSQL Server 2000 Analysis Services: Reporting Options for Analysis Services Cubes: ProClarity Professional, Part I
Introduction to MSSQL Server 2000 Analysis Services: Using Calculated Cells in Analysis Services , Part II
Introduction to MSSQL Server 2000 Analysis Services: Using Calculated Cells in Analysis Services, Part I
Introduction to MSSQL Server 2000 Analysis Services: MSAS Administration and Optimization: Toward More Sophisticated Analysis
Introduction to MSSQL Server 2000 Analysis Services: MSAS Administration and Optimization: Simple Cube Usage Analysis
Introduction to MSSQL Server 2000 Analysis Services: Build a Web Site Traffic Analysis Cube: Part II
Build a Web Site Traffic Analysis Cube: Part I
Reporting Options for Analysis Services Cubes: Cognos PowerPlay
Reporting Options for Analysis Services Cubes: MS FrontPage 2002
Reporting Options for Analysis Services Cubes: MS Excel 2002
Introduction to MSSQL Server 2000 Analysis Services: Drilling Through to Details: From Two Perspectives
Introduction to MSSQL Server 2000 Analysis Services: Custom Cubes: Financial Reporting - Part II
Introduction to MSSQL Server 2000 Analysis Services Custom Cubes: Financial Reporting (Part I)
Introduction to SQL Server 2000 Analysis Services: Exploring Virtual Cubes
Introduction to SQL Server 2000 Analysis Services: Working with the Cube Editor
Introduction to SQL Server 2000 Analysis Services: Parent-Child Dimensions
Introduction to SQL Server 2000 Analysis Services: Handling Time Dimensions
Introduction to SQL Server 2000 Analysis Services: Working with Dimensions
Introduction to SQL Server 2000 Analysis Services: Creating Our First Cube


