Introduction to Cube Storage

September 17, 2009

This article introduces storage in Analysis Services. Here, we will discuss various cube design concepts and considerations, including partitions and aggregations, focusing upon their impact on data storage and, to some extent, cube processing.

Note: For more information about my MSSQL Server Analysis Services column in general, see the section entitled “About the MSSQL Server Analysis Services Series” that follows the conclusion of this article.

Introduction

In Dimensional Model Components: Dimensions Parts I and II, we undertook a general introduction to the dimensional model, noting its wide acceptance as the preferred structure for presenting quantitative and other organizational data to information consumers. As a part of our extended examination of dimensions, we discussed the primary objectives of business intelligence, including its capacity to support:

  • the presentation of relevant and accurate information representing business operations and events;
  • the rapid and accurate return of query results;
  • “slice and dice” query creation and modification;
  • an environment wherein information consumers can pose questions quickly and easily, and achieve rapid results datasets.

The second objective above, the capacity of business intelligence to support “the rapid and accurate return of query results”, translates to minimal querying time, of course. Storage design plays a key role in enhancing query performance across our cubes, as we shall see in the sections that follow.

In this article, we will introduce cube storage in general, and kick off an extended examination of the topic in a subsequent, multiple-article subseries. Within these articles, we will learn about Analysis Services storage from several perspectives, including:

  • An introduction to Storage Mode concepts;
  • An introduction to Partitions;
  • Partition Planning;
  • Hands-on sessions focused upon:
    • Creating Local Partitions within the Business Intelligence Development Studio;
    • Creating Remote Partitions within the Business Intelligence Development Studio;
    • Creating Partitions within SQL Server Management Studio;
    • Filtering Partitions;
    • Merging Partitions.
  • An introduction to Aggregations;
  • A practical examination of Aggregation Design and Application.

Introducing Cube Storage in Analysis Services

A classic quandary that faces any OLAP designer lies between two often-conflicting needs: the need to minimize cube processing time, and the need to minimize the time required to process queries against the cube. Moreover, and at the heart of reaching an optimal overall state, is the minimization of storage space, which contributes to acceptable cube processing times, and to the support of rapid queries. As we will learn in this article, as well as the other articles of this subseries, Analysis Services supports our creation of an optimal storage design by providing storage mode options among which we can pick to meet the needs of our environments. Analysis Services also provides us with a means of creating aggregations within our cubes to support the rapid return of data at the levels and juxtapositions that our businesses require. Finally, Analysis Services allows us, as administrators, to modify the storage design of our cubes after the cube has been deployed and is in production.

Storage Modes in Analysis Services

In Analysis Services, the term “storage” may be used to narrowly refer to cube metadata only, or it may include all of the source data from the fact table, as well as the aggregations defined by dimensions related to the measure group. The amount of data stored depends, therefore, upon the storage mode we select and the number of aggregations we define. Because the amount of data stored directly affects query performance, Analysis Services exposes / employs several techniques for minimizing the space required for storage of cube data and aggregations, including:

  • Storage options support our selection of the storage modes and locations that are most appropriate for our cube data.
  • A sophisticated algorithm supports the design of efficient summary aggregations to minimize storage without sacrificing speed.
  • Storage is not allocated for empty cells.

As we shall see in more detail, later in this subseries, storage is defined on a partition-by-partition basis, and at least one partition exists for each measure group in a cube.

Analysis Services provides a flexible and reliable architecture within which storage is an important consideration. As developers and administrators, we are afforded choices in the selection of storage mode – choices that allow us to make the most of the trade-off between minimal cube storage space (and, thus, minimal cube processing time, as we mentioned earlier) and minimal processing time for the queries we execute against the cube (improved, in many cases, when we increase the number of cube aggregations, and therefore the size of the cube). Since even before the advent of Analysis Services 2005 (metadata information began to be stored as XML in Analysis Services 2005), we have been afforded choices (which, in the general sense, remain the same) in how we store data and its aggregations. Simply stated, we can choose between Analysis Services (where the most efficient data retrieval and calculation capability is a given), the relational database, or even “somewhere in between” as the location of data / data aggregation storage. Our choice of storage mode affects the query and processing performance, storage requirements, and storage locations of each partition, as well as its parent measure group and cube. The choice of storage mode also affects processing choices, as we shall see.

Depending upon these choices, our storage options – for measure groups and dimensions - are, respectively, Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP) or Hybrid OLAP (HOLAP). (Analysis Services also supports proactive caching, which enables us to combine the characteristics of ROLAP and MOLAP storage for both immediacy of data and query performance. We discuss proactive caching in other articles of this series.) As we shall see, when we are choosing the storage mode for a given cube that we are designing or maintaining, we are, in essence, dictating the placement of the granular data and the data aggregations involved with that cube.

Let’s take a look at each of the general storage modes in detail.

MOLAP

In the MOLAP storage mode, both granular data (to which we refer, at times, as the “details” or “detail data”) and aggregated data (also known as “balances” or “summaries”) are stored within the Analysis Services instance, in a proprietary format that is designed for rapid data retrieval and the efficient performance of – sometimes complex - calculations). The MOLAP storage mode is the default in Analysis Services, and offers the best query performance of the three general storage types. This enhanced performance is due primarily to the highly efficient architectural design of the indexed multidimensional data store (which is located on the Analysis Server) that underlies it. Additionally, query response times can be decreased substantially by using aggregations within a MOLAP data store in Analysis Services, as we have mentioned. Moreover, because the formatted data is housed within the Analysis Server, it does not need, in a query retrieval action, to be moved to the server from another physical location, as would be the case when the data has to be retrieved, as an example, across a network, from a relational data store.

Some disadvantages accompany the choice of the MOLAP storage mode. These include an increase in the demand for overall system storage, as the detail data, wherever it is housed, is duplicated within the cube. Moreover, the additional detail data increases cube processing time (although it means decreased query processing time, as access to the relational data source is not required, once cube processing has been accomplished). Finally, because the data within a cube is isolated from the underlying relational data, changes in the relational data are not reflected within the MOLAP data store until the cube is reprocessed, which can obviously mean that the two data stores can become out of sync. The time between cube processing cycles creates a “latency” period during which data in OLAP objects may not match the source data, so that the data in the cube is only as current as its most recent processing event.

The latency consideration is mitigated, somewhat, by the ease with which Analysis Services allows us to incrementally or fully update objects in MOLAP storage without taking the involved partition or cube offline. Although there are situations that may require us to take a cube offline to process certain structural changes to OLAP objects, we can minimize the downtime required to update MOLAP storage by implementing a “promotion” process, whereby we update and process cubes on a staging server, and then use database synchronization to copy the processed objects to the production server. We can also use proactive caching to minimize latency and maximize availability while retaining much of the performance advantage of MOLAP storage. (For more information about Proactive Caching, see other articles within my Introduction to MSSQL Server Analysis Services series.)

ROLAP

In the ROLAP storage mode, the key word is “relational:” the detail data, along with aggregated / summary data (stored within indexed views designed for that purpose), is housed within a relational database that is specified within Analysis Services. Unlike the MOLAP storage mode, ROLAP does not cause a duplicate of the source data to be stored in the Analysis Services data folders. Instead, when results cannot be derived from the query cache, the indexed views in the data source are accessed to answer queries.

Queries to retrieve data against Analysis Services are translated into one or more queries that are then redirected against the underlying relational database, to return the specified data that resides there. This is why the ROLAP storage mode offers the worst performance with regard to query processing. Cube processing is also typically slower with ROLAP.

Among the few post-Analysis Services 2000 scenarios where the ROLAP storage option might make sense are situations where we have one or more very large dimensions whose millions of members experience perpetual changes. One of the few advantages offered by ROLAP – an advantage that is abated to a large extent in this era of inexpensive storage - is that it allows us an ability to handle cubes whose size is limited only by the underlying relational database. Moreover, ROLAP enables users to view data in real time and can save storage space when in instances where we are working with large datasets that are infrequently queried, such as purely historical data and so forth. Because of its decreasing utility, it would seem that the ROLAP option is destined for relative obscurity in coming years.

HOLAP

In the HOLAP storage mode, attributes of the MOLAP and ROLAP modes are combined. Like MOLAP, HOLAP causes data aggregations to be stored in a multidimensional structure within an Analysis Services instance, but HOLAP does not cause a copy of the source data to be stored. For this reason, when queries access only summary data in aggregations, HOLAP is the equivalent of MOLAP.

In the HOLAP storage mode, detail data remains in the relational data source. Queries that access detail data - for example, if we want to drill down to an atomic detail underlying an aggregation in a cube – we must retrieve data from the relational database. Query processing will not be as fast as it would be if the detail data were stored in the MOLAP structure. With the HOLAP storage mode in place, information consumers will typically experience substantial differences in query retrieval times, depending upon whether the query can be resolved from cache or aggregations, versus from the relational source data itself.

Partitions stored as HOLAP are smaller than the equivalent MOLAP partitions because they do not contain source data. HOLAP partitions therefore respond faster than ROLAP partitions for queries requesting summary data. HOLAP storage mode is generally suited for partitions in cubes that require rapid query response for summaries based on a large amount of source data. However, where users generate queries that must touch leaf level data, such as for calculating median values, MOLAP is generally a better choice.

We will examine the properties, and the associated settings, that we use in designing and maintaining storage in Analysis Services in subsequent articles of this column, where we will gain hands-on exposure to these in a working environment.

Conclusion

In this article, we introduced cube storage in general, and kicked off a multiple-article subseries wherein we will perform an extended examination of the topic. We discussed the classic quandary that faces any OLAP designer: the need to minimize cube processing time, and the need to minimize the time required to process queries against the cube. We then touched upon the consideration at the heart of reaching an optimal overall state, the minimization of storage space, which results in acceptable cube processing times and the support of rapid queries.

We noted that Analysis Services supports our creation of an optimal storage design by providing storage mode options among which we can pick to meet the needs of our environments, and that Analysis Services allows us, as administrators, to modify the storage design of our cubes after the cube has been deployed, and is in production. We then introduced the Analysis Services storage modes in general - Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP) and Hybrid OLAP (HOLAP) – and then explored each individually, citing general characteristics, as well as advantages and disadvantages associated with its use. Finally, we looked forward to subsequent storage–related articles, where we will learn about Analysis Services storage from several perspectives, including:

  • An introduction to Partitions;
  • Partition Planning;
  • Hands-on sessions focused upon:
    • Creating Local Partitions within the Business Intelligence Development Studio;
    • Creating Remote Partitions within the Business Intelligence Development Studio;
    • Creating Partitions within SQL Server Management Studio;
    • Filtering Partitions;
    • Merging Partitions.
  • An introduction to Aggregations;
  • A practical examination of Aggregation Design and Application.

About the Series ...

This article is a member of the series Introduction to MSSQL Server Analysis Services. The monthly column is designed to provide hands-on application of the fundamentals of MS SQL Server Analysis Services (“Analysis Services”), with each installment progressively presenting features and techniques designed to meet specific real-world needs. For more information on the series, please see my initial article, Creating Our First Cube.

» See All Articles by Columnist William E. Pearson, III

Introduction to MSSQL Server Analysis Services Series
Introduction to Security in Analysis Services
Cube Storage: Planning Partitions from a SQL Server Management Studio Perspective
Cube Storage: Planning Partitions (Business Intelligence Development Studio Perspective)
Cube Storage: Introduction to Partitions
Introduction to Cube Storage
Attribute Discretization: Customize Grouping Names
Attribute Discretization: Using the "Clusters" Method
Attribute Discretization: Using the "Equal Areas" Method
Attribute Discretization: Using the Automatic Method
Introduction to Attribute Discretization
More Exposure to Settings and Properties in Analysis Services Attribute Relationships
Attribute Relationships: Settings and Properties
Introduction to Attribute Relationships in MSSQL Server Analysis Services
Attribute Member Values in Analysis Services
MSSQL Analysis Services - Attribute Member Names
Attribute Member Keys - Pt II: Composite Keys
Attribute Member Keys - Pt 1: Introduction and Simple Keys
Dimension Attributes: Introduction and Overview, Part V
Dimension Attributes: Introduction and Overview, Part IV
Dimension Attributes: Introduction and Overview, Part III
Dimension Attributes: Introduction and Overview, Part II
Dimension Attributes: Introduction and Overview, Part I
Dimensional Model Components: Dimensions Part II
Dimensional Model Components: Dimensions Part I
Manage Unknown Members in Analysis Services 2005, Part II
Manage Unknown Members in Analysis Services 2005, Part I
Alternatively Sorting Attribute Members in Analysis Services 2005
Introduction to Linked Objects in Analysis Services 2005
Distinct Counts in Analysis Services 2005
Positing the Intelligence: Conditional Formatting in the Analysis Services Layer
Administration and Optimization: SQL Server Profiler for Analysis Services Queries
Mastering Enterprise BI: Time Intelligence Pt. II
Mastering Enterprise BI: Time Intelligence Pt. I
Design and Documentation: Introducing the Visio 2007 PivotDiagram
Actions in Analysis Services 2005: The URL Action
Actions in Analysis Services 2005: The Drillthrough Action
Mastering Enterprise BI: Introducing Actions in Analysis Services 2005
Mastering Enterprise BI: Introduction to Translations
Mastering Enterprise BI: Introduction to Perspectives
Introduction to the Analysis Services 2005 Query Log
Mastering Enterprise BI: Working with Measure Groups
Mastering Enterprise BI: Introduction to Key Performance Indicators
Mastering Enterprise BI: Extend the Data Source with Named Calculations, Pt. II
Mastering Enterprise BI: Extend the Data Source with Named Calculations, Pt. I
Process Analysis Services Objects with Integration Services
Usage-Based Optimization in Analysis Services 2005
Introduction to MSSQL Server Analysis Services: Named Sets Revisited
Introduction to MSSQL Server Analysis Services: Migrating an Analysis Services 2000 Database to Analysis Services 2005
Introduction to MSSQL Server Analysis Services: Introducing Data Source Views
Introduction to MSSQL Server Analysis Services: Reporting Options for Analysis Services Cubes: MS Excel 2003 and More ...
Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Create Aging "Buckets" in a Cube
Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Relative Time Periods in an Analysis Services Cube, Part II
Introduction to MSSQL Server Analysis Services: Mastering Enterprise BI: Relative Time Periods in an Analysis Services Cube
Introduction to MSSQL Server Analysis Services: Process Analysis Services Cubes with DTS
Introduction to MSSQL Server Analysis Services: Presentation Nuances: CrossTab View - Same Dimension
Introduction to MSSQL Server Analysis Services: Point-and-Click Cube Schema Simplification
Introduction to MSSQL Server 2000 Analysis Services: Manage Distinct Count with a Virtual Cube
Introduction to MSSQL Server 2000 Analysis Services: Distinct Count Basics: Two Perspectives
Introduction to MSSQL Server 2000 Analysis Services: Semi-Additive Measures and Periodic Balances
Introduction to MSSQL Server 2000 Analysis Services: Performing Incremental Cube Updates - An Introduction
Introduction to MSSQL Server 2000 Analysis Services: Partitioning a Cube in Analysis Services - An Introduction
Introduction to MSSQL Server 2000 Analysis Services: Basic Storage Design
Introduction to MSSQL Server 2000 Analysis Services: Derived Measures vs. Calculated Measures
Introduction to MSSQL Server 2000 Analysis Services: Creating a Dynamic Default Member
Introduction to MSSQL Server 2000 Analysis Services: Another Approach to Local Cube Design and Creation
Introduction to MSSQL Server 2000 Analysis Services: Introduction to Local Cubes
Introduction to MSSQL Server 2000 Analysis Services: Actions in Virtual Cubes
Introduction to MSSQL Server 2000 Analysis Services: Putting Actions to Work in Regular Cubes
Introduction to MSSQL Server 2000 Analysis Services: Reporting Options for Analysis Services Cubes: ProClarity Part II
Introduction to MSSQL Server 2000 Analysis Services: Reporting Options for Analysis Services Cubes: ProClarity Professional, Part I
Introduction to MSSQL Server 2000 Analysis Services: Using Calculated Cells in Analysis Services , Part II
Introduction to MSSQL Server 2000 Analysis Services: Using Calculated Cells in Analysis Services, Part I
Introduction to MSSQL Server 2000 Analysis Services: MSAS Administration and Optimization: Toward More Sophisticated Analysis
Introduction to MSSQL Server 2000 Analysis Services: MSAS Administration and Optimization: Simple Cube Usage Analysis
Introduction to MSSQL Server 2000 Analysis Services: Build a Web Site Traffic Analysis Cube: Part II
Build a Web Site Traffic Analysis Cube: Part I
Reporting Options for Analysis Services Cubes: Cognos PowerPlay
Reporting Options for Analysis Services Cubes: MS FrontPage 2002
Reporting Options for Analysis Services Cubes: MS Excel 2002
Introduction to MSSQL Server 2000 Analysis Services: Drilling Through to Details: From Two Perspectives
Introduction to MSSQL Server 2000 Analysis Services: Custom Cubes: Financial Reporting - Part II
Introduction to MSSQL Server 2000 Analysis Services Custom Cubes: Financial Reporting (Part I)
Introduction to SQL Server 2000 Analysis Services: Exploring Virtual Cubes
Introduction to SQL Server 2000 Analysis Services: Working with the Cube Editor
Introduction to SQL Server 2000 Analysis Services: Parent-Child Dimensions
Introduction to SQL Server 2000 Analysis Services: Handling Time Dimensions
Introduction to SQL Server 2000 Analysis Services: Working with Dimensions
Introduction to SQL Server 2000 Analysis Services: Creating Our First Cube








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers