A Practical Guide to Data Warehousing in Oracle - Part 3

February 25, 2004

Introduction

One feature that is almost ubiquitous in Oracle data warehousing is the use of the Partitioning Option. Over the past few major releases the feature set associated with partitioning has increased to cover four different partitioning types and improvements in the manageability of partitioned objects.

While this is to be welcomed, don't forget the caveat against grabbing every shiny feature of the software, that I gave in Part 1 of this series. Therefore, at the same time that you are drooling over Metalink Note 125314.1 "Boosting Performance by Hash and Composite Partitions," you should also be reading Metalink Note 165461.1 "Top Partitioned Indexes Bugs." Do not assume that every feature is going to work flawlessly - if you plan to use Partition Change Tracking for your materialized view refresh process, then search Metalink for known issues and test the refresh process several times on your production data volumes on the same operating system and Oracle versions and patch-levels that you will be using on production. Also, have a plan ready in case the process fails.

Pretty pessimistic stuff, but there is nothing like attempting a PCT-based fast refresh and finding your materialized views mysteriously empty for promoting the use of contingency plans. Ask me how I know.

Partitioning Benefits

There are three major advantages of partitioning.

  • Partition Pruning - Oracle only accesses a limited set of table partitions if the FROM and WHERE clause permit it to.
  • Partition-wise Joins - Where two tables that have compatible partitioning schemes are joined , Oracle improves the efficiency of parallel operations by performing the join between individual partitions of the tables.
  • Manageability - Partitioning allows DDL operations on a large subset of table rows with some element of commonality defined through the partitioning type.

Partitioning Types

There are four types of partitioning available in Oracle 9i. The following descriptions are quoted from the Data Warehousing Guide.

  • Range partitioning maps data to partitions based on ranges of partition key values that you establish for each partition. It is the most common type of partitioning and is often used with dates. For example, you might want to partition sales data into monthly partitions
  • Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to a partitioning key that you identify. The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size.
  • List partitioning enables you to control explicitly how rows map to partitions. You do this by specifying a list of discrete values for the partitioning column in the description for each partition.
  • Composite partitioning combines range and hash or list partitioning. Oracle first distributes data into partitions according to boundaries established by the partition ranges. Then, for range-hash partitioning, Oracle uses a hashing algorithm to further divide the data into subpartitions within each range partition. For range-list partitioning, Oracle divides the data into subpartitions within each range partition based on the explicit list you chose.

The documentation also gives many excellent examples of these methodologies.

Using Partitioning

One of the most common ways of leveraging range partitioning techniques is to partition fact tables by date range, where each date range encompasses one load cycle for the table. The data is loaded into a separate table having exactly the same column definitions, constraints and indexes as the fact table. The constraints are enabled (sometimes), the indexes rebuilt (they will have been unusable for the load process to avoid overhead in maintaining and logging the indexes), and a "partition exchange" operation is executed to swap the data and index segments of the table with the data and index segments of a fact table partition. This DDL operation is extremely fast, and just as importantly, is as fast to reverse.

Hence, it seems that any data warehouse holding historical data, which is to say virtually all of them, is incomplete without range partitioning.

Hash partitioning allows even distribution of data volume between multiple partitions. As each partition can be assigned its own tablespace, this can be translated into an even distribution of data rows over devices. A good choice of a hash partitioning key is a column that has a great many unique values and is commonly used as a predicate in a user query. Aim to use a number of partitions with an integer power of two - ie. 2 partitions, or 4, or 8, or 16, or 32 etc., or you run the risk of unbalancing the number of rows per partition. It is not the end of the world if you use 30 partitions though, just watch for the imbalance effect.

If you feel an urge to hash a partition by a column with very few unique values, you are probably in need of list partitioning. Make sure that you know and plan for all the potential values of the partitioning key though - you can define a partition to hold unlisted values, but again you may face imbalance in the number of rows per partition.

By combining range and hash partitioning into one type of composite partitioning you can get the advantages of both - compatibility with the efficient loading and deleting of time-based data through the range partitioning, and load balancing through hash partitioning within each range partition.

One potential disadvantage of this composite methodology is that it can easily lead to management of a very large number of partitions. This can be problematic for queries that do not have predicates that allow partition pruning, as they can require that every partition of multiple indexes be touched by the query. One resolution for this is to merge successive range partitions together - day-level partitions can be merged in a series of DDL statements to form a single month-level partition.

Index Partitioning

On a partitioned data warehouse fact table, I can think of no good reason to have unpartitioned indexes or to have indexes partitioned in any way other than exactly the same as the table, so that's all we really need to know about them.

References

See The Oracle Data Warehousing Guide for more detailed information.

See The SQL Reference for a list of partition-based operations.

» See All Articles by Columnist Dave Aldridge








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers