A Practical Guide to Data Warehousing in Oracle - Part 3
February 25, 2004
One feature that is almost ubiquitous in Oracle data warehousing is the use of the Partitioning Option. Over the past few major releases the feature set associated with partitioning has increased to cover four different partitioning types and improvements in the manageability of partitioned objects.
While this is to be welcomed, don't forget the caveat against grabbing every shiny feature of the software, that I gave in Part 1 of this series. Therefore, at the same time that you are drooling over Metalink Note 125314.1 "Boosting Performance by Hash and Composite Partitions," you should also be reading Metalink Note 165461.1 "Top Partitioned Indexes Bugs." Do not assume that every feature is going to work flawlessly - if you plan to use Partition Change Tracking for your materialized view refresh process, then search Metalink for known issues and test the refresh process several times on your production data volumes on the same operating system and Oracle versions and patch-levels that you will be using on production. Also, have a plan ready in case the process fails.
Pretty pessimistic stuff, but there is nothing like attempting a PCT-based fast refresh and finding your materialized views mysteriously empty for promoting the use of contingency plans. Ask me how I know.
There are three major advantages of partitioning.
There are four types of partitioning available in Oracle 9i. The following descriptions are quoted from the Data Warehousing Guide.
The documentation also gives many excellent examples of these methodologies.
One of the most common ways of leveraging range partitioning techniques is to partition fact tables by date range, where each date range encompasses one load cycle for the table. The data is loaded into a separate table having exactly the same column definitions, constraints and indexes as the fact table. The constraints are enabled (sometimes), the indexes rebuilt (they will have been unusable for the load process to avoid overhead in maintaining and logging the indexes), and a "partition exchange" operation is executed to swap the data and index segments of the table with the data and index segments of a fact table partition. This DDL operation is extremely fast, and just as importantly, is as fast to reverse.
Hence, it seems that any data warehouse holding historical data, which is to say virtually all of them, is incomplete without range partitioning.
Hash partitioning allows even distribution of data volume between multiple partitions. As each partition can be assigned its own tablespace, this can be translated into an even distribution of data rows over devices. A good choice of a hash partitioning key is a column that has a great many unique values and is commonly used as a predicate in a user query. Aim to use a number of partitions with an integer power of two - ie. 2 partitions, or 4, or 8, or 16, or 32 etc., or you run the risk of unbalancing the number of rows per partition. It is not the end of the world if you use 30 partitions though, just watch for the imbalance effect.
If you feel an urge to hash a partition by a column with very few unique values, you are probably in need of list partitioning. Make sure that you know and plan for all the potential values of the partitioning key though - you can define a partition to hold unlisted values, but again you may face imbalance in the number of rows per partition.
By combining range and hash partitioning into one type of composite partitioning you can get the advantages of both - compatibility with the efficient loading and deleting of time-based data through the range partitioning, and load balancing through hash partitioning within each range partition.
One potential disadvantage of this composite methodology is that it can easily lead to management of a very large number of partitions. This can be problematic for queries that do not have predicates that allow partition pruning, as they can require that every partition of multiple indexes be touched by the query. One resolution for this is to merge successive range partitions together - day-level partitions can be merged in a series of DDL statements to form a single month-level partition.
On a partitioned data warehouse fact table, I can think of no good reason to have unpartitioned indexes or to have indexes partitioned in any way other than exactly the same as the table, so that's all we really need to know about them.
See The Oracle Data Warehousing Guide for more detailed information.
See The SQL Reference for a list of partition-based operations.