Introduction
One feature that is almost ubiquitous in Oracle data warehousing is the use of the Partitioning Option. Over the past few major releases the feature set associated with partitioning has increased to cover four different partitioning types and improvements in the manageability of partitioned objects.
While this is to be welcomed, don’t
forget the caveat against grabbing every shiny feature of the software, that I
gave in Part
1 of this series. Therefore, at the same time that you are drooling over
Metalink Note 125314.1 "Boosting Performance by Hash and Composite
Partitions," you should also be reading Metalink Note 165461.1 "Top
Partitioned Indexes Bugs." Do not assume that every feature is going
to work flawlessly – if you plan to use Partition Change Tracking for your
materialized view refresh process, then search Metalink for known issues and
test the refresh process several times on your production data volumes on the
same operating system and Oracle versions and patch-levels that you will be
using on production. Also, have a plan ready in case the process fails.
Pretty pessimistic stuff, but there
is nothing like attempting a PCT-based fast refresh and finding your
materialized views mysteriously empty for promoting the use of contingency
plans. Ask me how I know.
Partitioning Benefits
There are three major advantages
of partitioning.
- Partition
Pruning – Oracle only accesses a limited set of table partitions if the FROM
and WHERE clause permit it to. - Partition-wise
Joins – Where two tables that have compatible partitioning schemes are joined ,
Oracle improves the efficiency of parallel operations by performing the join
between individual partitions of the tables. - Manageability
– Partitioning allows DDL operations on a large subset of table rows with some
element of commonality defined through the partitioning type.
Partitioning Types
There are four types of
partitioning available in Oracle 9i. The following descriptions are quoted from
the Data Warehousing Guide.
- Range
partitioning maps data to partitions based on ranges of partition key values
that you establish for each partition. It is the most common type of
partitioning and is often used with dates. For example, you might want to
partition sales data into monthly partitions - Hash
partitioning maps data to partitions based on a hashing algorithm that Oracle
applies to a partitioning key that you identify. The hashing algorithm evenly
distributes rows among partitions, giving partitions approximately the same
size. - List
partitioning enables you to control explicitly how rows map to partitions. You
do this by specifying a list of discrete values for the partitioning column in
the description for each partition. - Composite
partitioning combines range and hash or list partitioning. Oracle first
distributes data into partitions according to boundaries established by the
partition ranges. Then, for range-hash partitioning, Oracle uses a hashing
algorithm to further divide the data into subpartitions within each range
partition. For range-list partitioning, Oracle divides the data into subpartitions
within each range partition based on the explicit list you chose.
The documentation also gives many
excellent examples of these methodologies.
Using Partitioning
One of the most common ways of
leveraging range partitioning techniques is to partition fact tables by date
range, where each date range encompasses one load cycle for the table. The data
is loaded into a separate table having exactly the same column definitions, constraints
and indexes as the fact table. The constraints are enabled (sometimes), the
indexes rebuilt (they will have been unusable for the load process to avoid
overhead in maintaining and logging the indexes), and a "partition
exchange" operation is executed to swap the data and index segments of the
table with the data and index segments of a fact table partition. This DDL
operation is extremely fast, and just as importantly, is as fast to reverse.
Hence, it seems that any data
warehouse holding historical data, which is to say virtually all of them, is
incomplete without range partitioning.
Hash partitioning allows even
distribution of data volume between multiple partitions. As each partition can
be assigned its own tablespace, this can be translated into an even
distribution of data rows over devices. A good choice of a hash partitioning
key is a column that has a great many unique values and is commonly used as a
predicate in a user query. Aim to use a number of partitions with an integer
power of two – ie. 2 partitions, or 4, or 8, or 16, or 32 etc., or you run the
risk of unbalancing the number of rows per partition. It is not the end of the
world if you use 30 partitions though, just watch for the imbalance effect.
If you feel an urge to hash a partition
by a column with very few unique values, you are probably in need of list
partitioning. Make sure that you know and plan for all the potential values of
the partitioning key though – you can define a partition to hold unlisted
values, but again you may face imbalance in the number of rows per partition.
By combining range and hash
partitioning into one type of composite partitioning you can get the advantages
of both – compatibility with the efficient loading and deleting of time-based
data through the range partitioning, and load balancing through hash
partitioning within each range partition.
One potential disadvantage of
this composite methodology is that it can easily lead to management of a very
large number of partitions. This can be problematic for queries that do not
have predicates that allow partition pruning, as they can require that every
partition of multiple indexes be touched by the query. One resolution for this
is to merge successive range partitions together – day-level partitions can be
merged in a series of DDL statements to form a single month-level partition.
Index Partitioning
On a partitioned data warehouse
fact table, I can think of no good reason to have unpartitioned indexes or to
have indexes partitioned in any way other than exactly the same as the table, so
that’s all we really need to know about them.
References
See The Oracle Data
Warehousing Guide for more detailed information.
See The SQL Reference for
a list of partition-based operations.