A Practical Guide to Data Warehousing in Oracle, Part 6

Materialized Views, Part I

Materialized views have been available in Oracle for quite a
while, under their alter ego "Snapshots," and OLTP persons will be
familiar with them through their Replication abilities. In the world of data
warehousing they serve a rather different function, although in an abstract way
they are doing exactly the same thing, which is to make a set of information
available in multiple places.

It is important to start with a solid conceptual
understanding of what a materialized view (MV) actually is. There are three
elements to an MV…

  • The definition of a query.

  • The storage of that query’s result set.

  • A wide range of metadata that controls and helps to define the
    MV.

It is also important to grasp the limitations of MVs, which
vary widely by Oracle version. When you have understood the nature of MVs it is
but a short step to the incorrect assumption that a given MV ought to be
capable of a particular function. That function may very well be available in
the next major or minor release of Oracle, but in your version, it might
not. Indeed, it might be documented to be available in the next version, but in
your particular environment there might be a bug – previously known or of your
own discovery – that prevents that functionality from operating as expected. Or
at all.

Maybe the most important realization is that MVs are
enormously complex when compared to other features, such as bitmap indexes, and
in conformance with the previously outlined principle that "simple equals
good, and complex equal bad" that means that they must be approached with
some degree of caution.

Where We Can Use Them

In essence, there are three situations in which we might
want to make the same set of information available in multiple places.

The first of these is in summary tables. We provide
summary tables in a data warehouse because not every query is going to need the
very detailed information that our fact tables may provide, and because
allowing high‑level queries to scan a much smaller pre‑aggregated
data set reduces our i/o burden.

The second situation is in dimension tables. The
dimension tables contain lists of all the values in the key columns of the fact
tables, and maybe some more information on those values such as descriptive
text for codes.

The third situation is in the fact tables themselves.
I will admit this is a bit of a stretch for many people, but the fact tables
are generally the cleansed and transformed versions of some kind of source data
set. That data set may be a set of OLTP database tables, or a set of flat
files, but the fact tables are still in some ways a duplication of the original
data set.

Materialized Views for Summary Tables

A summary can be a very simple object to define. At their
most simple they provide a restricted list of the key columns of a fact table
and an aggregation – generally a SUM() — of some or all of the metric columns.
Therefore, a fact table with seven dimensional keys and six metrics can be
aggregated to a summary table of four dimensional keys plus two metrics. In the
process, the table becomes "shorter" as well as "narrower,"
and one million rows may be reduced to thirty thousand – an aggregation ratio
of about 33.

A more advanced form of summary table might include an
aggregation on one of the dimensional keys. A fact table with a dimensional key
at the "time" level (eg. "03‑Jan‑2004 10:51:22")
may be summarized as in the example above, but with the "time" key
aggregated to an "hour", "day", or "month" level,
thus increasing the aggregation ratio.

These two forms might even be combined, with an original "store"
key included in the summary with the addition of a higher "region"
key to allow faster region-based access to the table.

There are two main features of MVs that make them an attractive option for providing summary tables — query rewrite and fast refresh.

Query Rewrite for Summary Tables

Query rewrite allows the optimizer to redefine a SQL select
statement directed at a fact table so that it addresses a summary table
instead. It does this by comparing the relative costs of providing the result
from the original table with the costs of providing the result from one of many
available MVs, so needless to say, the cost-based optimizer (CBO) is a must. In
fact, the CBO is essential for nearly all data warehouse-specific features, so
if you are one of those rule-based optimizer fans you will just have to
suck-it-up and make the change.

Although there are many hoops to be jumped through in
configuring query rewrite, at the instance level as well as the schema level,
once it is up and running the functionality is robust, and unlikely to cause
sudden "feature-unexpectedly-stopped-working" surprises on a busy
Monday morning, providing that you maintain good statistics on the fact and
summary tables and partitions.

Fast Refresh for Summary Tables

Fast refresh is an essential mechanism for the maintenance
of MVs. It allows MVs to be kept current with respect to the master fact table
by making as few changes as possible to the MV, in contrast to the complete
refresh,
which would require that the entire master fact table be scanned
and re‑aggregated every time it is modified.

The heavily partition‑oriented methodology that Oracle
suggests for maintaining fact tables implies that MVs ought to be responsive to
change at the partition level of the fact table, and indeed Oracle offers
Partition Change Tracking (PCT), at various levels of capability and maturity
dependent upon version, to allow this.

PCT offers the ability to associate summary table rows with
the fact table partition from which they were sourced, for the purpose of both
fast refresh and query rewrite capabilities. Among other restrictions
(documented in the Oracle Data Warehousing Guide for your version), either the
partition key of the fact table or a column based on the Oracle-supplied DBMS_MVIEW.PMARKER must be included in the SELECT list for the MV.

As for query rewrite, there are some hoops to be jumped
through for an MV to be fast‑refreshable, and even more for it to be fast-refreshable
through PCT.

However …

It’s all very seductive stuff, this ability of your summary
tables to be self-maintaining – change the fact table, and they look after
themselves. However, there are some issues that make it not-quite-so clear cut,
and in my view, there are some disadvantages that need to be considered before
we leap headlong into this functionality.

In the next article, I will continue the topic of
Materialized Views with some of their disadvantages.

»


See All Articles by Columnist
Dave Aldridge

Latest Articles