A Practical Guide to Data Warehousing in Oracle, Part 6
May 18, 2004
Materialized Views, Part I
Materialized views have been available in Oracle for quite a while, under their alter ego "Snapshots," and OLTP persons will be familiar with them through their Replication abilities. In the world of data warehousing they serve a rather different function, although in an abstract way they are doing exactly the same thing, which is to make a set of information available in multiple places.
It is important to start with a solid conceptual understanding of what a materialized view (MV) actually is. There are three elements to an MV...
It is also important to grasp the limitations of MVs, which vary widely by Oracle version. When you have understood the nature of MVs it is but a short step to the incorrect assumption that a given MV ought to be capable of a particular function. That function may very well be available in the next major or minor release of Oracle, but in your version, it might not. Indeed, it might be documented to be available in the next version, but in your particular environment there might be a bug - previously known or of your own discovery - that prevents that functionality from operating as expected. Or at all.
Maybe the most important realization is that MVs are enormously complex when compared to other features, such as bitmap indexes, and in conformance with the previously outlined principle that "simple equals good, and complex equal bad" that means that they must be approached with some degree of caution.
Where We Can Use Them
In essence, there are three situations in which we might want to make the same set of information available in multiple places.
The first of these is in summary tables. We provide summary tables in a data warehouse because not every query is going to need the very detailed information that our fact tables may provide, and because allowing high‑level queries to scan a much smaller pre‑aggregated data set reduces our i/o burden.
The second situation is in dimension tables. The dimension tables contain lists of all the values in the key columns of the fact tables, and maybe some more information on those values such as descriptive text for codes.
The third situation is in the fact tables themselves. I will admit this is a bit of a stretch for many people, but the fact tables are generally the cleansed and transformed versions of some kind of source data set. That data set may be a set of OLTP database tables, or a set of flat files, but the fact tables are still in some ways a duplication of the original data set.
Materialized Views for Summary Tables
A summary can be a very simple object to define. At their most simple they provide a restricted list of the key columns of a fact table and an aggregation - generally a SUM() -- of some or all of the metric columns. Therefore, a fact table with seven dimensional keys and six metrics can be aggregated to a summary table of four dimensional keys plus two metrics. In the process, the table becomes "shorter" as well as "narrower," and one million rows may be reduced to thirty thousand - an aggregation ratio of about 33.
A more advanced form of summary table might include an aggregation on one of the dimensional keys. A fact table with a dimensional key at the "time" level (eg. "03‑Jan‑2004 10:51:22") may be summarized as in the example above, but with the "time" key aggregated to an "hour", "day", or "month" level, thus increasing the aggregation ratio.
These two forms might even be combined, with an original "store" key included in the summary with the addition of a higher "region" key to allow faster region-based access to the table.
There are two main features of MVs that make them an attractive option for providing summary tables -- query rewrite and fast refresh.
Query Rewrite for Summary Tables
Query rewrite allows the optimizer to redefine a SQL select statement directed at a fact table so that it addresses a summary table instead. It does this by comparing the relative costs of providing the result from the original table with the costs of providing the result from one of many available MVs, so needless to say, the cost-based optimizer (CBO) is a must. In fact, the CBO is essential for nearly all data warehouse-specific features, so if you are one of those rule-based optimizer fans you will just have to suck-it-up and make the change.
Although there are many hoops to be jumped through in configuring query rewrite, at the instance level as well as the schema level, once it is up and running the functionality is robust, and unlikely to cause sudden "feature-unexpectedly-stopped-working" surprises on a busy Monday morning, providing that you maintain good statistics on the fact and summary tables and partitions.
Fast Refresh for Summary Tables
Fast refresh is an essential mechanism for the maintenance of MVs. It allows MVs to be kept current with respect to the master fact table by making as few changes as possible to the MV, in contrast to the complete refresh, which would require that the entire master fact table be scanned and re‑aggregated every time it is modified.
The heavily partition‑oriented methodology that Oracle suggests for maintaining fact tables implies that MVs ought to be responsive to change at the partition level of the fact table, and indeed Oracle offers Partition Change Tracking (PCT), at various levels of capability and maturity dependent upon version, to allow this.
PCT offers the ability to associate summary table rows with the fact table partition from which they were sourced, for the purpose of both fast refresh and query rewrite capabilities. Among other restrictions (documented in the Oracle Data Warehousing Guide for your version), either the partition key of the fact table or a column based on the Oracle-supplied DBMS_MVIEW.PMARKER must be included in the SELECT list for the MV.
As for query rewrite, there are some hoops to be jumped through for an MV to be fast‑refreshable, and even more for it to be fast-refreshable through PCT.
It's all very seductive stuff, this ability of your summary tables to be self-maintaining - change the fact table, and they look after themselves. However, there are some issues that make it not-quite-so clear cut, and in my view, there are some disadvantages that need to be considered before we leap headlong into this functionality.
In the next article, I will continue the topic of Materialized Views with some of their disadvantages.