SHARE

A Practical Guide to Data Warehousing in Oracle, Part 2

Written By

Dave Aldridge

Jan 29, 2004

6 minute read

Introduction
General Comments
Physical Attributes
Organization
Compression

Introduction

Some of the major differences
from OLTP systems that you will encounter in data warehousing are the types of
tables.

The Oracle Data Warehousing Guide
names and describes two major table types – facts, and dimensions, and if you were
not sure of the difference, it would be worthwhile having a look at that
document before we continue.

In this article, we will be
looking at various possible attributes of tables, and seeing what use we can
make of them in different table types of a data warehouse.

General Comments

There are few rules here, partly
because of the possible differences in requirements for each warehouse, partly
because of differences in Oracle versions, but mostly because any person who
tries to build a database using a simple set of rules is an idiot. Each of
these attributes has to be understood before it is used, and intelligent
thought has to go into their application.

There are also not too many
attributes that you need to think about.

Physical Attributes

PctFree

This is the percentage of free
space to be reserved for updates to existing rows in each data block. When an
insert of a new row into a block would reduce the amount of free space to less
than "pctfree", that insert is disallowed and the block is no longer
considered to be a candidate for a new row.

Getting the wrong value of
pctfree will impact performance. If you set the value too high, then you are
preventing the data blocks from holding as many rows as they could and thus
reducing the rows fetched per i/o operation. If you set the value too low, then
Oracle might find that there is not enough space for rows to be lengthened –
the row will be moved to a new block, and a pointer to the new location left in
the old block. This "row migration" will again decrease the
efficiency of i/o operations, harming performance.

As far as fact tables are concerned,
it is very unlikely that you will want to modify any existing rows in the
table. Therefore "pctfree=0" is advised, so that each fetch of a
block of the fact table retrieves as many rows as possible.

Dimension tables are more
variable in their pctfree setting, however if you get it wrong it is probably
not going to be such a big deal to correct. Choose what seems like a reasonable
value and keep an eye on the AVG_ROW_LEN, AVG_SPACE_FREELIST_BLOCKS and
CHAIN_CNT columns of dba_tables. Comparing the first and second columns will
tell you if you have too much spare space in the blocks (pctfree too low), and
watching for CHAIN_CNT > 0 will tell you if rows have migrated (pctfree too
high).

Pctused

For tables from which deletes or
updates can occur you might want to spare some thought for PCTUSED. When a
block has been filled up (see PctFree above) and will no longer accept new
rows, it will not become a candidate for new inserts until its usage has fallen
to the pctused value. You will probably never need it for fact tables, so you
might as well set it to 99. For dimension tables set it to "pctfree-1."

IniTrans and MaxTrans

These attributes refer to the
number of concurrent transactions that can modify a block in the table. It is
unlikely that your load process is going to make any demands in this respect, and
you can just omit specifying them.

Organization

Two possibilities here: – heap
organization, and index organization. For fact tables there is really no
question that you will be using heap organization. The are a number of reasons
for this.

An index organized table forces a
physical sort order on the table. While physically sorting the data in the fact
table is highly desirable for reasons of optimizing certain index access
operations (more about which another time), index organization is an inflexible
tool. Once you have defined a primary key, it is fixed, and you are not going
to be changing your sort order later.

Another reason is that data
segment compression is not available with IOT’s, although index compression is.
Data segment compression covers the entire set of columns, where index
compression just covers the indexed columns.

So, heap storage for fact tables.

Dimension tables are a little
more amenable to index organization. Here are some circumstances that might
affect your choice.

If you intend to create your
dimension table as a materialized view based on the fact table you will
probably want to avoid index organization. Primary and Unique constraints on a
materialized view ought to be deferrable because Oracle does not guarantee integrity
at mid-points of a fast refresh cycle. Since an IOT’s primary key constraint is
not deferrable, you will want to avoid them.

A large, narrow dimension table
will be a reasonable candidate for index organization, based on potential space
savings.

A dimension table with a
synthetic primary key assigned from an ascending sequence will also be a
reasonable candidate. The rightwards growth of the primary key means you do not
have to worry about imbalance in the index structure after many load cycles.

Compression

Data segment compression is a
relative newcomer to Oracle, and provides block-by-block compression of
repeated symbols within the table’s data.

Compression provides a
potentially enormous saving in i/o requirements by compressing a table by a
pretty impressive ratio – 8x seems to be a typically achievable figure.

Compression gets more efficient
as the number of repetitions of common symbols in a data block increases. You
can help maximize your compression ratio in two ways.

Firstly, use larger block sizes.
A 16k block size will give better compression than an 8k block size.

Secondly, improve the clustering
of values in the table by physically sorting data before inserting it into the
table. If you physically order the data in your SALES fact table so that all
the similar region_cd values are consecutive, then the compression algorithm is
going to get a lot of hits on region_cd repetition, and thus provide good i/o
savings.

There is an overhead on CPU usage
when decompressing a table to read the data, but it is not worth worrying
about.

Now the main issue with
compression is that it is very "testable." You can take a sample data
set and just test compression ratios on different block sizes and with
different physical row orders. So go and try it.

Lastly, here are a couple of
caveats on compression. As of 9iR2, you can neither add a new column to, nor
drop a column from, a compressed table. If you have the possibility that in the
future you might add new columns on to a large compressed table, and it would
be inconvenient to build a new table to do so, then add a couple of spare
number and varchar2 columns to the end when you create it.

I have also come across a bug or
two that has caused non-compression in situations where the documentation says
it should occur – specifically during CTAS and direct path insert operations.
It has been necessary to move a populated table or partition to get the data
compressed – test whether your own system is susceptible to this problem before
relying on compression working in these circumstances.

»

See All Articles by Columnist Dave Aldridge