A Practical Guide to Data Warehousing in Oracle, Part 2
January 29, 2004
Some of the major differences from OLTP systems that you will encounter in data warehousing are the types of tables.
The Oracle Data Warehousing Guide names and describes two major table types - facts, and dimensions, and if you were not sure of the difference, it would be worthwhile having a look at that document before we continue.
In this article, we will be looking at various possible attributes of tables, and seeing what use we can make of them in different table types of a data warehouse.
There are few rules here, partly because of the possible differences in requirements for each warehouse, partly because of differences in Oracle versions, but mostly because any person who tries to build a database using a simple set of rules is an idiot. Each of these attributes has to be understood before it is used, and intelligent thought has to go into their application.
There are also not too many attributes that you need to think about.
This is the percentage of free space to be reserved for updates to existing rows in each data block. When an insert of a new row into a block would reduce the amount of free space to less than "pctfree", that insert is disallowed and the block is no longer considered to be a candidate for a new row.
Getting the wrong value of pctfree will impact performance. If you set the value too high, then you are preventing the data blocks from holding as many rows as they could and thus reducing the rows fetched per i/o operation. If you set the value too low, then Oracle might find that there is not enough space for rows to be lengthened - the row will be moved to a new block, and a pointer to the new location left in the old block. This "row migration" will again decrease the efficiency of i/o operations, harming performance.
As far as fact tables are concerned, it is very unlikely that you will want to modify any existing rows in the table. Therefore "pctfree=0" is advised, so that each fetch of a block of the fact table retrieves as many rows as possible.
Dimension tables are more variable in their pctfree setting, however if you get it wrong it is probably not going to be such a big deal to correct. Choose what seems like a reasonable value and keep an eye on the AVG_ROW_LEN, AVG_SPACE_FREELIST_BLOCKS and CHAIN_CNT columns of dba_tables. Comparing the first and second columns will tell you if you have too much spare space in the blocks (pctfree too low), and watching for CHAIN_CNT > 0 will tell you if rows have migrated (pctfree too high).
For tables from which deletes or updates can occur you might want to spare some thought for PCTUSED. When a block has been filled up (see PctFree above) and will no longer accept new rows, it will not become a candidate for new inserts until its usage has fallen to the pctused value. You will probably never need it for fact tables, so you might as well set it to 99. For dimension tables set it to "pctfree-1."
IniTrans and MaxTrans
These attributes refer to the number of concurrent transactions that can modify a block in the table. It is unlikely that your load process is going to make any demands in this respect, and you can just omit specifying them.
Two possibilities here: - heap organization, and index organization. For fact tables there is really no question that you will be using heap organization. The are a number of reasons for this.
An index organized table forces a physical sort order on the table. While physically sorting the data in the fact table is highly desirable for reasons of optimizing certain index access operations (more about which another time), index organization is an inflexible tool. Once you have defined a primary key, it is fixed, and you are not going to be changing your sort order later.
Another reason is that data segment compression is not available with IOT's, although index compression is. Data segment compression covers the entire set of columns, where index compression just covers the indexed columns.
So, heap storage for fact tables.
Dimension tables are a little more amenable to index organization. Here are some circumstances that might affect your choice.
If you intend to create your dimension table as a materialized view based on the fact table you will probably want to avoid index organization. Primary and Unique constraints on a materialized view ought to be deferrable because Oracle does not guarantee integrity at mid-points of a fast refresh cycle. Since an IOT's primary key constraint is not deferrable, you will want to avoid them.
A large, narrow dimension table will be a reasonable candidate for index organization, based on potential space savings.
A dimension table with a synthetic primary key assigned from an ascending sequence will also be a reasonable candidate. The rightwards growth of the primary key means you do not have to worry about imbalance in the index structure after many load cycles.
Data segment compression is a relative newcomer to Oracle, and provides block-by-block compression of repeated symbols within the table's data.
Compression provides a potentially enormous saving in i/o requirements by compressing a table by a pretty impressive ratio - 8x seems to be a typically achievable figure.
Compression gets more efficient as the number of repetitions of common symbols in a data block increases. You can help maximize your compression ratio in two ways.
Firstly, use larger block sizes. A 16k block size will give better compression than an 8k block size.
Secondly, improve the clustering of values in the table by physically sorting data before inserting it into the table. If you physically order the data in your SALES fact table so that all the similar region_cd values are consecutive, then the compression algorithm is going to get a lot of hits on region_cd repetition, and thus provide good i/o savings.
There is an overhead on CPU usage when decompressing a table to read the data, but it is not worth worrying about.
Now the main issue with compression is that it is very "testable." You can take a sample data set and just test compression ratios on different block sizes and with different physical row orders. So go and try it.
Lastly, here are a couple of caveats on compression. As of 9iR2, you can neither add a new column to, nor drop a column from, a compressed table. If you have the possibility that in the future you might add new columns on to a large compressed table, and it would be inconvenient to build a new table to do so, then add a couple of spare number and varchar2 columns to the end when you create it.
I have also come across a bug or two that has caused non-compression in situations where the documentation says it should occur - specifically during CTAS and direct path insert operations. It has been necessary to move a populated table or partition to get the data compressed - test whether your own system is susceptible to this problem before relying on compression working in these circumstances.