OLAP and Data Warehousing - Data Warehousing Technology Review (Part 1)

June 26, 2002

Welcome to the latest article in my OLAP and Data Warehousing (The Problem and Solution) series, a series dedicated to the discussion of OLAP and Data Warehousing technologies. Let's return once again in more detail to reasons of crisis involving the operating analysis. Below we present only the primary ones:

  • Highly detailed data in OLTP systems;
  • Data is dispersed among several OLTP systems. This fact has some aspects:
    • Various data sources,
    • Dissimilar data formats,
    • Various data access mechanisms,
    • A large number of independent reports generation tools and the potential lack of integrity among several OLTP systems (at least data in different systems may be refreshed with a different regularity, so even if each system is supported in a completely integrated state, group integrity may be broken, which may distort an otherwise common picture);
  • Low speed of reports generation. As OLTP systems are oriented first to effective transaction processing, the generation of «large reports» usually takes a significant amount of time.
There are several paths to overcome these crises. One of them is the creation of gateways among separate OLTP systems. However, this results only in consolidating several OLTP systems into one, and in any case does not solve «highly detailed data» problems.

An alternate approach was generated in the last decade as a group of coupled technologies that received the name Data Warehousing (the technology of Data Warehouses construction).

Data Warehouse, according to the classical definition, is a collection of resources that permit the presentation of data in a complete, subject-oriented view suitable for analysis and acceptance of business solutions.

The construction of Data Warehouses allows us to make a step to the following stage of business activity automation -- creating resources and tools to support decision making. The main difference between decision making activity and performing daily activity, from the view point of the used data, is the requirement for an all-encompassing vision of processes for the extensive diversity of parameters on which they depend, for various, arbitrary time intervals. It is possible to say that the performers work with data on occurring processes, whereas for managers information is necessary for decision making. This fact defines type of used data. For the creation of decision support systems (DSS) complete, consistent information for various time intervals which may be both generalized (sum or aggregated different way) and detailed is necessary. This is the main conception of Data Warehouses as a platform for decision support systems construction.

There are three groups of tools that you will probably involve in the decision-making process:

  • Report generation tools,
  • Operating analysis tools (OLAP), and
  • Data Mining tools
Report generation tools are intended for presenting data as tables and diagrams, although other forms of data presentation are also used (for example, diagrams plotted on a georaphic map). This class of tools allows managers to inspect occurring processes, having some fixed sights on this processes metrics.

OLAP tools are intended for hypothesis testing, and they allow one to find the data which confirm or refute the formulated management hypotheses. Hypotheses may be formulated very definitely (was the profit falling directly as a result of the cost price increases?) or more indistinct (are there any parameters which most strongly differentiate the division that brought the greatest profit from the other divisions?). This type of information allows managers to change company business processes to reach definite purposes.

OLAP (On-Line Analytical Processing) tools are the key component/platform for Data Warehouses building. This technology (OLAP) is based on the construction of many-dimensional data sets, called OLAP-cubes, where axes contain parameters and the cells dependent on them, aggregated data.

Data Mining tools are intended for hypotheses creation on the existing data. This class of tools most strongly depends on data domain and the structure of input data. However, the use of similar tools is necessary in case of large data volumes with numerous parameters, on which these data depend, since they allow one to detect (or in other words to make visible) the facts and tendencies which are completely unevident after a typical (brieft) review of huge data arrays.

Specific forms of data usage (against transaction processing in OLTP databases) cause appropriate requirements to used storage and data presentation models. OLTP databases are optimized as much as possible for the presentation of a small part of all company data, and for targeting transactions execution with the highest possible performance, Data Warehouses typically present information on all interconnected company processes. Also, DW and OLTP databases have completely different types of queries to them. Target transactions (for OLTP) combine not only queries for selecting data but also procedures for data modification and the addition of new ones. In the case of Data Warehouses, we deal first and foremost with data selection, as most of queries to Data Warehouses are queries on selecting data.

On to Part Two of this article.

See All Articles by Columnist Alexzander Nepomnjashiy