OLAP and Data Warehousing - Data Warehousing Technology Review (Part 1)
June 26, 2002
An alternative approach emerged over the last decade as a group of
related technologies known collectively as Data Warehousing (the technology of
building Data Warehouses).
A Data Warehouse, according to the classical definition, is a collection of
resources that present data in a complete, subject-oriented view
suitable for analysis and for making business decisions.
Building Data Warehouses lets us move to the next stage of business activity
automation: creating resources and tools to support decision making. From the
point of view of the data used, the main difference between decision making
and day-to-day operations is that decision making requires a comprehensive
view of processes across the wide range of parameters on which they depend,
and over various, arbitrary time intervals.
One could say that operational staff work with data about processes as they occur,
whereas managers need information in order to make decisions.
This fact defines the type of data used. Building decision support systems
(DSS) requires complete, consistent information for various
time intervals, and that information may be either generalized (summed or
otherwise aggregated) or detailed. This is the core idea of Data
Warehouses as a platform for building decision support systems.
There are three groups of tools that are likely to be involved in the decision-making process:
OLAP tools are intended for hypothesis testing: they let one find the
data that confirm or refute formulated management hypotheses. Hypotheses
may be formulated very precisely (did profit fall as a direct result
of cost price increases?) or more loosely (are there any parameters
that most strongly differentiate the division that brought the greatest profit
from the other divisions?). This type of information allows managers to change
company business processes to reach specific goals.
OLAP (On-Line Analytical Processing) tools are the key component of, and platform for, building Data Warehouses. The technology is based on
constructing multidimensional data sets, called OLAP cubes, whose
axes contain parameters and whose cells contain the aggregated data that depend on them.
Data Mining tools are intended for generating hypotheses from existing
data. This class of tools depends most strongly on the data domain and the structure
of the input data. However, such tools are necessary for
large data volumes that depend on numerous parameters,
since they allow one to detect (in other words, to make visible)
facts and trends that are not at all evident from a typical (brief) review
of huge data arrays.
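To make the OLAP cube idea described above more concrete, here is a minimal sketch in Python. It is not how any particular OLAP product stores its cubes. The fact rows, the dimension names (region, product, quarter), the `ALL` marker and the `build_cube` helper are all assumptions invented for illustration. The sketch shows axes holding parameters and cells holding aggregated values, at both the detailed and the rolled-up level.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, product, quarter, sales amount).
FACTS = [
    ("North", "Widget", "Q1", 100.0),
    ("North", "Widget", "Q2", 150.0),
    ("North", "Gadget", "Q1", 80.0),
    ("South", "Widget", "Q1", 120.0),
    ("South", "Gadget", "Q2", 60.0),
]

# Marker meaning "aggregated over every value of this dimension".
ALL = "*"


def build_cube(facts):
    """Map (region, product, quarter) coordinates to summed sales.

    Each coordinate is either a concrete value or ALL, so the result
    holds the detailed cells as well as every level of roll-up.
    """
    cube = defaultdict(float)
    for *dims, measure in facts:
        # Each fact contributes to the cell for its concrete value and
        # to the ALL cell, independently on every axis.
        for coords in product(*[(d, ALL) for d in dims]):
            cube[coords] += measure
    return dict(cube)


cube = build_cube(FACTS)

# Detailed cell: one region, one product, one quarter.
detail = cube[("South", "Gadget", "Q2")]
# Roll-up: all products and quarters for one region.
north_total = cube[("North", ALL, ALL)]
# Slice: one product in one quarter across all regions.
widget_q1 = cube[(ALL, "Widget", "Q1")]
```

Precomputing every roll-up keeps lookups trivial but grows quickly with the number of dimensions, so a real system would normally materialize only some of these aggregates.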
The specific ways in which the data are used (as opposed to transaction
processing in OLTP databases) impose corresponding requirements on the storage
and data presentation models. OLTP databases are optimized as much as possible
for presenting a small part of all company data and for executing target
transactions with the highest possible performance, whereas Data Warehouses
typically present information on all interconnected company processes. DW and
OLTP databases also receive completely different types of queries. Target
transactions (in OLTP) combine not only queries that select data but also
procedures that modify data and add new records. Data Warehouses, by contrast,
deal first and foremost with data selection: most queries against them only
read data.
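The contrast between the two query styles can be sketched with Python's standard `sqlite3` module. This is only an illustration under stated assumptions: the `orders` table, its columns, and the `place_order` and `sales_by_region` helpers are hypothetical, and a single in-memory database stands in for what would in practice be separate OLTP and warehouse systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, quarter TEXT, amount REAL)"
)


def place_order(conn, region, quarter, amount):
    """OLTP-style transaction: touch one small piece of data and modify it."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (region, quarter, amount) VALUES (?, ?, ?)",
            (region, quarter, amount),
        )
        return cur.lastrowid


def sales_by_region(conn):
    """Warehouse-style query: a read-only aggregation over all rows."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "GROUP BY region ORDER BY region"
    )
    return rows.fetchall()


# Hypothetical sample data, written one short transaction at a time.
for args in [("North", "Q1", 100.0), ("North", "Q2", 150.0), ("South", "Q1", 120.0)]:
    place_order(conn, *args)

report = sales_by_region(conn)
```

The write path is many short transactions that each change a single row, while the analytical path is one query that scans and aggregates everything without modifying it.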
On to Part Two of this article.