Welcome to the latest article in my OLAP and Data Warehousing (The Problem and Solution) series, a series dedicated to the discussion of OLAP and Data Warehousing technologies. Let’s return once again in more detail to reasons of crisis involving the operating
analysis. Below we present only the primary ones:
Highly detailed data in OLTP systems;
Data is dispersed among several OLTP systems. This fact has some aspects:
Various data sources,
Dissimilar data formats,
Various data access mechanisms,
A large number of independent reports generation tools and the potential lack
of integrity among several OLTP systems (at least data in different systems
may be refreshed with a different regularity, so even if each system is supported
in a completely integrated state, group integrity may be broken, which may
distort an otherwise common picture);
Low speed of reports generation. As OLTP systems are oriented first to
effective transaction processing, the generation of «large reports»
usually takes a significant amount of time.
There are several paths to overcome these crises. One of them is the creation of
gateways among separate OLTP systems. However, this results only in consolidating
several OLTP systems into one, and in any case does not solve «highly
detailed data» problems.
An alternate approach was generated in the last decade as a group of
coupled technologies that received the name Data Warehousing (the technology of
Data Warehouses construction).
Data Warehouse, according to the classical definition, is a collection of
resources that permit the presentation of data in a complete, subject-oriented view
suitable for analysis and acceptance of business solutions.
The construction of Data Warehouses allows us to make a step to the following
stage of business activity automation — creating resources and tools to
support decision making. The main difference between decision making activity
and performing daily activity, from the view point of the used data, is
the requirement for an all-encompassing vision of processes for the extensive diversity of
parameters on which they depend, for various, arbitrary time intervals.
It is possible to say that the performers work with data on occurring processes,
whereas for managers information is necessary for decision making.
This fact defines type of used data. For the creation of decision support systems
(DSS) complete, consistent information for various
time intervals which may be both generalized (sum or aggregated
different way) and detailed is necessary. This is the main conception of Data
Warehouses as a platform for decision support systems construction.
There are three groups of tools that you will probably involve in the decision-making process:
Report generation tools,
Operating analysis tools (OLAP), and
Data Mining tools
Report generation tools are intended for presenting data as tables and
diagrams, although other forms of data presentation are also used (for example, diagrams
plotted on a georaphic map). This class of tools allows managers to inspect
occurring processes, having some fixed sights on this processes metrics.
OLAP tools are intended for hypothesis testing, and they allow one to find the
data which confirm or refute the formulated management hypotheses. Hypotheses
may be formulated very definitely (was the profit falling directly as a result
of the cost price increases?) or more indistinct (are there any parameters
which most strongly differentiate the division that brought the greatest profit
from the other divisions?). This type of information allows managers to change
company business processes to reach definite purposes.
OLAP (On-Line Analytical Processing) tools are the key component/platform for Data Warehouses building. This technology (OLAP) is based on
the construction of many-dimensional data sets, called OLAP-cubes, where
axes contain parameters and the cells dependent on them, aggregated data.
Data Mining tools are intended for hypotheses creation on the existing
data. This class of tools most strongly depends on data domain and the structure
of input data. However, the use of similar tools is necessary in case of
large data volumes with numerous parameters, on which these data depend,
since they allow one to detect (or in other words to make visible)
the facts and tendencies which are completely unevident after a typical (brieft) review
of huge data arrays.
Specific forms of data usage (against transaction processing in OLTP
databases) cause appropriate requirements to used storage and data presentation
models. OLTP databases are optimized as much as possible for the presentation of
a small part of all company data, and for targeting transactions execution with
the highest possible performance, Data Warehouses typically present information
on all interconnected company processes. Also, DW and OLTP databases have
completely different types of queries to them. Target transactions (for
OLTP) combine not only queries for selecting data but also procedures
for data modification and the addition of new ones. In the case of Data Warehouses, we
deal first and foremost with data selection, as most of queries to Data Warehouses
are queries on selecting data.
On to Part Two of this article.