Please find below the follow-up to the first part in my OLAP
and Data Warehousing (The Problem and Solution) series, a series dedicated
to the discussion of OLAP and Data Warehousing technologies.
A lack of world economic stabilization over the last few years has resulted in
a growth of competitiveness and a rise in importance of correct solutions
for successful company operation. Modern business requires, at a minimum, from top-managers responsible for decision making:
Exntensive knowledge of the company’s most significant clients;
Knowledge of:
Last 2… 5… 10…;
Most important strategies;
Most successful solutions;
Unsuccessful bargains;
Analysis of income levels;
Analysis of sales dynamics.
The modern manager who wants to achieve success needs to completely understand
the nature of the processes he or she controls. And by understand, I mean understand at a point in time not after financial
activity has obstinately proved an inefficiency, but rather when there still
remains the possibility of making the process effective. A unique method
to carry out efficient control and parse company activity (based on available
data and information instead of certain suppositions tat are frequently far from the
truth) asks about the value of those or other metrics and tries to answer
the question: « Why are those so important? ».
All this knowledge cannot be received without the analysis of large volumes
of raw data which have been accumulated during the history of company existence.
Of course, you might ask whether each organization owns / has such data
for the analysis? The answer is most likely YES!
In my experience, almost all organizations have used (it was folded
historically) recording systems, among which have been tracked:
Technologies, databases, and systems;
Specializations:
Accounting;
Economic;
Administrative systems;
. . .
However, the usage of such separated systems, as a rule, were limited to the description
of company activity in some isolated fields, with each record fixed in such
separate systems. Such systems are in the class of online
transaction processing systems (OLTP). You can also find them by another
name, operating analysis, that underlines their main role storing
raw data. Raw in this case means data of a very low level of abstraction.
Similar objectives (i.e. obtaining information from poorly
agreed systems with subsequent analysis) will demand huge and completely unjustified
costs of those people whose time and compensation is very expensive (i.e. top managers).
I dare to assume that none of them will risk to undertake this
operation.
So, what we watch for are available crises of the operating analysis
or so-called DIJ (Data In Jail) crises. The reason behind this is that regardless of all the abundance
of data, the people responsible for decision-making are not capable of
extracting information from data and require knowledge about
processes existent around them — mainly because they deal with too detailed, non-interconnected data that
represents isolated company activities.
Thus, on a currently presented technological level the level of automation
of the analysis and decision-making process is extremely low. No wonder then that
some tendencies remain either unnoticed or misunderstood.
So the operating analysis, which in fact should be the dynamic, iterative
and constant process of critical learning about company activity really is only
fulfilled from time to time (and most likely by company IT-staff).
Correct administrative decisions are impossible without having the information, usually quantitative,
necessary for them. For this purpose we have Data Warehouse (DW) creation, the process of collecting, sifting and preprocessing
raw data into resulting information, appropriated for a
statistical analysis (frequently also for analytical reporting).
Ralph Kimball, father of DW concepts, describes DW as “a place where
people may gain access to the data” (according to Ralph Kimball, “The Data
Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses”,
John Wiley and Sons, 1996 and “The Data Webhouse Toolkit: Building the
Web-Enabled Data Warehouse”, John Wiley and Sons, 2000). Kimball also formulates
the main requirements of DW:
High speed data acquisition support (from the DW storage);
Internal data consistency checking support;
Possibility of obtaining and matching so-called “cuts of data” (“slice
and dice” procedure);
Availability of convenient utilities of review of the data in DW storage;
Entirety and reliability of the stored data;
Support of the qualitative process of addition of the data.
Typical DW, as a rule, differs from the ordinary relational OLTP database:
OLTP databases are intended to help users perform daily business operation
whereas DW is intended for decision making. For example, the sale of goods
made with usage of the database intended for transaction processing, analysis
of sales dynamics – with the help of DW;
OLTP databases are subject to constant changes during daily user operation,
but DW is rather stable: DW data is usually updated according to schedule
(for example, weekly, daily or hourly – depending on business requirements).
Ideally, the process of data addition to DW should be the simple addition of
new data (from different operating systems / sources) for the defined
period of time without changing the information already placed in storage;
OLTP databases serve as data sources for DW.