OLAP and Data Warehousing – Data Warehousing Technology Review (Part 1)

Welcome to the latest article in my OLAP and Data Warehousing (The Problem and Solution) series, a series dedicated to the discussion of OLAP and Data Warehousing technologies. Let’s return once again in more detail to reasons of crisis involving the operating
analysis. Below we present only the primary ones:

  • Highly detailed data in OLTP systems;

  • Data is dispersed among several OLTP systems. This fact has some aspects:

    • Various data sources,

    • Dissimilar data formats,

    • Various data access mechanisms,

    • A large number of independent reports generation tools and the potential lack
      of integrity among several OLTP systems (at least data in different systems
      may be refreshed with a different regularity, so even if each system is supported
      in a completely integrated state, group integrity may be broken, which may
      distort an otherwise common picture);

  • Low speed of reports generation. As OLTP systems are oriented first to
    effective transaction processing, the generation of «large reports»
    usually takes a significant amount of time.

There are several paths to overcome these crises. One of them is the creation of
gateways among separate OLTP systems. However, this results only in consolidating
several OLTP systems into one, and in any case does not solve «highly
detailed data» problems.

An alternate approach was generated in the last decade as a group of
coupled technologies that received the name Data Warehousing (the technology of
Data Warehouses construction).

Data Warehouse, according to the classical definition, is a collection of
resources that permit the presentation of data in a complete, subject-oriented view
suitable for analysis and acceptance of business solutions.

The construction of Data Warehouses allows us to make a step to the following
stage of business activity automation — creating resources and tools to
support decision making. The main difference between decision making activity
and performing daily activity, from the view point of the used data, is
the requirement for an all-encompassing vision of processes for the extensive diversity of
parameters on which they depend, for various, arbitrary time intervals.
It is possible to say that the performers work with data on occurring processes,
whereas for managers information is necessary for decision making.
This fact defines type of used data. For the creation of decision support systems
(DSS) complete, consistent information for various
time intervals
which may be both generalized (sum or aggregated
different way) and detailed is necessary. This is the main conception of Data
Warehouses as a platform for decision support systems construction.

There are three groups of tools that you will probably involve in the decision-making process:

  • Report generation tools,

  • Operating analysis tools (OLAP), and

  • Data Mining tools

Report generation tools are intended for presenting data as tables and
diagrams, although other forms of data presentation are also used (for example, diagrams
plotted on a georaphic map). This class of tools allows managers to inspect
occurring processes, having some fixed sights on this processes metrics.

OLAP tools are intended for hypothesis testing, and they allow one to find the
data which confirm or refute the formulated management hypotheses. Hypotheses
may be formulated very definitely (was the profit falling directly as a result
of the cost price increases?) or more indistinct (are there any parameters
which most strongly differentiate the division that brought the greatest profit
from the other divisions?). This type of information allows managers to change
company business processes to reach definite purposes.

OLAP (On-Line Analytical Processing) tools are the key component/platform for Data Warehouses building. This technology (OLAP) is based on
the construction of many-dimensional data sets, called OLAP-cubes, where
axes contain parameters and the cells dependent on them, aggregated data.

Data Mining tools are intended for hypotheses creation on the existing
data. This class of tools most strongly depends on data domain and the structure
of input data. However, the use of similar tools is necessary in case of
large data volumes with numerous parameters, on which these data depend,
since they allow one to detect (or in other words to make visible)
the facts and tendencies which are completely unevident after a typical (brieft) review
of huge data arrays.

Specific forms of data usage (against transaction processing in OLTP
databases) cause appropriate requirements to used storage and data presentation
models. OLTP databases are optimized as much as possible for the presentation of
a small part of all company data, and for targeting transactions execution with
the highest possible performance, Data Warehouses typically present information
on all interconnected company processes. Also, DW and OLTP databases have
completely different types of queries to them. Target transactions (for
OLTP) combine not only queries for selecting data but also procedures
for data modification and the addition of new ones. In the case of Data Warehouses, we
deal first and foremost with data selection, as most of queries to Data Warehouses
are queries on selecting data.

On to Part Two of this article.

See All Articles by Columnist
Alexzander Nepomnjashiy

Alexzander Nepomnjashiy
Alexzander Nepomnjashiy
I am a Microsoft SQL Server Database Designer for Neo-Systems North-West - a security services, consulting, and training company. I have over eight years of experience in the IT field. I am currently working on several projects which involve the deployment of Microsoft Windows NT Server/Microsoft SQL Server within an enterprise business/financial environment. My typical role in these projects includes extending and improving our clients' corporate ERP systems to manage retail sales data, predict market changes and calculate trends for future market situations (DSS, OLAP). Also among my responsibilities are the design and administration of Microsoft SQL Server 7.0/2000 databases. I am available to work on a contract basis for the following types of projects: - Technical authoring, including books, articles, and white papers; - Network and systems design and analysis; - Database and software development and analysis; - Short-term consulting projects. I hope you find these articles useful. If you have any ideas for future articles (in a field of Microsoft SQL Server databases design, administration, performance optimization), or if you have anything to say about the ones below, please do not hesitate to contact me! Feel free to forward these articles to all interested associates. Thank You!

Latest Articles