OLAP and Data Warehousing – Data Warehousing Solution Architecture (Part 2)


Analysis is usually accomplished with tools that are built on the base of attendant
technologies. Let’s briefly describe two of the most popular technologies – Online
Analysis Processing (OLAP – the online operating analysis) and Data Mining (mining
for hidden information dependences and tendencies).

OLAP

The resources of the operating analysis are based on the concept of
a multivariate data presentation. Really, each numeric value, inclusive
in storage, have up to ten attributes (for example, quantity of
sales by a certain manager in a certain region for a certain date). Thus, we
can consider that we deal with multivariate data structures – multivariate
cubes. So, based on this allowance, all OLAP software are build.

OLAP systems usually have two distinctive abilities:

  • The analysis is carried out above multivariate data structures:
    OLAP — cubes (the “cube” metaphor provides a new approach to visualizing
    how data are organized. A cube gives the impression of multiple dimensions)
  • There is, built-in into OLAP software, the ability to navigate
    on OLAP-cubes — (so-called OLAP — manipulation)

OLAP systems are usually realized based on (department) data marts, rather
than (enterprise-wide) data warehouses — because of the many possible attributes.
The multivariate structure (OLAP-cube) is created on the base of the relational
DBMS with the additional access interface (MRDB – MultiRelational Database),
or with usage of a special multivariate DBMS (MDDB – MultiDimensional Database).

The multivariate data model allows for storage of measurable values (or measures)
in the uniform data object with parameters, defining their values or arguments.
Most of the measurements (parameters from which the measures depend)
are inhomogeneous, for example if we speak about the specific product, it is also possible to speak about some product groups and / or product
types. Thus, measurement “product” can have a naming hierarchy: «product
types» -> «product group» -> «product». The
data analysis and/or the decision maker, infrequently need to work with numeric
values specific to a concrete product, so «product group» or
even «product types» levels are sufficient. For fast access
support to this high-level information, data warehouse as a rule contains
this information already computed. The process of such calculation is called
aggregation and the computed values themselves, aggregates.

For data warehouses/data marts creation, ordinary relational databases
frequently will be used. This case storage database scheme is organized as
«starflake» (a schema is called a star schema if all the dimension
tables can be joined directly to the fact table) or «snowflake»
(a schema is called a snowflake schema if one or more dimension tables
do not join directly to the fact table but must join through other dimension
tables). For example, in the case of a «star» database scheme organization,
there is a fact table (measures table), which contains one or several
numeric columns that measure business performance (for example: revenue,
costs, number of products sold, number of shipments), and a number of dimension
tables
, which describe the facts (for example; if our fact is the number of
products sold, the dimensions might be time, location and/or region and
product).

Data Mining

Data Mining is the detection process for hidden tendencies, trends and
cross-correlations in DW-storage data with the use of artificial
intellect tools and software and/or statistical analysis. The data mining
technology literally emerged over the last few years and is now on top of its development.
The given technology includes a wide spectrum of methodologies from statistical
analysis methods to neuron networks usage.

The main purpose of data mining is to detect dependencies and tendencies that are
inconspicuous at first (human) view and to analyze and monitor these tendencies with a focused attention.
Thus, data mining resources and tools allow us not only to parse
the data, but also to present it in a way that the tendencies and dependencies
are obviously monitored on an ongoing basis.

In Summary

The crucial terms for DW-project are a data warehouse, a data
mart
, data warehousing, and data mining.

A data warehouse is the cohesive data model that defines the
central data repository for an organization. An important point is that
we don’t define a warehouse in terms of the number of databases. Instead,
we consider it a complete, integrated data model of the enterprise, regardless
of how or where the data is stored.

A data mart is a repository containing data specific to a particular
business group (company departments, for example) in an enterprise. All
data in a data mart derives from the data warehouse, and all data relates
directly to the enterprise-wide data model. Often, data marts contain summarized
or aggregated data that the user community can easily consume.

 


Finally, data warehousing is the process of managing the data warehouses
and data marts. This process includes all the ongoing support needs of
the refresh cycle, database maintenance, and continual refinements to the
underlying data model.

Data mining is a technology that applies sophisticated and complex
algorithms to analyze data and expose significant information for analysis
by decision makers. Whereas OLAP organizes data in a model suited
for exploration by analysts, data mining performs analysis on data and
provides the results to decision makers. Thus, OLAP supports model-driven
analysis and data mining supports data-driven analysis.

Related Article: Introduction to Analysis Services: Creating Your First Cube from William E. Pearson




See All Articles by Columnist
Alexzander Nepomnjashiy

Alexzander Nepomnjashiy
Alexzander Nepomnjashiy
I am a Microsoft SQL Server Database Designer for Neo-Systems North-West - a security services, consulting, and training company. I have over eight years of experience in the IT field. I am currently working on several projects which involve the deployment of Microsoft Windows NT Server/Microsoft SQL Server within an enterprise business/financial environment. My typical role in these projects includes extending and improving our clients' corporate ERP systems to manage retail sales data, predict market changes and calculate trends for future market situations (DSS, OLAP). Also among my responsibilities are the design and administration of Microsoft SQL Server 7.0/2000 databases. I am available to work on a contract basis for the following types of projects: - Technical authoring, including books, articles, and white papers; - Network and systems design and analysis; - Database and software development and analysis; - Short-term consulting projects. I hope you find these articles useful. If you have any ideas for future articles (in a field of Microsoft SQL Server databases design, administration, performance optimization), or if you have anything to say about the ones below, please do not hesitate to contact me! Feel free to forward these articles to all interested associates. Thank You!

Latest Articles