Analysis is usually accomplished with tools that are built on the base of attendant
technologies. Let's briefly describe two of the most popular technologies - Online
Analysis Processing (OLAP - the online operating analysis) and Data Mining (mining
for hidden information dependences and tendencies).
The resources of the operating analysis are based on the concept of
a multivariate data presentation. Really, each numeric value, inclusive
in storage, have up to ten attributes (for example, quantity of
sales by a certain manager in a certain region for a certain date). Thus, we
can consider that we deal with multivariate data structures - multivariate
cubes. So, based on this allowance, all OLAP software are build.
OLAP systems usually have two distinctive abilities:
- The analysis is carried out above multivariate data structures:
OLAP -- cubes (the "cube" metaphor provides a new approach to visualizing
how data are organized. A cube gives the impression of multiple dimensions)
- There is, built-in into OLAP software, the ability to navigate
on OLAP-cubes -- (so-called OLAP -- manipulation)
OLAP systems are usually realized based on (department) data marts, rather
than (enterprise-wide) data warehouses -- because of the many possible attributes.
The multivariate structure (OLAP-cube) is created on the base of the relational
DBMS with the additional access interface (MRDB - MultiRelational Database),
or with usage of a special multivariate DBMS (MDDB - MultiDimensional Database).
The multivariate data model allows for storage of measurable values (or measures)
in the uniform data object with parameters, defining their values or arguments.
Most of the measurements (parameters from which the measures depend)
are inhomogeneous, for example if we speak about the specific product, it is also possible to speak about some product groups and / or product
types. Thus, measurement "product" can have a naming hierarchy: «product
types» -> «product group» -> «product». The
data analysis and/or the decision maker, infrequently need to work with numeric
values specific to a concrete product, so «product group» or
even «product types» levels are sufficient. For fast access
support to this high-level information, data warehouse as a rule contains
this information already computed. The process of such calculation is called
aggregation and the computed values themselves, aggregates.
For data warehouses/data marts creation, ordinary relational databases
frequently will be used. This case storage database scheme is organized as
«starflake» (a schema is called a star schema if all the dimension
tables can be joined directly to the fact table) or «snowflake»
(a schema is called a snowflake schema if one or more dimension tables
do not join directly to the fact table but must join through other dimension
tables). For example, in the case of a «star» database scheme organization,
there is a fact table (measures table), which contains one or several
numeric columns that measure business performance (for example: revenue,
costs, number of products sold, number of shipments), and a number of dimension
tables, which describe the facts (for example; if our fact is the number of
products sold, the dimensions might be time, location and/or region and
Data Mining is the detection process for hidden tendencies, trends and
cross-correlations in DW-storage data with the use of artificial
intellect tools and software and/or statistical analysis. The data mining
technology literally emerged over the last few years and is now on top of its development.
The given technology includes a wide spectrum of methodologies from statistical
analysis methods to neuron networks usage.
The main purpose of data mining is to detect dependencies and tendencies that are
inconspicuous at first (human) view and to analyze and monitor these tendencies with a focused attention.
Thus, data mining resources and tools allow us not only to parse
the data, but also to present it in a way that the tendencies and dependencies
are obviously monitored on an ongoing basis.
The crucial terms for DW-project are a data warehouse, a data
mart, data warehousing, and data mining.
A data warehouse is the cohesive data model that defines the
central data repository for an organization. An important point is that
we don't define a warehouse in terms of the number of databases. Instead,
we consider it a complete, integrated data model of the enterprise, regardless
of how or where the data is stored.
A data mart is a repository containing data specific to a particular
business group (company departments, for example) in an enterprise. All
data in a data mart derives from the data warehouse, and all data relates
directly to the enterprise-wide data model. Often, data marts contain summarized
or aggregated data that the user community can easily consume.
Finally, data warehousing is the process of managing the data warehouses
and data marts. This process includes all the ongoing support needs of
the refresh cycle, database maintenance, and continual refinements to the
underlying data model.
Data mining is a technology that applies sophisticated and complex
algorithms to analyze data and expose significant information for analysis
by decision makers. Whereas OLAP organizes data in a model suited
for exploration by analysts, data mining performs analysis on data and
provides the results to decision makers. Thus, OLAP supports model-driven
analysis and data mining supports data-driven analysis.
Related Article: Introduction to Analysis Services: Creating Your First Cube from William E. Pearson
See All Articles by Columnist Alexzander Nepomnjashiy