Data Warehouse is not a software product but is instead the architecture used
for enterprise-level business intelligence information systems organization.
Data Warehouse storage (database) is a central part of such systems; however,
a full-function Data Warehouse solution consists of complementary products
and technologies. The minimal typical components of a decision support system
- Storage database. Usually organized on an DBMS platform.
As the sizes of storage can reach hundreds gigabyte, the DBMS should support
the VLDB (Very Large Databases) technology.
- Loading data tools. Tools for data acquisition, clearing,
aggregation, transformation, unifying and synchronization. This class of
tools is responsible for transforming data stored in OLTP-systems into
integrated and interdependent information as well as saving this information
in the storage database.
Data acquisition is the task of bringing data from everywhere to the
data warehouse. Most businesses have several operational systems that handle
the organization's day-to-day processing. These systems serve as the data
source for the warehouse. The systems might reside on a mainframe, in a
client/server database application, in a third-party application with a
proprietary data store, within desktop applications such as spreadsheets
and database applications, or any combination of these. The challenge is
to identify the data sources and develop a solution for extracting and
delivering the data to the warehouse in a timely, scheduled manner.
After collecting the data, you need to transform it. In an ideal organization,
all systems would use the same set of codes and definitions for all data
elements. Data transformation is the cleansing and validation of data for
accuracy and ensuring that all values conform to a standard definition.
After these data transformation tasks are complete, you can add the data
to the warehouse
ETL tools should be reliable and efficient. They should also have known
flexibility, submitting the possibility of configuration and programming
with usage of transformation script language. One more essential
requirement to these tools -- the possibility to access a broad range of data
sources usually provided by the general purpose data access interfaces such as ODBC, OLE DB, JDBC, etc. The industrial software giants
only recently have begun to release products possessing the described
functionality; therefore, for many years it was necessary to use proprietary
solutions oriented at the concrete system, or customized products from small companies such as Prism Solutions, ETI, Carleton and similar. Products released recently from large corporations include Visual Warehouse (from
IBM) and Data Transformation Services (from Microsoft).
- Repository (metadata storage). Metadata - «the
data about the data», which describe the information introduced
in storage. This information is indispensable both for maintaining storage
integrity and for storing information about storage itself. Another definition
for metadata term -- «information that describes database
informational filling». For example, metadata informs users when storage
data was last refreshed, its (data) format and for what is
supposed to be applied. Also, this information helps analysts, working
with the storage, to understand financial data value and context, database
records and business transactions. Let's separate all the metadata
on two large groups:
- Technical metadata - The information typically used by storage creators and/or managers.
Technical metadata can include, for example: information about storage data sources, ETL (Extract
Transform Load) scripts, storage integrity support procedures, access rights
- Analytical metadata - The information which storage analysts use. Analytical metadata can include, for example: typical queries to storage, data and time formats, references to external information sources (WWW-links / Web pages), etc.
The repository provides information about the location and nature of
operational data, which is critical in the construction of a data warehouse.
It acts as a guide to the warehouse data, storing information necessary
to define the migration environment, mappings of sources to targets, translation
requirements, business rules, and selection criteria to build the warehouse.
- Data Marts. There are many different views on what
Data Marts are used for. A Data Mart can be simply a logically allocated storage part,
including the information about the data of a definite data domain. But from
the other side, a Data Mart can be a physically allocated storage part. Thus,
such Data Mart in the information flow chain can stand «before»
main storage -- as it is an information source, or can stand «behind» it
-- as one of storage data «customer». In the first case, the data is
first gathered in «small», physically separated storages
- data marts, at company departments level (for example), then placed in
«big» enterprise storage. In the second case, main (enterprise)
storage data marts are built with the purpose of simplifying working with
the data or for storage load balancing.
See All Articles by Columnist Alexzander Nepomnjashiy