OLAP and Data Warehousing – Data Warehousing Solution Architecture (Part 1)


Data Warehouse is not a software product but is instead the architecture used
for enterprise-level business intelligence information systems organization.
Data Warehouse storage (database) is a central part of such systems; however,
a full-function Data Warehouse solution consists of complementary products
and technologies. The minimal typical components of a decision support system
are:

  • Storage database. Usually organized on an DBMS platform.
    As the sizes of storage can reach hundreds gigabyte, the DBMS should support
    the VLDB (Very Large Databases) technology.
  • Loading data tools. Tools for data acquisition, clearing,
    aggregation, transformation, unifying and synchronization. This class of
    tools is responsible for transforming data stored in OLTP-systems into
    integrated and interdependent information as well as saving this information
    in the storage database.

Data acquisition is the task of bringing data from everywhere to the
data warehouse. Most businesses have several operational systems that handle
the organization’s day-to-day processing. These systems serve as the data
source for the warehouse. The systems might reside on a mainframe, in a
client/server database application, in a third-party application with a
proprietary data store, within desktop applications such as spreadsheets
and database applications, or any combination of these. The challenge is
to identify the data sources and develop a solution for extracting and
delivering the data to the warehouse in a timely, scheduled manner.

After collecting the data, you need to transform it. In an ideal organization,
all systems would use the same set of codes and definitions for all data
elements. Data transformation is the cleansing and validation of data for
accuracy and ensuring that all values conform to a standard definition.
After these data transformation tasks are complete, you can add the data
to the warehouse

ETL tools should be reliable and efficient. They should also have known
flexibility, submitting the possibility of configuration and programming
with usage of transformation script language. One more essential
requirement to these tools — the possibility to access a broad range of data
sources usually provided by the general purpose data access interfaces such as ODBC, OLE DB, JDBC, etc. The industrial software giants
only recently have begun to release products possessing the described
functionality; therefore, for many years it was necessary to use proprietary
solutions oriented at the concrete system, or customized products from small companies such as Prism Solutions, ETI, Carleton and similar. Products released recently from large corporations include Visual Warehouse (from
IBM) and Data Transformation Services (from Microsoft).

  • Repository (metadata storage). Metadata – «the
    data about the data», which describe the information introduced
    in storage. This information is indispensable both for maintaining storage
    integrity and for storing information about storage itself. Another definition
    for metadata term — «information that describes database
    informational filling». For example, metadata informs users when storage
    data was last refreshed, its (data) format and for what is
    supposed to be applied. Also, this information helps analysts, working
    with the storage, to understand financial data value and context, database
    records and business transactions. Let’s separate all the metadata
    on two large groups:

    • Technical metadata – The information typically used by storage creators and/or managers.
      Technical metadata can include, for example: information about storage data sources, ETL (Extract
      Transform Load) scripts, storage integrity support procedures, access rights
      information, etc;
    • Analytical metadata – The information which storage analysts use. Analytical metadata can include, for example: typical queries to storage, data and time formats, references to external information sources (WWW-links / Web pages), etc.

    The repository provides information about the location and nature of
    operational data, which is critical in the construction of a data warehouse.
    It acts as a guide to the warehouse data, storing information necessary
    to define the migration environment, mappings of sources to targets, translation
    requirements, business rules, and selection criteria to build the warehouse.

  • Data Marts. There are many different views on what
    Data Marts are used for. A Data Mart can be simply a logically allocated storage part,
    including the information about the data of a definite data domain. But from
    the other side, a Data Mart can be a physically allocated storage part. Thus,
    such Data Mart in the information flow chain can stand «before»
    main storage — as it is an information source, or can stand «behind» it
    — as one of storage data «customer». In the first case, the data is
    first gathered in «small», physically separated storages
    – data marts, at company departments level (for example), then placed in
    «big» enterprise storage. In the second case, main (enterprise)
    storage data marts are built with the purpose of simplifying working with
    the data or for storage load balancing.




See All Articles by Columnist
Alexzander Nepomnjashiy

Alexzander Nepomnjashiy
Alexzander Nepomnjashiy
I am a Microsoft SQL Server Database Designer for Neo-Systems North-West - a security services, consulting, and training company. I have over eight years of experience in the IT field. I am currently working on several projects which involve the deployment of Microsoft Windows NT Server/Microsoft SQL Server within an enterprise business/financial environment. My typical role in these projects includes extending and improving our clients' corporate ERP systems to manage retail sales data, predict market changes and calculate trends for future market situations (DSS, OLAP). Also among my responsibilities are the design and administration of Microsoft SQL Server 7.0/2000 databases. I am available to work on a contract basis for the following types of projects: - Technical authoring, including books, articles, and white papers; - Network and systems design and analysis; - Database and software development and analysis; - Short-term consulting projects. I hope you find these articles useful. If you have any ideas for future articles (in a field of Microsoft SQL Server databases design, administration, performance optimization), or if you have anything to say about the ones below, please do not hesitate to contact me! Feel free to forward these articles to all interested associates. Thank You!

Latest Articles