OLAP and Data Warehousing - Data Warehousing Solution Architecture (Part 1)

July 24, 2002


Data Warehouse is not a software product but is instead the architecture used for enterprise-level business intelligence information systems organization. Data Warehouse storage (database) is a central part of such systems; however, a full-function Data Warehouse solution consists of complementary products and technologies. The minimal typical components of a decision support system are:

  • Storage database. Usually organized on an DBMS platform. As the sizes of storage can reach hundreds gigabyte, the DBMS should support the VLDB (Very Large Databases) technology.

  • Loading data tools. Tools for data acquisition, clearing, aggregation, transformation, unifying and synchronization. This class of tools is responsible for transforming data stored in OLTP-systems into integrated and interdependent information as well as saving this information in the storage database.

Data acquisition is the task of bringing data from everywhere to the data warehouse. Most businesses have several operational systems that handle the organization's day-to-day processing. These systems serve as the data source for the warehouse. The systems might reside on a mainframe, in a client/server database application, in a third-party application with a proprietary data store, within desktop applications such as spreadsheets and database applications, or any combination of these. The challenge is to identify the data sources and develop a solution for extracting and delivering the data to the warehouse in a timely, scheduled manner.

After collecting the data, you need to transform it. In an ideal organization, all systems would use the same set of codes and definitions for all data elements. Data transformation is the cleansing and validation of data for accuracy and ensuring that all values conform to a standard definition. After these data transformation tasks are complete, you can add the data to the warehouse

ETL tools should be reliable and efficient. They should also have known flexibility, submitting the possibility of configuration and programming with usage of transformation script language. One more essential requirement to these tools -- the possibility to access a broad range of data sources usually provided by the general purpose data access interfaces such as ODBC, OLE DB, JDBC, etc. The industrial software giants only recently have begun to release products possessing the described functionality; therefore, for many years it was necessary to use proprietary solutions oriented at the concrete system, or customized products from small companies such as Prism Solutions, ETI, Carleton and similar. Products released recently from large corporations include Visual Warehouse (from IBM) and Data Transformation Services (from Microsoft).

  • Repository (metadata storage). Metadata - «the data about the data», which describe the information introduced in storage. This information is indispensable both for maintaining storage integrity and for storing information about storage itself. Another definition for metadata term -- «information that describes database informational filling». For example, metadata informs users when storage data was last refreshed, its (data) format and for what is supposed to be applied. Also, this information helps analysts, working with the storage, to understand financial data value and context, database records and business transactions. Let's separate all the metadata on two large groups:

    • Technical metadata - The information typically used by storage creators and/or managers. Technical metadata can include, for example: information about storage data sources, ETL (Extract Transform Load) scripts, storage integrity support procedures, access rights information, etc;

    • Analytical metadata - The information which storage analysts use. Analytical metadata can include, for example: typical queries to storage, data and time formats, references to external information sources (WWW-links / Web pages), etc.

    The repository provides information about the location and nature of operational data, which is critical in the construction of a data warehouse. It acts as a guide to the warehouse data, storing information necessary to define the migration environment, mappings of sources to targets, translation requirements, business rules, and selection criteria to build the warehouse.


  • Data Marts. There are many different views on what Data Marts are used for. A Data Mart can be simply a logically allocated storage part, including the information about the data of a definite data domain. But from the other side, a Data Mart can be a physically allocated storage part. Thus, such Data Mart in the information flow chain can stand «before» main storage -- as it is an information source, or can stand «behind» it -- as one of storage data «customer». In the first case, the data is first gathered in «small», physically separated storages - data marts, at company departments level (for example), then placed in «big» enterprise storage. In the second case, main (enterprise) storage data marts are built with the purpose of simplifying working with the data or for storage load balancing.


See All Articles by Columnist Alexzander Nepomnjashiy









The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers