We did it! We built the data warehouse, we created the ETL framework, we identified the data sources, and we even created the metadata repository and the ODS. Now we have received a request to build a data mart. The business users want to be able to access the data in this great data warehouse. They spent a significant amount of time, resources and patience building a data warehouse environment that would stand the test of time, and they want to get their money's worth.
So how do we start? Better yet, where do we start? A good place to start is to understand what a data mart is. Within the business intelligence framework, a data mart provides the primary access to the data stored in the data warehouse or operational data store. It is a subset of data, sourced from the data warehouse or operational data store, that is specifically focused on a business function or set of related business functions. The rest of this article will answer fundamental questions about data marts.
What is the purpose of a data mart?
The data warehouse is a repository of centralized, cleansed, standardized, transformed and integrated historical data. It is the organization's view of its business from a data perspective. However, accessing or reporting on data within a data warehouse is an extremely difficult task. The data in a data warehouse is organized by subject area; it is historical and represents the organization as a whole. The operational data store (ODS) shares the characteristics of the data warehouse, with one important exception: the data in an ODS is NON-HISTORICAL. Now think of what the SQL statements needed to answer a specific business question directly against either structure would look like. Lots of complex joins! Performance degradation from the sheer volume of data that must be read to satisfy the criteria in those statements! Phew.
That's where the data mart serves a purpose. It provides the business unit with easy access to data in the warehouse or the operational data store. Why? Because the data in the data mart is specifically tailored to the business unit and organized around specific business requirements. The primary purpose for its existence is to support data-dependent business functions related to reporting, trending, analysis and presentation.
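As a rough sketch of what this tailoring buys you (the table and column names here are invented for illustration, not taken from any real system), a mart table that the ETL has already denormalized and aggregated turns the business question into a single-table query with no joins at all:

```python
import sqlite3

# Toy mart table: the ETL has already joined and aggregated the warehouse
# data by state and policy year, so the business query is trivial.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE mart_claims_by_state (
        state_code  TEXT,
        policy_year INTEGER,
        total_paid  REAL
    )
""")
con.executemany("INSERT INTO mart_claims_by_state VALUES (?, ?, ?)",
                [("FL", 2022, 340000.0), ("ME", 2022, 40000.0)])

# No joins needed: the hard work happened once, at load time.
rows = con.execute("""
    SELECT state_code, total_paid
    FROM mart_claims_by_state
    WHERE policy_year = 2022
    ORDER BY state_code
""").fetchall()
print(rows)  # [('FL', 340000.0), ('ME', 40000.0)]
```

The trade-off, of course, is that the ETL process now owns the complexity that the business user no longer sees.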
Are there types of data marts?
Data marts are created to support the needs and requirements of the business unit, and those requirements are the driving force behind the type of data mart built. So let's take a look at some common, fundamental business requirements and the type of data mart that works well for each.
The most common data mart type is the star schema data mart. A star schema consists of a fact table, a set of dimension tables and the relationships between the two. What is a fact table? A fact table is a set of metrics or measures that relate to a specific business process or event. An example of a fact table is "Claims Paid Large Amount for Marine Craft". This fact table is exactly what it says; it contains claims paid in certain amounts for marine policies on marine craft. A dimension table is a set of elements that describe the events or metrics in the fact table; it contains the attributes that place the metrics in the fact table into a particular context. An example of a dimension table is Accident State. Star schema data marts are simple and very cost effective to deploy, and they provide multidimensional functionality within a relational database. The design lends itself best to trend and historical reporting and is supported by most DBMSs, including SQL Server, UDB and Oracle.
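The star schema described above can be sketched with a throwaway SQLite database. The table and column names (fact_claims_paid, dim_accident_state) and the dollar figures are my illustrations, not a prescribed design:

```python
import sqlite3

# A minimal star schema: one fact table related to one dimension table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_accident_state (
    state_key  INTEGER PRIMARY KEY,
    state_code TEXT,
    state_name TEXT
);
CREATE TABLE fact_claims_paid (
    claim_key   INTEGER PRIMARY KEY,
    state_key   INTEGER REFERENCES dim_accident_state(state_key),
    amount_paid REAL
);
""")
con.executemany("INSERT INTO dim_accident_state VALUES (?, ?, ?)",
                [(1, "FL", "Florida"), (2, "ME", "Maine")])
con.executemany("INSERT INTO fact_claims_paid VALUES (?, ?, ?)",
                [(1, 1, 250000.0), (2, 1, 90000.0), (3, 2, 40000.0)])

# A typical star-schema query: aggregate the fact, grouped by a dimension
# attribute. One join per dimension, no more.
rows = con.execute("""
    SELECT d.state_code, SUM(f.amount_paid)
    FROM fact_claims_paid f
    JOIN dim_accident_state d ON d.state_key = f.state_key
    GROUP BY d.state_code
    ORDER BY d.state_code
""").fetchall()
print(rows)  # [('FL', 340000.0), ('ME', 40000.0)]
```

The dimension supplies the context (which state?) and the fact supplies the measure (how much was paid?), which is the whole division of labor in a star schema.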
A snowflake schema data mart is a star schema whose dimension tables have been normalized. This type of data mart works best in situations where dimension tables are too large (most of the columns in the dimension table are populated with data values) or too sparse (few data values in the columns of the dimension tables). A snowflake schema still has the fact table; however, the dimension tables may have parent tables to which they are related. This design enables the efficient use of storage, as there is a significant reduction in data redundancy. However, maintenance is more complicated, as there are more tables to populate and synchronize. A snowflake schema also works well where the dimensions are not clearly defined.
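Continuing the same toy example, a snowflaked version might normalize the state dimension into a parent region table (again, the names and numbers here are hypothetical). Note the extra join that normalization costs at query time:

```python
import sqlite3

# A snowflaked dimension: dim_state now refers to a parent dim_region
# table instead of repeating the region name on every state row.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_region (
    region_key  INTEGER PRIMARY KEY,
    region_name TEXT
);
CREATE TABLE dim_state (
    state_key  INTEGER PRIMARY KEY,
    state_code TEXT,
    region_key INTEGER REFERENCES dim_region(region_key)
);
CREATE TABLE fact_claims_paid (
    claim_key   INTEGER PRIMARY KEY,
    state_key   INTEGER REFERENCES dim_state(state_key),
    amount_paid REAL
);
""")
con.executemany("INSERT INTO dim_region VALUES (?, ?)",
                [(1, "Northeast"), (2, "Southeast")])
con.executemany("INSERT INTO dim_state VALUES (?, ?, ?)",
                [(1, "ME", 1), (2, "FL", 2), (3, "GA", 2)])
con.executemany("INSERT INTO fact_claims_paid VALUES (?, ?, ?)",
                [(1, 1, 40000.0), (2, 2, 250000.0), (3, 3, 60000.0)])

# Reporting by region now requires walking up through the parent table:
# fact -> state -> region. Less redundancy, one more join.
rows = con.execute("""
    SELECT r.region_name, SUM(f.amount_paid)
    FROM fact_claims_paid f
    JOIN dim_state  s ON s.state_key  = f.state_key
    JOIN dim_region r ON r.region_key = s.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(rows)  # [('Northeast', 40000.0), ('Southeast', 310000.0)]
```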
A data or OLAP cube is a very interesting solution. It is a multidimensional database specifically architected to support all flavors of reporting, including analytics, trending and historical reporting. Like a star schema, it is organized into measures (facts) and dimensions; however, this type of database is optimized to report on data from different views and perspectives. It provides the ability to slice and dice, drill through to atomic detail and discover patterns in data. A data or OLAP cube is best suited for solutions where efficient reporting against complex business requirements is mandatory. It works best with sophisticated business intelligence tools that possess features and functionality for reporting from cubes. Software solutions such as Business Objects, Cognos, Hyperion and SQL Server Analysis Services (SSAS) are perfect examples here. However, the design, maintenance and processing of the cube should not be underestimated. The cube build itself is quite resource intensive, and therefore requires robust hardware and software designed for this type of workload. Trust me on this one. I have worked on reporting applications where the cube build process would fail at least once a week! Why? For a couple of reasons: the design of some of the measures, and the capacity of the server, including limited memory and processing power.
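The slice-and-dice behavior can be sketched in plain Python. This is a toy model of the concept, not how SSAS, Cognos or any real OLAP engine stores cubes, and the dimensions (state, year) and figures are invented:

```python
from collections import defaultdict

# Toy "cube": one measure (amount paid) addressed by two dimension
# coordinates, (state, year). Real cubes pre-aggregate and index this.
cells = {
    ("FL", 2021): 250000.0, ("FL", 2022): 90000.0,
    ("ME", 2021): 40000.0,  ("ME", 2022): 55000.0,
}

def slice_cube(cells, state):
    """Slice: fix the state dimension, leaving a view over year."""
    return {year: v for (s, year), v in cells.items() if s == state}

def rollup(cells, axis):
    """Roll up: aggregate away one dimension (0 = state, 1 = year)."""
    out = defaultdict(float)
    for coords, v in cells.items():
        out[coords[axis]] += v
    return dict(out)

print(slice_cube(cells, "FL"))  # {2021: 250000.0, 2022: 90000.0}
print(rollup(cells, axis=1))    # {2021: 290000.0, 2022: 145000.0}
```

Every "view and perspective" the article mentions is some combination of these two moves: fixing dimensions (slicing) and aggregating them away (rolling up).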
Are there Design Rules for Data Marts?
In terms of design rules for a data mart, a set of fundamentals holds true. Here are a few that I would like to share.
Based on the business requirements, the team must, through careful analysis, identify the sources that will be used to build the data mart. Ideally, the subject area from a data warehouse or ODS should be sufficient, as it contains the entire organization's data assets.
Design the dimensions carefully and appropriately. They are an extremely important part of the data mart solution, as they provide the context for the metrics in the fact table. Use conformed dimensions wherever possible. A conformed dimension is a dimension whose definition is shared among business units. An example of this is a geographic region dimension: there is only one set of states in the United States of America, so this should be a common, shared definition of geography.
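A rough sketch of what a conformed dimension buys you, with hypothetical table names: two fact tables owned by different business units (claims and premiums) share one dim_state, so their measures can be lined up ("drilled across") on the same definition of geography:

```python
import sqlite3

# One conformed dimension, dim_state, shared by two fact tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_state (state_key INTEGER PRIMARY KEY, state_code TEXT);
CREATE TABLE fact_claims   (state_key INTEGER, amount_paid   REAL);
CREATE TABLE fact_premiums (state_key INTEGER, amount_billed REAL);
""")
con.executemany("INSERT INTO dim_state VALUES (?, ?)", [(1, "FL"), (2, "ME")])
con.executemany("INSERT INTO fact_claims VALUES (?, ?)",
                [(1, 250000.0), (2, 40000.0)])
con.executemany("INSERT INTO fact_premiums VALUES (?, ?)",
                [(1, 400000.0), (2, 65000.0)])

# Because both facts key to the same dimension, claims and premiums can
# be compared side by side per state with no reconciliation step.
rows = con.execute("""
    SELECT d.state_code,
           (SELECT SUM(c.amount_paid)   FROM fact_claims c
             WHERE c.state_key = d.state_key) AS paid,
           (SELECT SUM(p.amount_billed) FROM fact_premiums p
             WHERE p.state_key = d.state_key) AS billed
    FROM dim_state d
    ORDER BY d.state_code
""").fetchall()
print(rows)  # [('FL', 250000.0, 400000.0), ('ME', 40000.0, 65000.0)]
```

If each business unit had invented its own state table, that side-by-side comparison would first require mapping one definition of geography onto the other.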
Lastly, remember: "Just say no to nulls." A data mart is a subset of data from a data warehouse, and that data is a record of events that occurred in sets of business transactions. These transactions all have valid data values that are something other than nulls, zeroes or spaces; the data mart should be a reflection of that.
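One common way an ETL process can honor this rule (this is my illustration of a widely used practice, not something the article prescribes) is to route incoming facts whose dimension value is missing to an explicit "Unknown" dimension member instead of storing a NULL key:

```python
# Hypothetical ETL lookup step: never emit a NULL dimension key.
# -1 is a reserved surrogate key for an explicit "Unknown" member.
UNKNOWN_STATE_KEY = -1
state_keys = {"FL": 1, "ME": 2}  # dimension lookup, key by state code

def resolve_state_key(state_code):
    """Return the dimension key, falling back to the Unknown member."""
    return state_keys.get(state_code, UNKNOWN_STATE_KEY)

# Source rows: (state_code, amount_paid); one row is missing its state.
incoming = [("FL", 250000.0), (None, 90000.0), ("ME", 40000.0)]
fact_rows = [(resolve_state_key(s), amt) for s, amt in incoming]
print(fact_rows)  # [(1, 250000.0), (-1, 90000.0), (2, 40000.0)]
```

Reports then show the questionable rows under "Unknown" rather than silently dropping them from joins, which is what NULL foreign keys tend to do.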
One Last Thing
A data mart is the mechanism through which the business community can access
and view the data that has been transformed and enriched from a number of
transactional systems across the enterprise into a holistic view of the
organization's data assets. Care must be taken in designing an appropriate
solution that aligns to business requirements and the strategic direction of
the organization at large.