Introduction
Turning data into information and information into knowledge for the purposes of insight and fact-based decision support is the purpose of business intelligence. When delving into business intelligence for the first time, there are some important terms to understand. Deanna Dicken walks through some of the basic terminology needed to get you started.
BIDS
BIDS
is short for Business Intelligence Development Studio. This Integrated
Development Environment (IDE) is Visual Studio 2008 with some additional
project types for building SQL Server Integration Services (SSIS), SQL Server
Analysis Services (SSAS), and SQL Server Reporting Services (SSRS) solutions.
Cube
At the heart of multidimensional databases (see OLAP below) are cubes.
Cubes represent the many perspectives (or dimensions) with which you may want
to view the data. Cubes contain levels, measures, dimensions, and hierarchies.
Data Mart
A data
mart is typically a smaller, more focused version of a data warehouse (see
below). Data marts bring together data for a single business unit or user group
for the purposes of analysis.
For instance, you might create a data mart to analyze the operational
processing times by step and product for a single business unit. This might
help management glean knowledge of long running steps or maybe justify a new
piece of equipment.
Data Mining
Data
mining is a process by which algorithms and complex searches are applied to
data in an attempt to discover patterns and relationships within the data.
Data Warehouse
When data from one or more data sources are brought together in a central
store related to a common subject for the purpose of analysis, this is a data
warehousing. The data store is called a data warehouse.
For instance, you might collect sales and product data from all lines of
business within the company into a data warehouse to provide for analyzing
product sales over time, geographic location and product or product category.
Dimension
Dimensions
can be thought of as perspectives on a subject. Let’s say your subject of
interest is sales. One way of analyzing sales is by time. How many widgets did
we sell last month? Last quarter? Last year? This is a time dimension.
Maybe you want to look at your sales data by geography. What were our sales
in Beijing? China? How about all of Asia? This is a geography dimension.
Product or product category might be useful. What was my best selling
product? Are there any product categories that aren’t hitting their sales
targets? This is a product dimension.
Having multiple perspectives or dimensions with which to view the subject is
very powerful. This allows you to put them together and get answers to even
more detailed questions. What was the worst selling product category in Beijing
last quarter?
ETL
Short for Extract-Transform-and Load, ETL
is the means by which data is pulled from a data source (extracted), scrubbed
(transformed), and written (loaded) into the destination database.
Fact
A fact
table in an OLAP database is essentially the subject of the analysis. The
dimensions can be thought of as foreign keys to this table or attributes of the
fact. For instance, as described above, you might have sales as your fact table
or subject. Attributes of a sale would be things like date and time, customer,
product, geographical location, sales clerk, etc. All these characteristics of
the sale could be your dimension tables.
Hierarchy
Data within a dimension can often be represented through one or more
hierarchies. These are parent-child
relationships within a dimension. For instance, in a time dimension, a year
is a parent to quarter, a quarter is a parent to month, and so on. As you see
here the parent-child relationships can nest. These hierarchies then make it
possible to drill up or down within a dimension.
KPI
A KPI
is a Key Performance Indicator. These indicators are measures that are
important to the business, typically relating to the overall health of its
operation. An example would be profit margin. Profit margin can be trended over
time. A significant deviation (for the worst) from expected values can alert
management to drill into those measures of profit margin to see which might be
causing the issue, allowing for a quick response.
Level
Levels are an implementation of the hierarchies that represent the level of
detail viewed in a dimension. In the example above for a time dimension’s
hierarchies, we had levels for year, quarter, and month.
Measure
Numerical data stored on a fact table is a measure. For instance, regarding
sales fact, we might have a measure for cost, gross profit, maybe even order
fulfillment time. These measures can then be summarized for each level of
analysis.
MOLAP
When data is pre-calculated and stored in a cube, this is referred to as MOLAP
or Multi-dimensional Online Analytical Processing.
OLAP
Online Analytical Processing is a strategy for summarizing and analyzing
large amounts of data using a multi-dimensional data model. Implementations of OLAP
include MOLAP and ROLAP.
ROLAP
Unlike MOLAP, ROLAP
(Relational Online Analytical Processing) does not require the data to be
pre-summarized or stored in a multi-dimensional store. Instead, ROLAP allows
for analysis and on-the-fly computations to occur against the relational
database.
Snowflake Schema
A snowflake schema refers to the arrangement of tables in a
multi-dimensional data model where dimensions hanging off the fact table also
have tables hanging off of them. The links in the data model start to resemble
a snowflake with multiple branches hanging off the dimension tables and not the
fact table. This occurs with normalized models in a multi-dimensional data
store.
Star Schema
With a star schema for multidimensional modeling, the fact table sits in the
middle of the diagram and all the dimensions hang directly off of the fact
table thus resembling a star. The data in the dimension tables is denormalized
to allow all other tables to link directly to the fact table.
SQL Server Analysis Services
Analysis
Services is an enterprise level product that ships with SQL Server as a
tool to help businesses gain insight into the organization’s data and to
perform predictive analysis.
Conclusion
I hope that after reading this article you have enough of an idea of basic
business intelligence terminology to understand what it takes to design and
build your first business intelligence solution.
For More Information
Analysis Services – SQL Server 2008 R2