Build a Web Site Traffic Analysis Cube: Part I

About the Series …

This is the thirteenth article of the series, Introduction to MSSQL Server 2000 Analysis Services. As I stated in the first article, Creating Our First Cube, the primary focus of this series is an introduction to the practical creation and manipulation of multidimensional OLAP cubes. The series is designed to provide hands-on application of the fundamentals of MS SQL Server 2000 Analysis Services ("Analysis Services"), with each installment progressively adding features and techniques designed to meet specific real-world needs. For more information on the series, as well as the hardware / software requirements to prepare for the exercises we will undertake, please see my initial article, Creating Our First Cube.

Preparation

Prior to beginning the lesson, you will need to download a copy of the sample Server Access Log, ServAccessLog.txt, a zipped text file that we will use as a data source in Part I of this lesson. Once the log is downloaded, unzip it and place it in a location that you can easily remember later, when we select the file as a data source. Once the lesson is completed, the file can be discarded to conserve hard disk space, if desired.

Introduction

While the majority of our series to date has focused upon the design and creation of cubes within Analysis Services (see Articles One through Nine of the Introduction to MSSQL Server 2000 Analysis Services series), we began in Article Ten to discuss reporting options for our cubes. My intention with Articles Ten, Eleven, and Twelve was to respond to the need, expressed by several readers, for reporting options beyond the mere browse capabilities within Analysis Services.

In Articles Ten and Eleven, we explored some of the options offered by Microsoft Office (specifically the Excel PivotTable Report and the Office PivotTable List, respectively) for report building with Analysis Services cubes. In Article Twelve, we explored features that integrate Analysis Services and Cognos PowerPlay to provide a vehicle for client reporting and other business intelligence pursuits. The focus of that article was a basic overview of the steps involved in a simple (non-integrated security) connection of Cognos PowerPlay to a Microsoft Analysis Services cube, followed by a high-level overview of the use of PowerPlay for Windows and PowerPlay Web to perform analysis and reporting upon the Analysis Services OLAP data source.

In this article we will return to the hands-on design and building of cubes for various business purposes. Specifically, the next two articles will focus on the design and construction of a Web Site Traffic Analysis Cube. In Part I, after a brief discussion of potential business reasons for collecting web site traffic data, we will design and build an extract procedure to illustrate one approach for gathering statistical data for ultimate placement into our new traffic analysis cube. Next, we will set up a simple data source that will serve as the destination point for the extract process, and as a basis for the design and creation of a web traffic analysis cube in Part II. Finally, we will browse our cube using the Analysis Services browser to examine the results of our handiwork.

The topics within Part I of this two-part article will include:

  • An overview of the business
    needs behind the desire to report upon web site traffic statistics;

  • An overview of the Server
    Access Log, and a discussion of its use as a source of web site activity
    tracking data;

  • A practical demonstration of the extraction of raw sample
    traffic statistics from a log file, and their importation into a
    database using MS SQL Server 2000 Data Transformation Services ("DTS");

  • Creation and population of a table
    in MSSQL Server 2000 to support our site traffic analysis cube in Part II.
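To make the idea of "extracting raw traffic data from a log file" concrete, here is a minimal sketch of the field-splitting an extract step performs before rows are loaded into a table. This is not the DTS procedure we build in this lesson, and the layout of ServAccessLog.txt is not described here; the sketch assumes the widely used NCSA Common Log Format, and the names `parse_line` and `LogRecord` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed layout: NCSA Common Log Format, e.g.
# 192.0.2.10 - - [10/Oct/2003:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


class LogRecord(NamedTuple):
    """One parsed request, shaped like a row destined for a staging table."""
    host: str
    timestamp: datetime
    method: str
    path: str
    status: int
    bytes_sent: int


def parse_line(line: str) -> Optional[LogRecord]:
    """Return a LogRecord, or None if the line does not match the assumed format."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    return LogRecord(
        host=match["host"],
        timestamp=datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=match["method"],
        path=match["path"],
        status=int(match["status"]),
        # CLF writes "-" when no body was sent.
        bytes_sent=0 if match["size"] == "-" else int(match["size"]),
    )
```

Returning `None` for non-matching lines lets an extract skip comments or malformed entries rather than abort the whole load.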

Why a Site Traffic Analysis Cube?

In this lesson, we will return to an
examination of real-life applications that can leverage the power of Analysis
Services. The scenario that we explore in this article will surround the
business need of a web site owner to analyze traffic.

The uses for site traffic analysis and
statistics are legion, and the degree and complexity of the analysis performed
can range widely. Examples might include the need to establish baseline
activity on a given site before implementing a promotional campaign within the
organization, as a means of determining the effectiveness of that campaign from
various perspectives. Current traffic metrics can be useful for a number of
other reasons as well. They can show us which overall resources or site
features are attracting visitors, which pages in the site are being skipped by
visitors (or, worse, simply not being seen due to obscurity in naming and
referencing, non-intuitive links, and so forth), who our visitors are, and from
what site they were referred to ours, among many other potentially valuable
bits of information.

A partial list of "typical"
web site tracking reports that I have put in place for clients in the past
includes the following. The titles of the reports are shown here to give an
indication of possible dimensions upon which one might seek to report. Other,
more advanced reporting perspectives are, of course, possible.

Summary Reports

  • Totals and Averages (various reports)

Basic Tracking Reports

  • Unique Visitors by:

    • Days

    • Weeks

    • Months

    • Days of the Week

    • Hours of the Day
  • Reloads by:

    • Days

    • Weeks

    • Months

  • Geographical Tracking by:

    • Domains

    • Countries (with obvious regional,
      province, state, etc., hierarchical levels)

    • Continents
  • System Tracking by:

    • Browsers

    • JavaScript Enabled

    • Operating Systems

    • Screen Resolutions

    • Screen Colors
  • Referrer Tracking by:

    • Last 20 (number varies …)

    • Last 20 from Email

    • Last 20 from Search Engines

    • Last 20 Queries

    • Last 20 from Usenet

    • Last 20 from Hard Disk
  • Referrer Tracking by:

    • Totals by Source:

      • Website

      • Search Engine

      • Email

      • Usenet

      • Hard Disk
    • Totals by Search Engine:

      • 24 most popular engines (number
        varies)
    • All Keywords

    • All Website Referrers

There are many other
potential dimensions, but perhaps this gives a flavor for the possibilities.
Along with informing us of which resources on our site hold the attention of
our visitors, web statistics can expose, both directly and by inference, many
of the characteristics of the visitors, along with various attributes of their
visits to our sites. These characteristics and attributes might include the
following examples:

  • Duration of visits to the site
    (and individual pages thereof);

  • Most popular times of day /
    days of week for visits;

  • Likelihood of actual reading of
    resources, or mere skimming / skipping about;

  • Optimal times to perform
    maintenance / updates, based upon traffic valleys;

  • Characteristics of the people
    drawn to the site (demographics, etc.);

  • Characteristics of people likely
    to visit with adequate promotion;

  • Navigational impediments / perceived
    difficulties that shorten visits / prevent returns;

  • Participation in, percentage of
    completion of, and resistance to surveys and other information gathering
    vehicles.
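The first characteristic above, duration of visits, is not recorded directly in an access log; it has to be inferred by grouping each host's requests into visits. The sketch below shows one common way to do that, under two loudly stated assumptions: a host is treated as a single visitor, and a gap of more than 30 minutes between requests ends a visit. Both the timeout value and the name `visit_durations` are illustrative choices, not anything prescribed by Analysis Services or by this series.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple

Hit = Tuple[str, datetime]  # (client host, request time); hypothetical shape
TIMEOUT = timedelta(minutes=30)  # assumed inactivity gap that ends a visit


def visit_durations(hits: List[Hit],
                    timeout: timedelta = TIMEOUT) -> List[Tuple[str, timedelta]]:
    """Split each host's hits into visits and return (host, duration) per visit."""
    durations = []
    ordered = sorted(hits, key=itemgetter(0, 1))  # by host, then chronologically
    for host, group in groupby(ordered, key=itemgetter(0)):
        times = [when for _, when in group]
        start = prev = times[0]
        for when in times[1:]:
            if when - prev > timeout:
                durations.append((host, prev - start))  # close the current visit
                start = when
            prev = when
        durations.append((host, prev - start))  # close the final visit
    return durations
```

Since the log records only when each page was requested, a single-request visit comes out with zero duration; this is one reason inferred visit lengths tend to understate actual reading time.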
William Pearson
Bill has been working with computers since before becoming a "big eight" CPA, after which he carried his growing information systems knowledge into management accounting, internal auditing, and various capacities of controllership. Bill entered the world of databases and financial systems when he became a consultant for CODA-Financials, a U.K.-based software company that hired only CPAs as application consultants to implement and maintain its integrated financial database, one of the most conceptually powerful, even in his current assessment, to have emerged. At CODA Bill deployed financial databases and business intelligence systems for many global clients. Working with SQL Server, Oracle, Sybase and Informix, and focusing on MSSQL Server, Bill created Island Technologies Inc. in 1997, and has developed a large and diverse customer base over the years since. Bill's background as a CPA, Internal Auditor and Management Accountant enables him to provide value to clients as a liaison between Accounting / Finance and Information Services. Moreover, as a Certified Information Technology Professional (CITP), a Certified Public Accountant recognized for his or her unique ability to provide business insight by leveraging knowledge of information relationships and supporting technologies, Bill offers his clients the CPA's perspective and ability to understand the complicated business implications and risks associated with technology. From this perspective, he helps them to effectively manage information while ensuring the data's reliability, security, accessibility and relevance. Bill has implemented enterprise business intelligence systems over the years for many Fortune 500 companies, focusing his practice (since the advent of MSSQL Server 2000) upon the integrated Microsoft business intelligence solution. 
He leverages his years of experience with other enterprise OLAP and reporting applications (Cognos, Business Objects, Crystal, and others) in regular conversions of these once-dominant applications to the Microsoft BI stack. Bill believes it is easier to teach technical skills to people with non-technical training than vice-versa, and he constantly seeks ways to graft new technology into the Accounting and Finance arenas. Bill was awarded Microsoft SQL Server MVP in 2009. Hobbies include advanced literature studies and occasional lectures, with recent concentration upon the works of William Faulkner, Henry James, Marcel Proust, James Joyce, Honoré de Balzac, and Charles Dickens. Other long-time interests have included the exploration of generative music sourced from database architecture.
