Build a Web Site Traffic Analysis Cube: Part I

July 21, 2003

About the Series ...

This is the thirteenth article of the series, Introduction to MSSQL Server 2000 Analysis Services. As I stated in the first article, Creating Our First Cube, the primary focus of this series is an introduction to the practical creation and manipulation of multidimensional OLAP cubes. The series is designed to provide hands-on application of the fundamentals of MS SQL Server 2000 Analysis Services ("Analysis Services"), with each installment progressively adding features and techniques designed to meet specific real-world needs. For more information on the series, as well as the hardware / software requirements to prepare for the exercises we will undertake, please see my initial article, Creating Our First Cube.

Preparation

Prior to beginning the lesson, you will need to download a copy of the sample Server Access Log, ServAccessLog.txt, a zipped text file that we will use as a data source in Part I of this lesson. Once the log is downloaded, unzip it and place it in a location that you can easily remember later, when we select the file as a data source. Once the lesson is completed, the file can be discarded to conserve hard disk space, if desired.

Introduction

While the majority of our series to date has focused upon the design and creation of cubes within Analysis Services (see Articles One through Nine of the Introduction to MSSQL Server 2000 Analysis Services series), we began in Article Ten to discuss reporting options for our cubes. My intention with Articles Ten, Eleven, and Twelve was to offer a response to the expressed need of several readers for options in this regard - options beyond the mere browse capabilities within Analysis Services.

In Articles Ten and Eleven, we explored some of the options offered by Microsoft Office - specifically the Excel PivotTable Report and Office PivotTable List, respectively - for report building with Analysis Services cubes. In Article Twelve, we explored features that integrate Analysis Services and Cognos PowerPlay, to provide a vehicle for client reporting and other business intelligence pursuits. The focus of the article was a basic overview of the steps involved in a simple (non-integrated security) connection of Cognos PowerPlay to a Microsoft Analysis Services cube, and then a high level overview of the use of PowerPlay for Windows and PowerPlay Web for the performance of analysis and reporting upon the Analysis Services OLAP data source.

In this article, we will return to the hands-on design and building of cubes for various business purposes. Specifically, the next two articles will focus on the design and construction of a Web Site Traffic Analysis Cube. In Part I, after a brief discussion of potential business reasons for collecting web site traffic data, we will design and build an extract procedure to illustrate one approach for gathering statistical data for ultimate placement into our new traffic analysis cube. Next, we will set up a simple data source that will serve as the destination point for the extract process, and as the basis for the design and creation of the web traffic analysis cube in Part II. Finally, in Part II, we will browse the cube using the Analysis Services browser to examine the results of our handiwork.

The topics within Part I of this two-part article will include:

  • An overview of the business needs behind the desire to report upon web site traffic statistics;
  • An overview of the Server Access Log, and a discussion of its use as a source of web site activity tracking data;
  • A practical demonstration of the extraction of raw sample traffic statistics from a log file, and their importation into a database using MS SQL Server 2000 Data Transformation Services ("DTS");
  • Creation and population of a table in MSSQL Server 2000 to support our site traffic analysis cube in Part II.

Why a Site Traffic Analysis Cube?

In this lesson, we will return to an examination of real-life applications that can leverage the power of Analysis Services. The scenario that we explore in this article centers on the business need of a web site owner to analyze traffic.

The uses for site traffic analysis and statistics are legion, and the degree and complexity of the analysis performed can range widely. Examples might include the need to establish baseline activity on a given site before implementing a promotional campaign within the organization, as a means of determining the effectiveness of that campaign from various perspectives. Current traffic metrics can be useful for a number of other reasons as well. They can show us which overall resources or site features are attracting visitors, which pages in the site are being skipped by visitors (or, worse, simply not being seen due to obscurity in naming and referencing, non-intuitive links, and so forth), who our visitors are, and from what site they were referred to ours, among many other potentially valuable bits of information.

A partial list of "typical" web site tracking reports that I have put in place for clients in the past includes the following. The titles of the reports are shown here to give an indication of possible dimensions upon which one might seek to report. Other, more advanced reporting perspectives are, of course, possible.

Summary Reports

  • Totals and Averages (various reports)

Basic Tracking Reports

  • Unique Visitors by:
    • Days
    • Weeks
    • Months
    • Days of the Week
    • Hours of the day
  • Reloads by:
    • Days
    • Weeks
    • Months
  • Geographical Tracking by:
    • Domains
    • Countries (with obvious regional, province, state, etc., hierarchical levels)
    • Continents
  • System Tracking by:
    • Browsers
    • JavaScript Enabled
    • Operating Systems
    • Screen Resolutions
    • Screen Colors
  • Referrer Tracking by:
    • Last 20 (number varies ...)
    • Last 20 from Email
    • Last 20 from Search Engines
    • Last 20 Queries
    • Last 20 from Usenet
    • Last 20 from Hard Disk
  • Referrer Tracking by:
    • Totals by Source:
      • Website
      • Search Engine
      • Email
      • Usenet
      • Hard Disk
    • Totals by Search Engine:
      • 24 most popular engines (number varies)
    • All Keywords
    • All Website Referrers

There are many other potential dimensions, but perhaps this gives a flavor of the possibilities. Along with informing us of which resources on our site hold the attention of our visitors, web statistics can expose, both directly and by inference, many of the characteristics of the visitors, along with various attributes of their visits to our sites. These characteristics and attributes might include the following examples:

  • Duration of visits to the site (and individual pages thereof);
  • Most popular times of day / days of week for visits;
  • Likelihood of actual reading of resources, or mere skimming / skipping about;
  • Optimal times to perform maintenance / updates, based upon traffic valleys;
  • Characteristics of the people drawn to the site (demographics, etc.);
  • Characteristics of people likely to visit with adequate promotion;
  • Navigational impediments / perceived difficulties that shorten visits / prevent returns;
  • Participation in, percentage of completion of, and resistance to surveys and other information gathering vehicles.
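
To make a couple of these attributes concrete (the most popular times of day for visits, and unique visitors by hour), here is a minimal Python sketch. It is an illustration only, and not part of the DTS procedure we build in this lesson. It assumes log lines in the Common Log Format, which may not match the layout of ServAccessLog.txt; the sample lines, addresses, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample lines in the Common Log Format (an assumption; the
# article's ServAccessLog.txt may be laid out differently).
SAMPLE_LOG = """\
192.0.2.10 - - [21/Jul/2003:09:15:02 -0400] "GET /index.html HTTP/1.0" 200 5120
192.0.2.11 - - [21/Jul/2003:09:47:31 -0400] "GET /articles.html HTTP/1.0" 200 8042
192.0.2.10 - - [21/Jul/2003:14:03:55 -0400] "GET /index.html HTTP/1.0" 200 5120
192.0.2.12 - - [22/Jul/2003:09:05:12 -0400] "GET /index.html HTTP/1.0" 200 5120
"""


def parse_line(line):
    """Return (client_address, timestamp) for one line, or None if unparseable."""
    start = line.find("[")
    end = line.find("]", start + 1)
    if start == -1 or end == -1:
        return None  # no bracketed timestamp on this line
    client = line.split(" ", 1)[0]
    try:
        stamp = datetime.strptime(line[start + 1:end], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None  # timestamp is not in the assumed layout
    return client, stamp


def hits_by_hour(log_text):
    """Count requests per hour of day, pooled across all logged days."""
    counts = Counter()
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[1].hour] += 1
    return counts


def unique_visitors_by_hour(log_text):
    """Count distinct client addresses seen in each hour of day."""
    clients = defaultdict(set)
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed is not None:
            client, stamp = parsed
            clients[stamp.hour].add(client)
    return {hour: len(addresses) for hour, addresses in clients.items()}


hits = hits_by_hour(SAMPLE_LOG)
busiest_hour = max(hits, key=hits.get)
print("Hits by hour:", dict(hits))
print("Unique visitors by hour:", unique_visitors_by_hour(SAMPLE_LOG))
print("Busiest hour of day:", busiest_hour)
```

Treating each distinct client address as a visitor is a simplification, since proxies and shared addresses blur the count. In a cube, the hour of day would more naturally become a level in a time dimension, with hits and visitors as measures.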






