Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Jul 21, 2003

Build a Web Site Traffic Analysis Cube: Part I - Page 2

By William Pearson

Obtaining the Traffic Data

Before we can begin the generation of our Site Traffic Analysis Cube in Analysis Services, we need a data source that has, in essence, the characteristics of a star schema, or at least those of a fact table. A common source of information about traffic and activity on virtually any web site is the Server Access Log, which we will discuss in the next section.

Introducing the Server Access Log

While there are many options for storing site access information, a common measure of these statistics, file accesses (pervasively known as "hits"), are typically recorded in a file called an access log. Although it is beyond the scope of this lesson to become involved in the well-known considerations surrounding the appropriateness of the use of "hits" as the sole measure of site traffic, many of us are aware of the fact that alternatives exist that might more accurately provide refined data as to actual visits versus mere file accesses, etc. For the purposes of this article, we will use the access log as the sole source of data with which we intend to build our cube, primarily to keep focused on the cube design and construction itself. Suffice it to say that many advanced approaches to the objective of articulate data capture have been devised, and that our approach within the scope of this lesson certainly is not a recommendation of "best practices" within this specialized science.

The Server Access Log is central, for our purposes, to learning about the visits to our site, as well as the visitors themselves. At a high level, any visit to our site means a corresponding request for a file from the site. A request for a file results in a corresponding entry to the server access log, which acts as a cumulative history of every attempt (whether successful or unsuccessful) to retrieve information from the site. The information from the individual entries is easily extracted and loaded to a data store that is more readily adapted to the support of a multidimensional cube. The typical log contains entries in the common log file format that includes the following fields:

  • Host
  • Identification
  • User Authentication
  • Time Stamp
  • HTTP Request Type
  • Status Code
  • Transfer Volume

Example entries in a simple server access log appear in Illustration 1.


Illustration 1: Select Entries in an Example Server Access Log

Other logs exist, in addition to the above, and, often, logs are combined to add additional information to the entries shown above. For purposes of our lesson, we will work with a sample log file similar to that shown in Illustration 1. Keep in mind, throughout the procedures that follow, that the steps we take to entrain the data into our star schema are similar with any log, with modifications obviously required to adapt to differing scenarios. Again, we will keep the extraction process simple so that we can focus on our primary objectives.

As virtually all of us know, many options exist for performing the import of a log file. We not only want to import information from the file, however, but we want to set that information up into a data source that can be easily accessed by Analysis Services to build a cube. To achieve both objectives simultaneously, we will use Data Transformation Services (DTS), an Extraction, Transformation and Loading (ETL) tool that accompanies Microsoft SQL Server 2000.

Populating a Cube Data Source using DTS

Data Transformation Services (DTS), which comes along with the typical installation of MSSQL Server 2000, acts as an excellent tool for developing, automating, and managing data extraction, transformation and loading. In this lesson, our first major step will be to entrain the Server Access Log data, which we described in the previous section, that is useful to our cube design. While our excursion into the use of DTS will involve only the simplest functions, we will still gain an appreciation for the flexibility of the tool, and the number of things we can accomplish from a single graphical user interface. In our example, we will use DTS to accomplish the lion's share of building the data source structure, including the creation of the destination database and its member table; the process would likely differ in reality, particularly in the fact that the star schema would most likely have been designed and in place well before beginning the ETL process.

DTS makes use of OLE DB to acquire data. While the details lie outside our present scope, it is useful to understand that OLE DB goes beyond ODBC's limitations of access to only relational data. OLE DB defines a set of COM interfaces that let you access and manipulate any data type, enabling Microsoft's much-touted Universal Data Access (UDA).

In its role as an OLE DB consumer, DTS extracts data from any data source that acts as an OLE DB provider (that is, provides a native OLE DB interface), as well as from any ODBC data source. As we will see, we establish an OLE DB connection, using the DTS Package Designer, to our data source (the Server Access Log) in order to extract the data we need and to load it to our MSSQL Server destination database. Acting in its transformation role, DTS will map the access log data fields to the respective destination data fields via the data source connections, enabling the conversion of data types as necessary.



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM