The previous article began with the benefits of utilizing Microsoft SQL Server for Internet Information Service (IIS) web log analysis. While there are many third party packages on the market for web log analysis, sometimes you may need highly specific information derived from the logs, in these cases, SQL Server is a very good choice as a data repository and query engine. ODBC logging was investigated as a means of moving IIS log data into SQL Server. Although this method is easy to setup, it has two disadvantages that must be evaluated to your particular web server environment to determine its usefulness. One disadvantage is the added processing overhead ODBC logging creates. The other disadvantage is that the columns ODBC can transport to SQL Server are fixed and cannot be altered. As an example, two of the columns that are missing from the ODBC method include the “referrer” and “cookie” fields. So if the data from these columns are required for your analysis, then ODBC logging cannot be used. This article will begin by examining DTS and Data Import as methods for moving IIS web logs into SQL Server.
For SQL Server 2000 installations, the built in Data Transformation Services Wizard (DTS) provides a quick and easy method for moving log file data into SQL. This example will act on the included sample IIS web log file. To begin, create a new database from the script below to hold the logs:
CREATE DATABASE IISLog;
Rather than using a TSQL script to create the table, we’ll modify the log file to allow DTS to create the table automatically. The IIS log file will always start with four lines of Meta data as shown below. Meta data will be written each time the web server is restarted and when there is new activity.
#Software: Microsoft Internet Information Services 6.0
#Date: 2006-12-01 00:07:48
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port …
The fourth line of Meta data, starting with “#Fields”, list the names of the columns IIS is logging. For DTS to auto create a table with column names, the field names need to be moved to the first line of the file. Delete the first three rows (the lines starting with “Software”, “Version”, and “Date”. Next delete the string “#Fields: “). The first line of the file should now look like the text below:
date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port …
To start the DTS Wizard, open Enterprise Manager and right click the newly created database IISLog. Select “All Tasks,” then “Import Data.”
The Data Transformation Services Import Export Wizard splash screen should appear; click “Next.”
On the Data Source screen, change the source to “Text File,” and browse to the IIS Log file modified previously.
The “Select file format” should now appear. Keep the default of “Delimited” because the IIS log files are space delimited. Change the “Text Qualifier” to none, and check the box “First row has column names.” Click “Next.”
For this next screen, keep the default of “Other,” and put a single space in its text box. The Preview window should populate as show below.
Keep the defaults on the “Destination” screen and click “Next.” The same applies for the next two screens, “Source,” and “Save.” Click Finish and you should receive a Success message.
DTS should have created a new table inside the IISLog database called IISLogFile as shown below. By default, all the columns are varchar(8000).
The “Flat File” Import has changed between SQL 2000 and SQL 2005. One change is that the SQL 2005 Import will default to a column of 50 when trying to bring in IIS files. This may be too small for fields like Referrer. It is a problem easily corrected though. The new SQL 2005 Import routine exposes column settings. A larger problem involves the way an “end of row” is or is not recognized. In SQL 2000, if the import was defined with twenty columns and a short row was encountered (less than twenty columns, like the IIS Meta lines) followed by a CR LF (carriage return line feed), the short data was entered, then the import skipped any remaining columns when the CR LF was encountered and proceeded to the next row. An example of this is shown below. The data appears as we would expect in spite of the Meta rows having less columns than the real data rows.
However, this isn’t the case in SQL 2005. If the import is defined as twenty columns, the delimited data must contain twenty columns. The import will not skip to the next row on encountering an early CR LF. Notice the forth row in the column “cs-uri-stem.” Two rows are running together. The “#Version” should be on a new line as show in the SQL 2000 example above. This behavior will produce import irregularities when trying to mimic SQL 2000 flat file methods on SQL 2005. Eventually, real data rows, as well as Meta rows will be incorrect.
There are two workarounds to this problem. Both involve creating the destination table with TSQL first, then importing the data. One import workaround is to create a SQL 2005 SSIS job and pad (create fake data holders) for any short rows. The other is to remove any Meta from the file prior to import. Microsoft has a free tool for this task called PrepWebLog. It can be found at http://support.microsoft.com/kb/296093/EN-US/ .
Another option for moving logs from IIS to SQL 2000 or SQL 2005 involves using a free tool from Microsoft called Log Parser. This command line tool reads a script or batch file you create. Although the tool requires a little research, it offers an advantage over any of the methods discussed previously. If the columns in your log file change over time (one month you were logging cookies and the next month you weren’t), Log Parser will not require separate script commands for each month. It’s smart enough to look for the columns you want and import them correctly if they exist, or skip them if they don’t. Although this behavior could be incorporated into a DTS or SSIS job, you may find Log Parser much easier to implement correctly. The tool can be found at http://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en . An example batch file script for moving IIS logs to SQL is shown below.
c:\”program files”\”log parser 2.2″\LogParser
SELECT date, time, c-ip, cs-method, sc-bytes, time-taken, cs(user-agent)
TO IISLogsLP” -o:SQL -server:gtweb2 -driver:”SQL Server”
-database:WebLogs -username:sa -password:text -createtable:ON
There are several methods available for moving IIS logs into SQL Server. The method selected will depend on the IIS sever volume and data columns needed. Regardless of the method used to move the data, once there, SQL Server makes a very good choice for detailed log analysis.