Storing scanned web-pages and files to SQL Server

February 25, 2004

Learn how to explore any Internet web page and store partial or complete information to either an HTML file or Microsoft SQL server table.

This article provides a guide on how to explore any Internet web page and store partial or complete information to either an HTML file or Microsoft SQL server table. In addition, the method presented here will aid you to scan a webpage for a particular string and then send that string and related content as an email.

Method 1:

This method explains how to read many web pages listed in a text file and store those web pages as HTML files.

Step1:

Create a folder C:\ScanWebPage and list all of the web sites that you would like to scan in a text file as shown below (C:\ScanWebPage\WebSiteList.txt).

http://www.microsoft.com
http://www.oracle.com
http://www.databasejournal.com
http://www.dell.com

Step2:

Copy and paste the code below, and save the file as C:\ScanWebPage\ScanWebPagefromText.vbs

'Author: MAK
'Contact: mak_999@yahoo.com
'Objective: To scan many websites and store it as HTML files
Set iFSO = CreateObject("Scripting.FilesyStemObject")
InputFile="C:\ScanWebPage\WebSiteList.txt"
Set ifile = iFSO.OpenTextFile(inputfile)  
Do until ifile.AtEndOfLine
webpage = ifile.ReadLine
Outputfile=replace(webpage,"http://","")
Outputfile="c:\Scanwebpage\"+Outputfile+".html"
'msgbox webpage
'msgbox outputfile
Set objXMLhttp = CreateObject("MSXML2.XMLhttp") 
With objXMLhttp 
  .open "GET", webpage, False 
	On Error Resume Next 
  .send 
  If Err.Number <> 0 Then 
    Msgbox "XMLhttp error " & Hex(Err.Number) & " " & Err.Description 
  ElseIf .status <> 200 Then 
    MsgBox "http error " & CStr(.status) & " " & .statusText 
  Else 
    Set objFSO = CreateObject("Scripting.FileSystemObject") 
    Set objTS = objFSO.CreateTextFile(outputfile, True) 
    objTS.Write Replace(.responseText, vbLf, vbNewLine) 
    objTS.Close 
    Set objTS = Nothing 
    Set objFSO = Nothing 
  End If 
end with
Set objXMLhttp =nothing
Loop
MsgBox "Completed writing all Web Pages!"

Step 3:

Execute the above VB script to create one HTML file for every website listed in the WebSiteList.txt. Upon completion, you will be prompted with the following message box:

Three files will have been created under c:\scanwebpage:

Method 2

This method explains how to read many web pages listed in a SQL Server table and store those web pages as html files.

Step1:

The following script creates the database and tables for Scanning Web Pages:

Create Database WebPageScanner
go
use WebPageScanner
go
Create Table WebPages(Id int identity(1,1), WebPage varchar(500))
go
Insert into WebPages(WebPage)  select 'http://www.microsoft.com'
Insert into WebPages(WebPage)  select 'http://www.oracle.com'
Insert into WebPages(WebPage)  select 'http://www.databasejournal.com'
Insert into WebPages(WebPage)  select 'http://www.dell.com'
Go
use master
go 
sp_addlogin 'WebPageUser','Web','WebPageScanner'
go
use WebPageScanner
go
sp_adduser 'WebPageUser'
go
sp_addrolemember 'db_datawriter', 'WebPageUser'
go
sp_addrolemember 'db_datareader', 'WebPageUser'
go

Step2:

Copy and paste the code below into a text file and save as C:\ScanWebPage\ScanWebPagefromSQL.vbs

'Author: MAK
'Contact: mak_999@yahoo.com
'Objective: To scan many websites from SQL Table and store it as HTML files
Set AdCn = CreateObject("ADODB.Connection")
AdCn.commandtimeout =36000
Set AdRec1 = CreateObject("ADODB.Recordset")
AdCn.Open = "Provider=SQLOLEDB.1;Data Source=SQL2k\instance1;
Initial Catalog=WebPageScanner;user id = 'WebPageuser';password='Web' "
SQL1 = "Select WebPage from WebPages order by [ID]"
AdRec1.Open SQL1, AdCn,1,1
while not Adrec1.EOF 
WebPage= Adrec1("WebPage")
Outputfile=replace(webpage,"http://","")
Outputfile="c:\Scanwebpage\"+Outputfile+".html"
'msgbox webpage
'msgbox outputfile

Set objXMLhttp = CreateObject("MSXML2.XMLhttp") 
With objXMLhttp 
  .open "GET", webpage, False 
	On Error Resume Next 
  .send 
  If Err.Number <> 0 Then 
    Msgbox "XMLhttp error " & Hex(Err.Number) & " " & Err.Description 
  ElseIf .status <> 200 Then 
    MsgBox "http error " & CStr(.status) & " " & .statusText 
  Else 
    Set objFSO = CreateObject("Scripting.FileSystemObject") 
    Set objTS = objFSO.CreateTextFile(outputfile, True) 
    objTS.Write Replace(.responseText, vbLf, vbNewLine) 
    objTS.Close 
    Set objTS = Nothing 
    Set objFSO = Nothing 
  End If 
end with
Set objXMLhttp =nothing

Adrec1.movenext
Wend

MsgBox "Completed writing all Web Pages!"

Step 3:

Executing the above VB script will create one HTML file for each website listed in the SQL table Webpages. Upon completion, you will be prompted with the following message box:

Three files will have been created under c:\scanwebpage:








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers