Storing scanned web-pages and files to SQL Server


Learn how to explore any
Internet web page and store partial or complete information to either an HTML
file or Microsoft SQL server table.

This article provides a guide on how to explore any
Internet web page and store partial or complete information to either an HTML
file or Microsoft SQL server table. In addition, the method presented here will
aid you to scan a webpage for a particular string and then send that string and
related content as an email.

Method 1:

This method explains how to read
many web pages listed in a text file and store those web pages as HTML files.

Step1:

Create a folder C:\ScanWebPage and list all of the web
sites that you would like to scan in a text file as shown below
(C:\ScanWebPage\WebSiteList.txt).

http://www.microsoft.com

http://www.oracle.com
http://www.databasejournal.com
http://www.dell.com

Step2:

Copy and paste the code below, and save the file as
C:\ScanWebPage\ScanWebPagefromText.vbs


‘Author: MAK
‘Contact: [email protected]
‘Objective: To scan many websites and store it as HTML files
Set iFSO = CreateObject(“Scripting.FilesyStemObject”)
InputFile=”C:\ScanWebPage\WebSiteList.txt”
Set ifile = iFSO.OpenTextFile(inputfile)
Do until ifile.AtEndOfLine
webpage = ifile.ReadLine
Outputfile=replace(webpage,”http://”,””)
Outputfile=”c:\Scanwebpage\”+Outputfile+”.html”
‘msgbox webpage
‘msgbox outputfile
Set objXMLhttp = CreateObject(“MSXML2.XMLhttp”)
With objXMLhttp
.open “GET”, webpage, False
On Error Resume Next
.send
If Err.Number <> 0 Then
Msgbox “XMLhttp error ” & Hex(Err.Number) & ” ” & Err.Description
ElseIf .status <> 200 Then
MsgBox “http error ” & CStr(.status) & ” ” & .statusText
Else
Set objFSO = CreateObject(“Scripting.FileSystemObject”)
Set objTS = objFSO.CreateTextFile(outputfile, True)
objTS.Write Replace(.responseText, vbLf, vbNewLine)
objTS.Close
Set objTS = Nothing
Set objFSO = Nothing
End If
end with
Set objXMLhttp =nothing
Loop
MsgBox “Completed writing all Web Pages!”

Step 3:

Execute the above
VB script to create one HTML file for every website listed in the
WebSiteList.txt. Upon completion, you will be prompted with the following
message box:

Three files will have been created under c:\scanwebpage:

Method 2

This method explains how to read many web pages listed in
a SQL Server table and store those web pages as html files.

Step1:

The following script creates the database and tables for
Scanning Web Pages:


Create Database WebPageScanner
go
use WebPageScanner
go
Create Table WebPages(Id int identity(1,1), WebPage varchar(500))
go
Insert into WebPages(WebPage) select ‘http://www.microsoft.com’
Insert into WebPages(WebPage) select ‘http://www.oracle.com’
Insert into WebPages(WebPage) select ‘http://www.databasejournal.com’
Insert into WebPages(WebPage) select ‘http://www.dell.com’
Go
use master
go
sp_addlogin ‘WebPageUser’,’Web’,’WebPageScanner’
go
use WebPageScanner
go
sp_adduser ‘WebPageUser’
go
sp_addrolemember ‘db_datawriter’, ‘WebPageUser’
go
sp_addrolemember ‘db_datareader’, ‘WebPageUser’
go

Step2:

Copy and paste the code below into a text file and save as
C:\ScanWebPage\ScanWebPagefromSQL.vbs


‘Author: MAK
‘Contact: [email protected]
‘Objective: To scan many websites from SQL Table and store it as HTML files
Set AdCn = CreateObject(“ADODB.Connection”)
AdCn.commandtimeout =36000
Set AdRec1 = CreateObject(“ADODB.Recordset”)
AdCn.Open = “Provider=SQLOLEDB.1;Data Source=SQL2k\instance1;
Initial Catalog=WebPageScanner;user id = ‘WebPageuser’;password=’Web’ ”
SQL1 = “Select WebPage from WebPages order by [ID]”
AdRec1.Open SQL1, AdCn,1,1
while not Adrec1.EOF
WebPage= Adrec1(“WebPage”)
Outputfile=replace(webpage,”http://”,””)
Outputfile=”c:\Scanwebpage\”+Outputfile+”.html”
‘msgbox webpage
‘msgbox outputfile

Set objXMLhttp = CreateObject(“MSXML2.XMLhttp”)
With objXMLhttp
.open “GET”, webpage, False
On Error Resume Next
.send
If Err.Number <> 0 Then
Msgbox “XMLhttp error ” & Hex(Err.Number) & ” ” & Err.Description
ElseIf .status <> 200 Then
MsgBox “http error ” & CStr(.status) & ” ” & .statusText
Else
Set objFSO = CreateObject(“Scripting.FileSystemObject”)
Set objTS = objFSO.CreateTextFile(outputfile, True)
objTS.Write Replace(.responseText, vbLf, vbNewLine)
objTS.Close
Set objTS = Nothing
Set objFSO = Nothing
End If
end with
Set objXMLhttp =nothing

Adrec1.movenext
Wend

MsgBox “Completed writing all Web Pages!”

Step 3:

Executing the
above VB script will create one HTML file for each website listed in the SQL
table Webpages. Upon completion, you will be prompted with the following
message box:

Three files will have been created under c:\scanwebpage:

Latest Articles