Using VBScript to Extract Data from a Web Page
How to Obtain Important Information from the Internet
VBScript is Microsoft's scripting language based on Visual Basic. It can be used, for example, to obtain the contents of a web page and then search them for important information.
Web pages on the Internet are normally designed for humans to view, not for programs to extract information from. However, a programmer can process web pages automatically with a scripting language such as VBScript, because VBScript can retrieve the contents of a web page and search the returned text for a pattern.
This means that the programmer can search a web page (perhaps one that's updated on a regular basis) for a key piece of information and then use that information in their own application.
Using VBScript to Obtain the Contents of a Web Page
VBScript has no built-in method for reading the contents of a web page. Instead, it uses one of the many objects built into Microsoft Windows:
sub process_html (up_http)
' create the XMLHTTP object and request the page synchronously
dim xmlhttp : set xmlhttp = createobject ("msxml2.xmlhttp.3.0")
xmlhttp.open "GET", up_http, false
xmlhttp.send
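The request above assumes that the server answers successfully. As a rough sketch of a more defensive variant, the helper below (fetch_page is a hypothetical name, not part of the original subroutine) returns an empty string when the request raises an error or the HTTP status is anything other than 200:

```vbscript
' Hypothetical helper: returns the page's HTML, or "" if the request fails.
Function fetch_page(url)
    Dim http, failed
    fetch_page = ""
    On Error Resume Next              ' send raises an error if the host is unreachable
    Set http = CreateObject("MSXML2.XMLHTTP.3.0")
    http.open "GET", url, False       ' False = synchronous: send waits for the reply
    http.send
    failed = (Err.Number <> 0)        ' record the result before Err is cleared
    On Error GoTo 0
    If Not failed Then
        If http.status = 200 Then fetch_page = http.responseText
    End If
    Set http = Nothing
End Function
```

A caller can then test for an empty string before going on to process the text.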
The first lines of process_html use the XMLHTTP object to send a request to a web server for a web page. The HTML returned by the server is then available in the XMLHTTP object's responseText property.
Using VBScript to Process the Contents of a Web Page
The responseText property now contains all of the HTML returned from the web page in a single string. A single string containing thousands of characters is unwieldy to work with, so the next step is to break it into manageable chunks. This can be done with the VBScript split function, which creates an array from the text. Here split is given two inputs: the string to be split, and the delimiter at which to split it.
The programmer must, therefore, know the general structure of the string. A line break is usually a good starting point. The code below splits on a carriage return (vbcr); pages whose lines end with a line feed only would need vblf instead:
dim response_array : response_array = split (xmlhttp.responseText, vbcr)
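Because a page may use carriage returns, line feeds, or both, one possible approach is to normalise the line endings before splitting. The sketch below assumes a hypothetical helper named split_lines, which is not part of the original code:

```vbscript
' Hypothetical helper: splits text into lines whatever the line-ending style.
Function split_lines(text)
    Dim normalised
    normalised = Replace(text, vbCrLf, vbLf)    ' Windows endings -> line feed
    normalised = Replace(normalised, vbCr, vbLf) ' lone carriage returns -> line feed
    split_lines = Split(normalised, vbLf)
End Function

' Usage: a string with all three styles of line ending yields four elements
Dim sample_lines
sample_lines = split_lines("a" & vbCrLf & "b" & vbLf & "c" & vbCr & "d")
```

Here ubound(sample_lines) is 3, so the array holds the four elements "a", "b", "c" and "d".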
Once the contents of the web page have been loaded into an array, the array can be examined element by element to find the required information. This is made easier by RegExp, the VBScript regular expression object:
dim re : set re = new regexp
re.Pattern = "articles written by"
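By default a RegExp match is case sensitive, so the pattern above would miss "Articles Written By". A minimal sketch of a case-insensitive matcher follows; make_matcher is a hypothetical name, and the sample HTML is invented for illustration:

```vbscript
' Hypothetical helper: builds a case-insensitive RegExp for a phrase.
' The phrase is used as a pattern, so regex metacharacters keep their meaning.
Function make_matcher(phrase)
    Dim re
    Set re = New RegExp
    re.Pattern = phrase
    re.IgnoreCase = True    ' match regardless of letter case
    re.Global = False       ' Test only needs the first match in a string
    Set make_matcher = re
End Function

' Usage with an invented line of HTML
Dim matcher, found
Set matcher = make_matcher("articles written by")
found = matcher.Test("<h2>Articles Written By Mark</h2>")
```

With IgnoreCase set, found is True even though the capitalisation differs from the pattern.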
The pattern "articles written by" is the text to be searched for, so the next step is to loop through the array, testing each element against the pattern:
dim i
for i = 0 to ubound(response_array)
    if re.Test(response_array(i)) then
        ' show the element's index and its text
        msgbox i & " " & response_array(i)
    end if
next
The loop displays the line (or lines) containing the search pattern. The final step is to release the objects used by the subroutine:
set re = nothing
set xmlhttp = nothing
end sub
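The loop in process_html displays whole lines. If only the text following the phrase is wanted, a capture group and the RegExp Execute method can pull it out. The sketch below is illustrative only: find_author is a hypothetical helper, and the pattern assumes the name runs up to the next "<" character, which may not match the real page's markup:

```vbscript
' Hypothetical helper: returns the text that follows "articles written by"
' on the first matching line, or "" when no line matches.
Function find_author(lines)
    Dim re, i, matches
    find_author = ""
    Set re = New RegExp
    re.Pattern = "articles written by\s+([^<]+)"   ' group 1 = text up to the next tag
    re.IgnoreCase = True
    For i = 0 To UBound(lines)
        Set matches = re.Execute(lines(i))
        If matches.Count > 0 Then
            find_author = Trim(matches(0).SubMatches(0))
            Exit For
        End If
    Next
    Set re = Nothing
End Function

' Usage with invented sample lines
Dim author
author = find_author(Array("<html>", "<p>Articles written by Mark Bain</p>", "</html>"))
```

For the invented sample above, author holds "Mark Bain".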
The process_html subroutine can now be run by supplying it with a web page to process:
process_html _
"http://www.suite101.com/writer_articles.cfm/linuxtalk/index.html"
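The code should be saved to a file with a .vbs extension. It can then be run by double-clicking on it in Windows Explorer.
Summary
In order for a programmer to use VBScript to analyse a web page they must:
- request the page with the XMLHTTP object and read its responseText property
- split the returned text into an array
- search the array elements with a regular expression
The programmer can then use the array elements that they extract as part of their own VBScript application.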
The copyright of the article Using VBScript to Extract Data from a Web Page in Windows Programming is owned by Mark Alexander Bain. Permission to republish Using VBScript to Extract Data from a Web Page in print or online must be granted by the author in writing.