Recently I needed to collect information from a website (contact information and the like) and add it to a database for a project. That isn't too time-consuming unless, like me, you need to visit about 50 pages.
So I wrote a quick Perl script that runs through a list of URLs in a text file, visits each page, grabs the source, and parses it into a text file for me. I had the script name the output file after the search criteria.
The script also strips out extra whitespace and all HTML tags (with some exceptions*).
*On line 25 you can see the HTML::Scrubber::StripScripts setup. The way I have it set up, nothing is allowed through, but you can change that: flip the existing 0 to 1 and the script will allow links.
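For anyone who just wants the gist, here is a rough sketch of the approach. This isn't the exact downloadable script: the file names are placeholders, the fetching is done with LWP::UserAgent as an assumption, and the option names come from the HTML::Scrubber::StripScripts docs.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::Scrubber::StripScripts;

# Placeholder file names; the real script names its output
# after the search criteria.
my $url_list = 'urls.txt';
my $out_file = 'results.txt';

# "Allow nothing" setup; change Allow_href from 0 to 1
# to let links through.
my $hss = HTML::Scrubber::StripScripts->new(
    Allow_src      => 0,
    Allow_href     => 0,
    Allow_a_mailto => 0,
);

my $ua = LWP::UserAgent->new;

open my $urls, '<', $url_list or die "Can't open $url_list: $!";
open my $out,  '>', $out_file or die "Can't open $out_file: $!";

while ( my $url = <$urls> ) {
    chomp $url;
    next unless $url;

    # Fetch the page; skip it (with a warning) if the request fails.
    my $res = $ua->get($url);
    unless ( $res->is_success ) {
        warn "Failed to fetch $url: " . $res->status_line . "\n";
        next;
    }

    # Scrub the HTML, then tidy up leftover whitespace.
    my $text = $hss->scrub( $res->decoded_content );
    $text =~ s/^\s+//mg;       # strip leading whitespace on each line
    $text =~ s/\n{2,}/\n/g;    # collapse runs of blank lines

    print {$out} "=== $url ===\n$text\n";
}

close $urls;
close $out;

Drop your URLs (one per line) into urls.txt and the cleaned-up text lands in results.txt.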
Since the script contains HTML, Blogger doesn't handle displaying it as plain text very well, so you can download it from the link below:
Download Link