2008-07-23

Fun With Perl -- HTML Parser Script

Recently I needed to collect information from a website (contact details, etc.) and add it to a database for a project. That isn't too time-consuming unless, like me, you need to visit about 50 pages.

So I wrote a quick Perl script that runs through a list of URLs in a text file, visits each page, grabs the source and parses it into a text file for me. The script names each text file after the search criteria for that page.
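The downloadable script isn't reproduced here, so this is only a minimal sketch of the fetch-and-save loop, assuming a list file with one "criterion<TAB>URL" pair per line (that format, and the names `read_url_list` and `filename_for`, are hypothetical). It uses HTTP::Tiny, a core module since Perl 5.14, which may not be what the original script uses; fetching https URLs with it also needs IO::Socket::SSL installed. The tag and white space cleanup is left out to keep the loop readable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14 (an assumption, not the original's choice)

# Read a hypothetical list file: one "criterion<TAB>URL" pair per line.
# Blank lines and lines starting with # are skipped.
sub read_url_list {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @jobs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;
        my ($criterion, $url) = split /\t/, $line, 2;
        push @jobs, [ $criterion, $url ] if defined $url;    # ignore lines without a tab
    }
    close $fh;
    return @jobs;
}

# Name the output file after the search criterion,
# replacing anything unsafe in a file name with "_".
sub filename_for {
    my ($criterion) = @_;
    (my $name = $criterion) =~ s/[^A-Za-z0-9._-]+/_/g;
    return "$name.txt";
}

sub main {
    my ($list_file) = @_;
    my $http = HTTP::Tiny->new(timeout => 30);
    for my $job (read_url_list($list_file)) {
        my ($criterion, $url) = @$job;
        my $res = $http->get($url);
        unless ($res->{success}) {
            warn "Skipping $url: $res->{status} $res->{reason}\n";
            next;
        }
        my $out = filename_for($criterion);
        open my $fh, '>', $out or die "Cannot write $out: $!";
        print {$fh} $res->{content};    # raw page source; cleanup is a separate step
        close $fh;
    }
}

main(@ARGV) if @ARGV;
```

Run it as `perl fetch_pages.pl urls.txt` (both file names are just examples); a line like `Acme Widgets<TAB>http://example.com/...` would end up in `Acme_Widgets.txt`.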

The script also strips out empty white space and all HTML tags (with some exceptions*).

*On line 25 you can see the HTML::Scrubber::StripScripts setup. The way I have it set up, it allows nothing, but you can change that: if you change the existing value from 0 to 1, the script will allow links.
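Here is a rough, core-Perl sketch of the cleanup described above. It swaps in plain regular expressions for HTML::Scrubber::StripScripts, which makes it much cruder than a real parser (a `>` inside an attribute value, for instance, will confuse it). `html_to_text` and `keep_tag` are hypothetical names, and the `$allow_links` flag only mimics the 0/1 switch: when it is true, `<a>` tags survive with just their `href`. In HTML::Scrubber::StripScripts itself the matching constructor option appears to be `Allow_href`, but check the module's documentation before relying on that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decide what replaces one tag. Every tag becomes a space (so words on either
# side don't run together) unless links are allowed, in which case <a> tags
# are kept with only their href attribute.
sub keep_tag {
    my ($close, $tag, $attrs, $allow_links) = @_;
    return ' ' unless $allow_links && $tag eq 'a';
    return '</a>' if $close;
    return qq{<a href="$2">} if $attrs =~ /\bhref\s*=\s*(["'])(.*?)\1/i;
    return '<a>';
}

# Crude regex cleanup, NOT a real HTML parser: strips comments, declarations,
# script/style blocks and tags, then trims white space and drops empty lines.
sub html_to_text {
    my ($html, $allow_links) = @_;
    $html =~ s/<!--.*?-->//gs;                         # comments
    $html =~ s/<![^>]*>/ /g;                           # <!DOCTYPE ...> and similar
    $html =~ s{<(script|style)\b.*?</\1\s*>}{}gsi;     # script/style blocks and their contents
    $html =~ s{<(/?)(\w+)([^>]*)>}{keep_tag($1, lc($2), $3, $allow_links)}ge;

    my @lines;
    for my $line (split /\n/, $html) {
        $line =~ s/^\s+|\s+$//g;                       # trim both ends
        $line =~ s/\s+/ /g;                            # collapse inner runs of white space
        push @lines, $line if length $line;            # drop lines that are now empty
    }
    return join '', map { "$_\n" } @lines;
}
```

Calling `html_to_text($source, 1)` on a saved page keeps the links; passing 0 (or nothing) strips every tag.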

Since the script works with HTML, Blogger doesn't display it properly as text, so you can download it from the link below:

Download Link
