February 08, 2004

Home-ripped RSS feeds

Over Christmas, when I was holed up in the Kuala Lumpur Mandarin Oriental, I was thinking about home-ripping RSS feeds for sites I enjoy.

Well, it turns out I was prescient, but lazy.

Just a few weeks ago I bought O'Reilly's Spidering Hacks book, but I haven't done anything with it because I've been busy with another software project. Tonight I was discussing my need to rip a site into a nice summary, and 31die said, "make sure you use Template::Extract."

It's not as easy as a GUI scraping utility, but it's getting close. The idea is that the Template:: modules can take HTML output and a loosely specified template, then produce a formal data structure that I can render into whatever I want: handsome HTML for printing, or an RSS feed for viewing in my newsreader. The loosely specified part is what makes it neat -- it will be much easier to specify scraping rules that are less brittle than hand-rolled regexps.
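
To make the template-in, data-structure-out idea concrete, here is a rough sketch in Python. It is not Template::Extract and does not reproduce its API; it only mimics the general approach, under some loud assumptions: the `[% name %]` slot syntax and the `[% ... %]` "ignore this bit" wildcard are borrowed loosely from Template Toolkit style, and the sample page, the function names (`template_to_regex`, `extract`, `to_rss`) and the example.com URLs are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page fragment; note the inconsistent attributes and spacing.
PAGE = """
<ul>
  <li class="story"><a href="http://example.com/a">First story</a> <span>10:02</span></li>
  <li class="story lead"><a  href="http://example.com/b">Second story</a></li>
</ul>
"""

# Loose template: [% name %] captures a field, [% ... %] skips anything.
TEMPLATE = '<li[% ... %]><a href="[% link %]">[% title %]</a>'


def template_to_regex(template):
    """Compile a loose template into a regex with one named group per slot."""
    parts = re.split(r"\[%\s*(\w+|\.\.\.)\s*%\]", template)
    pattern = []
    for i, part in enumerate(parts):
        if i % 2:  # odd entries are slot names (or the '...' wildcard)
            pattern.append(".*?" if part == "..." else f"(?P<{part}>.*?)")
        else:  # even entries are literal text; treat whitespace runs as flexible
            pattern.append(r"\s+".join(re.escape(tok) for tok in part.split()))
    return re.compile("".join(pattern), re.DOTALL)


def extract(template, document):
    """Return a list of dicts, one per place the template matches."""
    regex = template_to_regex(template)
    return [
        {key: value.strip() for key, value in match.groupdict().items()}
        for match in regex.finditer(document)
    ]


def to_rss(items, feed_title, feed_link):
    """Render the extracted items as a bare-bones RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Home-ripped feed for " + feed_title
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    stories = extract(TEMPLATE, PAGE)
    print(stories)
    print(to_rss(stories, "Example headlines", "http://example.com/"))
```

The point of the sketch is the middle step: once the page has been reduced to a plain list of records, turning it into RSS (or printable HTML) is the easy part, and the template only has to pin down the bits of markup around the fields I care about.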

Cool... so now I can embark on making an RSS feed for Slashdot and Drudgereport... Pretty slick...

Posted by Nils Blutig at February 8, 2004 02:54 PM