Curated by: Luigi Canali De Rossi

Monday, April 12, 2004

RSS Scraping? How To Create A Generic Web-To-RSS Converter

This is a technical article that requires knowledge of programming and XML. It offers one thoroughly documented approach (5,000 words) to making a "generic" site or blog without RSS support generate a standard RSS feed. The article also provides useful information on how to use regular expressions to parse a web page's HTML text into manageable chunks of data, shows how that data is converted and written out as an RSS feed, and shows how to create a generic tool that automatically generates an RSS feed from any given website, given a small group of parameters. At the end of the article, a small zipped download provides the interested technical reader with a few RSS tools that make such a "scraping" task easier to carry out.

Contrary to what the author states in the introductory post to this article, scraping a site to create an RSS feed does not cause economic damage to the scraped site, as long as the smart newsmaster is honest and ethical enough to create a feed that carries only excerpts or titles of the site's content, sending back more visitors who want the full scoop on any interesting news item. As a matter of fact, I strongly consider the art of ethically syndicating otherwise hard-to-reach content sites an effective booster of exposure, visibility, traffic and credibility for almost any site.
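For readers who want a feel for the technique before diving into the full article, here is a minimal sketch of the idea in Python. It is not the author's code: the function name `html_to_rss`, the sample HTML and the pattern `ITEM_PATTERN` are all hypothetical, and the sketch assumes the caller supplies one regular expression with the named groups `title`, `link` and `description`. Real pages are far messier, so treat this as an illustration of the "small group of parameters" idea rather than a finished tool.

```python
import re
import xml.etree.ElementTree as ET


def html_to_rss(html, item_pattern, feed_title, feed_link, feed_description):
    """Build an RSS 2.0 document from HTML.

    item_pattern is a hypothetical per-site parameter: a regular expression
    that must define the named groups 'title', 'link' and 'description'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_description

    # Each regex match becomes one <item>; DOTALL lets '.' span line breaks.
    for match in re.finditer(item_pattern, html, re.DOTALL | re.IGNORECASE):
        item = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(item, field).text = match.group(field).strip()

    return ET.tostring(rss, encoding="unicode")


# Made-up page markup, used only to exercise the sketch.
SAMPLE_HTML = """
<div class="post"><a href="http://example.com/1">First post</a>
<p>Short excerpt one.</p></div>
<div class="post"><a href="http://example.com/2">Second post</a>
<p>Short excerpt two.</p></div>
"""

# Made-up pattern matching the sample markup above.
ITEM_PATTERN = (
    r'<div class="post">\s*<a href="(?P<link>[^"]+)">(?P<title>.*?)</a>'
    r'\s*<p>(?P<description>.*?)</p>'
)

if __name__ == "__main__":
    print(html_to_rss(SAMPLE_HTML, ITEM_PATTERN,
                      "Example feed", "http://example.com/", "Demo feed"))
```

Keeping the pattern as a parameter is what makes the converter "generic": pointing it at another site means writing a new expression, not new code. Filling `description` with a short excerpt rather than the full text also keeps the feed in line with the ethical approach described above.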



Roy Osherove - Reference: ISerializable
posted by Robin Good on Monday, April 12 2004, updated on Tuesday, May 5 2015

This work is licensed under a Creative Commons License.



