Curated by: Luigi Canali De Rossi
 


Saturday, December 11, 2004

How To Publish Your Existing Web Site Content To RSS?

Where are the automatic HTML to RSS conversion services?

I can't believe it.

I'm writing a book on RSS feeds, and as a result, I'm doing a lot of research on RSS. The book will be called Syndicating Web Sites with RSS Feeds For Dummies and it's for a non-technical audience -- for non-programmers.

I'm up to a chapter on creating RSS feeds automatically.
(I've finished a chapter on using RSS creation tools such as UKOLN, NewzAlert's Composer and FeedForAll, and another chapter on writing feeds from scratch.) I've been researching scraping tools that Web site owners could use to automatically create a feed from their HTML pages. But I can't find any that doesn't require programming or uploading script files to the Web host's server (which requires a lot of configuration and which many Web hosts don't allow, including mine).
I'm shocked!

 

 

There's FeedFire, which Robin Good reviewed, but that excellent service is mostly appropriate for scraping someone else's Web site, where you can't control the HTML. It's not enough for one's own Web site because it only creates titles, with no descriptions/content. (Of course, the titles are links to the content on your Web site.) To create your own feed, you'd want a tool that specifies adding some <span> or <div> tags to your HTML to tell the scraper where the item descriptions are. These hints are necessary for converting HTML to a full-bodied RSS feed, according to all that I've read.
Several of these tools used to exist but are now defunct. BlogStreet works only if there are permalinks and fails if there aren't.
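
To make the idea of <span> or <div> hints concrete, here is a minimal sketch in Python of how such a scraper might work. It is not the code of any service mentioned here. The class names rss-item, rss-title and rss-desc, the function html_to_rss and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class HintParser(HTMLParser):
    """Collect feed items from HTML marked with hypothetical hint classes."""

    def __init__(self):
        super().__init__()
        self.items = []      # one dict per feed item
        self._field = None   # item field currently being filled
        self._tag = None     # tag name that opened that field

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class") or ""
        if cls == "rss-item":
            self.items.append({"title": "", "link": "", "description": ""})
        elif self.items and cls == "rss-title":
            self.items[-1]["link"] = attrs.get("href") or ""
            self._field, self._tag = "title", tag
        elif self.items and cls == "rss-desc":
            self._field, self._tag = "description", tag

    def handle_endtag(self, tag):
        # Simplification: a field ends at the first matching close tag,
        # so nested tags of the same name are not supported.
        if tag == self._tag:
            self._field = self._tag = None

    def handle_data(self, data):
        if self._field:
            self.items[-1][self._field] += data


def html_to_rss(html, feed_title, feed_link):
    """Build an RSS 2.0 document (as a string) from hint-marked HTML."""
    parser = HintParser()
    parser.feed(html)
    parser.close()
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_title
    for item in parser.items:
        node = ET.SubElement(channel, "item")
        for key in ("title", "link", "description"):
            ET.SubElement(node, key).text = item[key].strip()
    return ET.tostring(rss, encoding="unicode")


PAGE = """
<div class="rss-item">
  <h2><a class="rss-title" href="http://example.com/news1.html">First story</a></h2>
  <span class="rss-desc">A short summary of the first story.</span>
</div>
"""

print(html_to_rss(PAGE, "Example site", "http://example.com/"))
```

Because the hints say exactly where each title and description lives, the scraper does not have to guess at the page layout, which is why the resulting items can carry descriptions and not just titles.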

I thought that the RSSify service at Wytheville Community College (WCC), which used to be at VoidStar, did this. In fact, the instructions tell you to put tags around your HTML, but it didn't work for me. David Carter-Tod, who is at WCC, e-mailed me that this service converts RSS to HTML, the opposite of what I'm looking for.
Site Summaries in XHTML at W3C has some similar instructions for marking up HTML, but I don't understand the instructions or even if they're providing a service, a download to use on your own, both, or neither.
I just want to put some tags in my HTML, upload the page, go to a Web site, insert the URL in a text box, click a button, and get a Download dialog box so I can download my RSS feed.

It seems that this would be a nice easy-to-use, automatic system for many people to create RSS feeds. Once the tags are in the HTML, updating to create a new item would be painless and wouldn't require going to a separate application.
Am I wrong that this would be useful?

Or perhaps I just haven't found it.

If you know of anything, I'd be interested to hear from you.

Right here, where every Good conversation starts.


Ellen Finkelstein
 
 
Readers' Comments    
2005-07-18 07:33:57

govind

How do I include my RSS field in my HTML tag? How do I publish my HTML (RSS) file?



2005-04-08 18:04:46

thinkleb

There's also http://www.wotzwot.com

It's an online tool (RSSxl) with 3 different examples of typical scraping structure. It generates an RSS link for non-techies. You provide the beginning and end of the expressions.



2005-03-16 22:29:44

Jack Gardner

Did you hear about this new free RSS service called RapidFeeds? Well, I just signed up for their free account and published an RSS feed of my article with them. It's cool since I get to see how many hits my feed is getting...



2005-02-19 22:59:21

Honor

Robin,

Did you ever find this service? I too am looking for a service to include in my book at http://www.sitebuilditrss.com



2004-12-20 03:32:01

Peter Citarella

>>> To create your own feed, you'd want a tool that specifies adding some <span> or <div> tags to your HTML to tell the scraper where the item descriptions are. These hints are necessary for converting HTML to a full-bodied RSS feed, according to all that I’ve read. <<<

Nope – this is not necessary. We do it all the time with MyST Scripting Services. Intel is a good example (rss.intel.com). However, tags would make the transformation issues less problematic. There are other bigger issues though with this brittle approach. Call us if you’d like to hear the details.

>>> I just want to put some tags in my HTML, upload the page, go to a Web site, insert the URL in a text box, click a button, and get a Download dialog box so I can download my RSS feed. <<<

It’s not that simple. An RSS feed must exist on a web server (or be generated dynamically by a web server). Most businesses require that the URL for such content stem from their own domain, so either the RSS document must be generated on the business’s domain, or it must be generated from that URL. Solving the simple act of getting RSS from the page(s) is only a small part of the equation for success.

Another reason that the feed must exist on the business’s domain is that they want to measure the results of the feed (i.e., how many people clicked on page “x” as a result of reading feed “y”?).
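
One common way to measure such clicks is to tag every link in the feed so the site's own logs can tell feed visitors apart. The Python sketch below illustrates that general idea. It is an assumption, not a description of MyST. The function name tag_link and the parameter name from_feed are made up.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def tag_link(url, feed_name, param="from_feed"):
    """Append a query parameter that identifies which feed a click came from."""
    parts = urlsplit(url)
    # Keep any query parameters the link already has.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, feed_name))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(tag_link("http://example.com/page.html?id=7", "news"))
# prints http://example.com/page.html?id=7&from_feed=news
```

Counting requests that carry the parameter then answers the "page x from feed y" question, provided the feed's links point back to the business's own server.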

Ellen, let us know if you would like further information on these comments.

Thanks,
-Peter



2004-12-17 17:38:17

Ellen Finkelstein

I've found what I think is the closest thing to automatic conversion. Too bad it takes 15 steps when I outline it in my book. It's at http://www.w3.org/2000/08/w3c-synd/# (they did respond and were helpful in answering my questions). Unfortunately, this converts XHTML to RSS, so you first have to convert your HTML page to XHTML. But fortunately, they have a tool to do just that, at http://cgi.w3.org/cgi-bin/tidy. So you start at the first page and follow the instructions to mark up your HTML. It's not complicated, but the instructions were less than clear to me (I suggested adding an example.) Then you go to the Tidy site, convert your page, then come back to the first page and input the resulting URL. It definitely works. Of course, I'd still like something simpler...
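
A likely reason for the extra Tidy step is that XHTML is well-formed XML, which any ordinary XML parser can read, whereas everyday HTML often is not. The short Python sketch below shows the difference. It is not the W3C service's code, and the sample markup is made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup as-is."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical loose HTML: <br> and <p> are never closed.
HTML = "<div><p>First item<br>More text</div>"
# The same content as XHTML: every tag is closed.
XHTML = "<div><p>First item<br />More text</p></div>"

print(is_well_formed(HTML), is_well_formed(XHTML))
```

A converter built on XML tools can only start work once the page passes a check like this, which is the job Tidy does in the workflow above.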



2004-12-16 23:06:14

Tony

Hi Ellen,

We host RSSgenr8 at XMLhub.com which is the sort of thing you are looking for. It is a modified version of RSSify, and many people are using it very successfully to generate RSS from their HTML.

We are also currently testing a hosted service that is able to scrape pretty well any page without the use of custom tags, but unlike FeedFire will generate descriptions as well as links. It should be available in January.



2004-12-16 01:12:38

Jim Gray

Ellen, how about thinking a little bit differently? Scraping a website implies that you want to pull content from another website and create a feed that could be included on your website. The reality is that you want your feed to be syndicated on their websites.

Now, instead of trying to tag your HTML code so that a scraper could build a feed from it, turn this concept around and let your feed add content to your website. One of the things I did for the Quikonnex publishers was to create a simple method whereby they could include any item posted in their feeds back into specific locations on their own websites. The placement of articles is completely independent of how they fall in the RSS feed.

Using this approach, you do not have to use the typical Blog approach to have a dynamically created RSS feed. You're, in effect, using your RSS feed as part of your CMS, not being reliant on a CMS to create the RSS feed.
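
The feed-first approach can be sketched in a few lines of Python: read the RSS file and turn its items into an HTML fragment to drop into a page. This is only an illustration under stated assumptions, not the Quikonnex implementation. The function name feed_to_html and the sample feed are made up.

```python
import html
import xml.etree.ElementTree as ET


def feed_to_html(rss_text, limit=5):
    """Render the first `limit` feed items as an HTML list of links."""
    channel = ET.fromstring(rss_text).find("channel")
    rows = []
    for item in channel.findall("item")[:limit]:
        # Escape the text so feed content cannot break the page markup.
        title = html.escape(item.findtext("title", default=""))
        link = html.escape(item.findtext("link", default=""))
        rows.append(f'<li><a href="{link}">{title}</a></li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Fish &amp; chips</title><link>http://example.com/a</link></item>
<item><title>Second story</title><link>http://example.com/b</link></item>
</channel></rss>"""

print(feed_to_html(FEED, limit=1))
```

Because the page fragment is built from the feed, the feed is the single place where new items are entered, and the site simply reflects it.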



2004-12-13 03:40:47

Ellen Finkelstein

Those responses are good, but they confirm my conclusion. Tools like Grouper, which I'd downloaded, and Script4RSS are too complex for most non-programmers. Also, Grouper is meant for scraping other people's sites, which is also what the O'Reilly article is about. They are for scraping in the true sense, not for creating an RSS feed from one's own site. I'm not sure about Script4RSS. Anyway, they certainly aren't appropriate for my audience. I did, of course, mention CMSs in my chapter, but there wasn't much to say, except what Cliff Allen said, which was that most of them do RSS now. I still think there is a need for an on-line service where people can go and get their feeds created. Plug in a URL and get out a feed.



2004-12-12 18:19:03

Rodney Rumford

Ellen,

We have some new resources that you might want to review in your book. www.myrsscreator.com is an easy way for anyone to create an RSS feed (you don't even need a website). It is a web-based tool geared toward non-techies; more details are at the site.

We have also developed some really powerful syndication tools that are in beta. One of the new services is located at www.feedsyndicator.com. Again, this service is geared toward non-techies and is super simple to use. (Not all of the upcoming features are listed yet.)

Contact me privately and I can share some additional details.

Rodney Rumford
The Info Guru LLC.
rssmarketing (at) gmail.com



2004-12-11 19:50:15

Cliff Allen

I think that soon all content management system (CMS) vendors will include a feature that automatically publishes an RSS feed.

We went a step further and added a personalized RSS feed feature to our CMS that tailors content to each reader's interests.

This is great for non-technical content creators because it's just point-and-click to indicate how and when a Web page should be included in the RSS feed.



2004-12-11 11:45:16

Robin Good

Dear Ellen,

I understand that while everything looks easy and obvious to RSS enthusiasts, for newcomers like you there is a lot of information that is not easy to find.

Here are a few pointers:

# Roy Osherove's excellent article "Creating a Generic Site-To-Rss Tool - Introduction" sheds a lot of light on exactly what to do to parse a Web page's HTML text into manageable chunks of data. Roy shows how to create a generic tool that enables you to automatically generate an RSS feed from any website, given a small group of parameters.

# Script4rss
Script4rss takes a plain text file that describes how a particular site should be converted and creates a Perl script that does the conversion as quickly and efficiently as possible. Users don't have to know how to program, but they do need to know regular expressions. Script4rss can also:
Detect multiple categories within an HTML page.
Extract information over multiple lines.
Prepend and append text in output.
Attempt to circumvent "variable" HTML.
You can get documentation and everything else from SourceForge.
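
To give a feel for what "knowing regular expressions" means here, the following Python sketch pulls items out of a page with a single pattern, matches across line breaks, and prepends text to the output. It is not Script4rss itself (which generates Perl). The pattern, the function name extract_items and the sample markup are assumptions.

```python
import re

# One feed item: a linked <h3> headline followed by a <p> summary.
# re.DOTALL lets ".*?" run across line breaks.
ITEM_RE = re.compile(
    r'<h3>\s*<a href="(?P<link>[^"]+)">(?P<title>.*?)</a>\s*</h3>\s*'
    r'<p>(?P<description>.*?)</p>',
    re.DOTALL,
)


def extract_items(html, prefix="", suffix=""):
    """Return a list of item dicts, with optional text around each title."""
    items = []
    for m in ITEM_RE.finditer(html):
        items.append({
            # " ".join(s.split()) collapses line breaks and repeated spaces.
            "title": prefix + " ".join(m.group("title").split()) + suffix,
            "link": m.group("link"),
            "description": " ".join(m.group("description").split()),
        })
    return items


PAGE = '<h3><a href="/a.html">First\n story</a></h3>\n<p>Spans\n two lines.</p>'

print(extract_items(PAGE, prefix="[News] "))
```

The pattern is tied to one page layout, so any redesign of the site means rewriting it, which is the usual weakness of regular-expression scraping.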

# Grouper
This is really the tool for novices and pros alike. Grouper is both an RSS generator and a Website scraper rolled into one. Grouper is also part of a family of excellent tools that make it possible to search, filter, aggregate, splice, deduplicate and publish multiple RSS/Atom feeds. Antone Roundy, the developer behind this tool, provides a fantastic set of geeky tools for the entrepreneurial newsmaster at prices you can't beat. Grouper Evolution (USD $12.95) can even convert Google, Yahoo, Feedster and Daypop news sources into RSS. Recommended.

# How To Scrape a Web Site
Even for Web sites that are NOT based on a modern-day content-management system, it is possible to generate an RSS feed as long as the content follows a repetitive and easily identifiable publication structure. A technical read for webmasters who want to get the basics of how to create an RSS feed for any site that is not under their direct control. Highly recommended.

# Painless RSS with Template::Extract
Wouldn't it be nice if you could simply visualize what data on a page looks like, explain it in template form to Perl, and not bother with the need for parsers, regular expressions, and other programmatic logic? That's exactly what Template::Extract helps you do.
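
The template idea can be imitated in a few lines of Python: write the page fragment with placeholders, and let the program turn it into a matcher. This is a much simplified sketch of the concept and not the Template::Extract module, which is Perl and handles loops and more. The function name template_to_regex and the sample markup are assumptions.

```python
import re


def template_to_regex(template):
    """Turn a template with [% name %] placeholders into a compiled regex."""
    # re.split with one capture group alternates: literal, name, literal, ...
    parts = re.split(r"\[%\s*(\w+)\s*%\]", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # literal page text
        else:
            pattern += f"(?P<{part}>.*?)"     # named placeholder
    return re.compile(pattern, re.DOTALL)


TEMPLATE = '<h3><a href="[% link %]">[% title %]</a></h3>'
PAGE = ('<h3><a href="/a.html">First</a></h3>'
        '<h3><a href="/b.html">Second</a></h3>')

matcher = template_to_regex(TEMPLATE)
print([m.groupdict() for m in matcher.finditer(PAGE)])
```

The person describing the page only writes something that looks like the page itself. In this sketch a placeholder must be followed by some literal text, otherwise it matches nothing.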



 
posted by Robin Good on Saturday, December 11 2004, updated on Tuesday, May 5 2015

This work is licensed under a Creative Commons License.

 

 
