do me a favor

And help me test out a project I’m working on. The number of excerpt-only RSS feeds in my reader has finally hit the point where it began gnawing away at me. So I’ve taken a stab at writing a utility to automatically turn partial RSS feeds into full-text ones. If you’ve got any partial-text feeds that have been bugging you, I’d appreciate your giving it a try. If you run into problems, leave them in comments.

Be forewarned: it’s far from perfect. There are some sites that it can’t figure out — perhaps unsurprisingly, Wizznutzz is one of them. In other cases it’ll include duplicate titles, or unnecessary dates, or irrelevant parts of the site design. And sometimes it reports that full-text retrieval has failed when in fact it was just a very short entry. Oh! It also won’t be able to tell when an item has been updated.

But ignore all of that. I’m interested to know the cases (like Wizznutzz) where it fails completely. Functionality first, then beauty. Well, actually, functionality then optimization then maybe beauty — a general-purpose solution isn’t likely to ever make every feed look lovely. More important is the fact that it’s dog-slow when it’s not serving cached content. If I leave it running on this server all hell will eventually break loose. So consider this just for testing — I’ll get it to a stable point sometime soon, then move it to a more permanent residence.

This entry was posted on Tuesday, December 19th, 2006 at 11:18 am and is filed under tech.

ogged says:

December 19, 2006 at 1:00 pm

Seems to fail for Yglesias’s feed.
XML Parsing Error: junk after document element
Location: http://www.metamonkey.net/fulltextrss/?url=http%3A%2F%2Fwww.matthewyglesias.com%2Ffeed
Line Number 2, Column 1:

Becks says:

December 19, 2006 at 1:21 pm

I’ve noticed a lot of my feeds switching from full-post to excerpts lately, especially people who upgrade to Blogger Beta. Unfortunately, this doesn’t appear to be working for those people. I tried Ficke and Sommer’s sites and got “*Full Text RSS Retrieval Failed*
Sorry! Here’s what was in the original feed:” for both.
I’ve been using the FetchLinks plug-in for NewsGator to retrieve the excerpt feeds. Nice that it’ll get the full text but the rendering is a lot slower (since it brings in the whole web page) than just grabbing the post text and doesn’t work very well when I’m reading posts I’ve downloaded when offline.

tom says:

December 19, 2006 at 2:26 pm

Ogged: sorry about that. I had to use the SimplePie RSS parser library instead of the tried & true MagpieRSS when the latter choked on some high-ASCII junk in Kevin Drum’s feed. Not only is SimplePie slower, but its caching permission stuff seems to be broken. Ugh. Well, give it another shot: I turned caching off, and it’s working for me.
Becks: I’m confused — Sommer’s got a full-text feed, as does Matt F. Definitely a false positive on the failure message, but the algorithm’s only designed to work with excerpted feeds anyway.

Becks says:

December 19, 2006 at 2:28 pm

Huh. That’s weird. Those are the links in my RSS reader but, for some reason, the people who have upgraded to Blogger Beta still always come in only as excerpts in my reader unless I use FetchLinks.
Wonder why the feeds have the full post but my reader’s only grabbing an excerpt. I’ll have to investigate.

Becks says:

December 19, 2006 at 2:54 pm

Good news! Updating to the latest version of my RSS reader fixed the Blogger Beta problem.
Bad news! At the cost of hosing all of my configuration settings.

Matt F says:

December 19, 2006 at 4:58 pm

Didn’t work for Matt Weiner’s site, looks like the same error that Ogged got for Drum’s place:
XML Parsing Error: junk after document element
Location: http://www.metamonkey.net/fulltextrss/?url=http%3A%2F%2Fmattweiner.net%2Fblog%2Findex.rdf
Line Number 2, Column 1:
Warning: preg_match_all() [function.preg-match-all]: Compilation failed: missing terminating ] for character class at offset 8 in /home/metamon/public_html/fulltextrss/index.php on line 44
^

tom says:

December 19, 2006 at 5:04 pm

Actually, that’s a new one. Thanks, Matt! The XML parsing error is just due to your browser expecting to get back well-formed XML, but instead getting a plaintext error message.
The particular error you got back is a regexp compilation error: the pattern contains a “[” that never gets closed. I think I need to tweak my algorithm…
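
For anyone curious about that warning: “missing terminating ] for character class” is what a regex engine reports when text containing an unescaped “[” ends up inside a pattern. The tool’s code isn’t shown here, so the following is only a minimal sketch, in Python rather than PHP, that assumes the pattern is built from excerpt text taken from the feed. The function name `find_excerpt` and the sample strings are hypothetical. In PHP the analogous escaping function is `preg_quote()`.

```python
import re


def find_excerpt(excerpt, page_text):
    """Search page_text for excerpt, treating the excerpt as literal text."""
    # re.escape() neutralises metacharacters such as "[" so that arbitrary
    # feed text cannot break pattern compilation.
    return re.search(re.escape(excerpt), page_text)


# Hypothetical sample data: an excerpt that happens to contain a "[".
excerpt = "Reading [1 of 3"
page = "<p>Reading [1 of 3] is up now.</p>"

try:
    # Using the raw excerpt as a pattern: the "[" opens a character class
    # that is never closed, so compilation fails.
    re.search(excerpt, page)
except re.error as exc:
    print("unescaped pattern failed:", exc)

match = find_excerpt(excerpt, page)
print(match.group(0))  # prints: Reading [1 of 3
```

Escaping makes the excerpt match only as literal text, which is usually what is wanted when hunting for a known snippet inside a fetched page.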

Sean says:

January 3, 2007 at 2:39 pm

I’m using this on stereogum’s feed and getting strange occasional “missing entries” where the description in the generated rss is empty.
Oh, and thanks a lot for putting this together, it works well!

and says:

April 6, 2008 at 1:18 am

hi,
it’s asking me for a user/password when trying to get full text from some rss…is this private only or? thanks

Tom says:

April 6, 2008 at 2:53 am

This was just an early version of the tool; it’s no longer hosted here. You can find a more recent version at this address.

and says:

April 7, 2008 at 1:12 am

oh i c, but that site gives me a “not found” error?

Tom says:

April 7, 2008 at 11:32 am

We were having some DNS issues that resulted in the domain pointing at the wrong server. It should now be working.

and says:

April 7, 2008 at 4:29 pm

ok thank you!