
Oct 16, 2005

Merging RSS and Atom feeds from various sources

I have a lot of Python RSS/Atom feeds in my aggregator, and entries are duplicated all over the place.

I couldn't find any tool out there that merges entries from several sources in a smart way, by trying to detect duplicates.

I wrote a little script, extending Mark Pilgrim's feedparser (which we use in CPSRSS), to merge several sources, using the difflib module and the RSS rendering we have in CPSBlog.

It calculates the diff ratio on the title and content of each entry to decide whether it's the same entry. When the ratio is <= 0.2 it's the same entry (hopefully :) )
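The comparison described above can be sketched roughly as follows. This is only an illustration, not the original script: entries are plain dicts instead of feedparser objects, and the names `diff_ratio`, `same_entry` and `merge_entries` are made up for the example. It also assumes that "diff ratio" means a distance, computed here as 1 minus difflib's `SequenceMatcher.ratio()` (which returns 1.0 for identical strings), and that both the title and the content must pass the threshold.

```python
from difflib import SequenceMatcher

# Threshold quoted in the post; reading it as a distance is an assumption.
THRESHOLD = 0.2


def diff_ratio(a, b):
    """Return a distance in [0, 1]: 0.0 means the two strings are identical."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def same_entry(e1, e2, threshold=THRESHOLD):
    """Assumed rule: duplicates have both a close title and a close content."""
    return (diff_ratio(e1["title"], e2["title"]) <= threshold
            and diff_ratio(e1["content"], e2["content"]) <= threshold)


def merge_entries(*sources):
    """Merge several lists of entries, keeping the first copy of each duplicate."""
    merged = []
    for entries in sources:
        for entry in entries:
            if not any(same_entry(entry, kept) for kept in merged):
                merged.append(entry)
    return merged


# Made-up sample data standing in for two parsed feeds.
feed_a = [
    {"title": "Python 2.4.2 released",
     "content": "The Python 2.4.2 bugfix release is out."},
]
feed_b = [
    {"title": "Python 2.4.2 released!",
     "content": "The Python 2.4.2 bugfix release is out now."},
    {"title": "Zope 3.1 sprint report",
     "content": "Notes from the sprint in Paris."},
]

merged = merge_entries(feed_a, feed_b)
```

Each new entry is compared against everything already kept, so the cost grows quadratically with the number of entries. That is fine for a handful of feeds, but it would need something smarter for a very large aggregator.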

Here's an example run on these:

The result is here.
(It's a one-shot XML file, made today, so it's not a real feed; it is still readable by any client, though.)

Now I've been told that this is pretty useless, and that I would do better to clean up my feeds and do more interesting stuff in my spare time.

But I can't help it: every time I see a feed related to Python I just add it to my client :'). So for an unorganized person like me, a personal CPSRSS website with this merging capability, where I can drop tons of feeds, would be perfect.

(Post originally written by Tarek Ziadé on the old Nuxeo blogs.)
