I have a lot of Python RSS/Atom feeds in my aggregator, and entries are
duplicated all over the place.

I couldn't find any tool out there that merges entries from several
sources in a smart way, by trying to detect duplicates.

I wrote a little script, extending Mark Pilgrim's feedparser (which we
use in CPSRSS), to merge several sources, using the difflib module and
the RSS rendering we have in CPSBlog.

It calculates the diff ratio on the title and content of each entry to
decide whether it's the same entry. When the ratio is <= 0.2 it's the
same entry (hopefully :) )
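
To give an idea of the comparison, here is a minimal sketch, not the actual
script. It assumes entries are plain dicts with hypothetical "title" and
"content" keys, and it reads the 0.2 threshold as a distance, i.e. 1 minus
difflib's SequenceMatcher.ratio(), since ratio() itself returns 1.0 for
identical text. The helper names (entry_text, distance, merge_entries) are
made up for the illustration.

```python
from difflib import SequenceMatcher


def entry_text(entry):
    """Join the two fields being compared: title and content."""
    return entry.get("title", "") + "\n" + entry.get("content", "")


def distance(a, b):
    """Return 1 - ratio(): 0.0 for identical text, larger for more different text."""
    matcher = SequenceMatcher(None, entry_text(a), entry_text(b))
    return 1.0 - matcher.ratio()


def merge_entries(feeds, threshold=0.2):
    """Merge lists of entries, skipping any entry that is close to one already kept.

    The threshold is an assumption: distance <= 0.2 means "same entry".
    """
    merged = []
    for entries in feeds:
        for entry in entries:
            if not any(distance(entry, kept) <= threshold for kept in merged):
                merged.append(entry)
    return merged


if __name__ == "__main__":
    feed_a = [{"title": "Python 2.4 released",
               "content": "The final release of Python 2.4 is out."}]
    feed_b = [{"title": "Python 2.4 released",
               "content": "The final release of Python 2.4 is out!"},
              {"title": "Zope sprint report",
               "content": "Notes from last week's sprint in Paris."}]
    for entry in merge_entries([feed_a, feed_b]):
        print(entry["title"])
```

Each new entry is compared against every entry already kept, so this is
quadratic in the number of entries. That is fine for a handful of feeds, but
it would need something smarter for a large aggregator.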

Here's an example run on these:



The result is here.
(It's a one-shot XML file, generated today, so it's not a real feed;
it is still readable by any client, though.)

Now, I've been told that this is pretty useless, and that I would do
better to clean up my feeds and do more interesting stuff in my spare
time.

But I can't help it: every time I see a feed related to Python, I just
add it to my client :'). So for an unorganized person like me, a
personal CPSRSS website with this merging capability, where I can drop
tons of feeds, would be perfect.

(Post originally written by Tarek Ziadé on the old Nuxeo blogs.)