David Janes' Code Weblog

January 25, 2009

Creating OPML subscription lists using Pipe Cleaner

authentication, demo, pipe cleaner, pybm, python · David Janes · 11:40 am ·

Here’s a neat API I completed this morning, called api_feeds. It takes a URL (or a list of URLs) and, for each one, finds:

  • the home page associated with the URL
  • the feed(s) for the URL
  • the name of the home page

If you’re following along at home, this is essentially the information needed for a single outline in an OPML subscription list.

Here’s a simple Python example:

import pprint

import api_feeds

# Look up the home page, feed(s) and title for a single URL.
api = api_feeds.OneFeed()
api.request = {
    "uri" : "http://code.davidjanes.com/blog/2009/01/23/transparently-working-with-oauath/",
}

pprint.pprint(api.response, width = 1)

And here’s what the output looks like:

{'link': u'http://code.davidjanes.com/blog',
 'links': [{'href': u'http://feeds.feedburner.com/DavidJanesCode',
            'rel': 'alternate',
            'type': u'application/rss+xml'}],
 'title': u"David Janes' Code Weblog"}
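
To make the connection to OPML concrete, here is a minimal sketch of turning a response shaped like the one above into a single outline element. It is not the OPMLWriter module used later in this post: the function name outline_from_response is hypothetical, and the attribute names (htmlUrl, rssUrl, text, type) simply mirror the OPML output shown further down. It is written for Python 3 and uses only the standard library.

```python
import xml.etree.ElementTree as ET


def outline_from_response(response):
    """Build one OPML <outline> element from a feed-discovery result.

    `response` is assumed to look like the dictionary printed above:
    a 'link', a 'title', and a list of 'links' that each carry an 'href'.
    (Hypothetical helper, not part of api_feeds or api_opml.)
    """
    links = response.get("links") or []
    if not links:
        return None  # no feed discovered, so there is nothing to subscribe to

    outline = ET.Element("outline")
    outline.set("text", response.get("title") or response["link"])
    outline.set("type", "rss")
    outline.set("htmlUrl", response["link"])
    outline.set("rssUrl", links[0]["href"])  # use the first discovered feed
    return outline


response = {
    "link": "http://code.davidjanes.com/blog",
    "links": [
        {
            "href": "http://feeds.feedburner.com/DavidJanesCode",
            "rel": "alternate",
            "type": "application/rss+xml",
        }
    ],
    "title": "David Janes' Code Weblog",
}

outline = outline_from_response(response)
print(ET.tostring(outline, encoding="unicode"))
```

A response with an empty 'links' list returns None here, which is one simple way to skip pages that have no feed.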

There’s actually quite a bit going on here behind the scenes, most of it using code I didn’t initially write but have quite heavily hacked: the Universal Feed Parser and the Feed Finder.
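
For readers who haven't seen feed discovery before, the core idea can be sketched in a few lines: scan a page's HTML for link tags with rel="alternate" and a feed MIME type. This is only an illustration of that autodiscovery convention, not how the Feed Finder or api_feeds is actually implemented, and real tools typically do more. The names FeedLinkParser and find_feeds, and the example.com page, are made up for the example. It is written for Python 3 with the standard library only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types treated as feeds in this sketch (an assumption, not an exhaustive list).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect <link rel="alternate"> tags that advertise a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            self.feeds.append({
                # Resolve relative hrefs against the page URL.
                "href": urljoin(self.base_url, href),
                "rel": "alternate",
                "type": mime,
            })


def find_feeds(html, base_url):
    """Return the feeds advertised in `html`, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


page = """
<html><head>
<title>Example</title>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rss+xml" href="/feed/">
</head><body></body></html>
"""

feeds = find_feeds(page, "http://example.com/blog/")
print(feeds)
```

The dictionaries it collects deliberately have the same href, rel and type keys as the 'links' entries in the output above.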

What becomes really interesting is what happens when we combine this with other modules. Here’s an example of how we can build an OPML subscription list from all the posts I’ve tagged “python” and “django” in del.icio.us. The code looks up each link I’ve bookmarked, does the feed discovery above, filters out items that don’t have feeds, and outputs the result as OPML. Note the neat pipeline-style structure of the code:

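import api_delicious
import api_feeds
import api_opml

# Name the instances differently from the modules so the modules aren't shadowed.
delicious = api_delicious.PostsList(tag = "python django")
many = api_feeds.ManyFeeds(require_feed = True)
opml = api_opml.OPMLWriter()

# Each stage consumes the previous stage's items.
many.items = delicious.items
opml.items = many.items

print opml.Produce()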

Producing the following OPML:

<opml encoding="utf-8" version="2.0">
  <head>
    <title>[Untitled]</title>
  </head>
  <body>
    <outline htmlUrl="http://push.cx"
      rssUrl="http://push.cx/feed"
      text="Push cx"
      type="rss"/>
    <outline htmlUrl="http://crankycoder.com"
      rssUrl="http://crankycoder.com/feed/"
      text="crankycoder.com"
      type="rss"/>
    <outline htmlUrl="http://blog.dowski.com"
      rssUrl="http://blog.dowski.com/feed/"
      text="the occasional occurrence"
      type="rss"/>
    <outline htmlUrl="http://www.b-list.org/feeds/entries/"
      rssUrl="http://feeds2.feedburner.com/b-list-entries"
      text="The B-List: Latest entries"
      type="rss"/>
    <outline htmlUrl="http://blog.thescoop.org"
      rssUrl="http://blog.thescoop.org/feed/"
      text="The Scoop"
      type="rss"/>
    <outline htmlUrl="http://effbot.org"
      rssUrl="http://effbot.org/zone/rss.xml"
      text="effbot.org"
      type="rss"/>
    <outline htmlUrl="http://blog.disqus.net"
      rssUrl="http://feeds.feedburner.com/BigHeadLabs"
      text="Disqus"
      type="rss"/>
    <outline htmlUrl="http://blog.ianbicking.org"
      rssUrl="http://blog.ianbicking.org/feed/atom/"
      text="Ian Bicking: a blog"
      type="rss"/>
    <outline htmlUrl="http://antoniocangiano.com"
      rssUrl="http://feeds.feedburner.com/ZenAndTheArtOfRubyProgramming"
      text="Zen and the Art of Programming"
      type="rss"/>
    <outline htmlUrl="http://www.carthage.edu/webdev"
      rssUrl="http://www.carthage.edu/webdev/?feed=rss2"
      text="carthage webdev"
      type="rss"/>
    <outline htmlUrl="http://www.eweek.com"
      rssUrl="http://www.eweek.com/rss-feeds-13.xml"
      text="Application Development - RSS Feeds"
      type="rss"/>
    <outline htmlUrl="http://jeffcroft.com/"
      rssUrl="http://feeds.feedburner.com/jeffcroft/blog"
      text="JeffCroft.com: Latest blog entries"
      type="rss"/>
  </body>
</opml>

This will be just as terse (terser, probably) when written as a Pipe Cleaner script; I’m just struggling over how to introduce the authentication code gracefully into the scripts.
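
The "assign one stage's items to the next" hand-off in the del.icio.us example can be sketched with a property: writing to items stores a stage's input, and reading items runs the stage over that input. This is one plausible way to get that behaviour, not the actual implementation of the api_* modules. Stage, FeedLookup, TextWriter and the a.example / b.example URLs are all hypothetical, and a fixed lookup table stands in for real network feed discovery. Python 3, standard library only.

```python
class Stage:
    """Assigning to `items` sets the input; reading `items` runs the stage."""

    def __init__(self):
        self._input = ()

    @property
    def items(self):
        return self.process(self._input)

    @items.setter
    def items(self, value):
        self._input = value

    def process(self, items):
        # Default stage: pass items through unchanged.
        return items


class FeedLookup(Stage):
    """Stand-in for feed discovery, backed by a dict instead of the network."""

    def __init__(self, known_feeds, require_feed=False):
        super().__init__()
        self.known_feeds = known_feeds
        self.require_feed = require_feed

    def process(self, items):
        for item in items:
            feed = self.known_feeds.get(item["link"])
            if feed is None and self.require_feed:
                continue  # drop items that have no feed
            yield dict(item, feed=feed)


class TextWriter(Stage):
    """Stand-in for an output stage; writes one line per item."""

    def produce(self):
        return "\n".join("%s -> %s" % (i["link"], i["feed"]) for i in self.items)


source = Stage()
source.items = [{"link": "http://a.example"}, {"link": "http://b.example"}]

lookup = FeedLookup({"http://a.example": "http://a.example/feed"}, require_feed=True)
writer = TextWriter()

# Same wiring style as the pipeline above.
lookup.items = source.items
writer.items = lookup.items

result = writer.produce()
print(result)
```

Because FeedLookup.process is a generator, nothing is looked up until the writer actually iterates, so each item flows through the stages one at a time.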

2 comments »

  1. Allen Jackson · 2009-02-26 10:58

    Cool!! I am trying to do exactly that… I have a long list of URLs in HTML and want to find feeds and convert to OPML. I am no longer much of a coder, but can run a Python script. Any chance you could expound on how to do this??

    Thanks!

  2. David Janes · 2009-02-27 09:33

    Hi Allen,

    Coming in the next few days … all the code’s up on Google but I’ll have a description of how to put it all together soon!
