David Janes' Code Weblog

January 25, 2009

Creating OPML subscription lists using Pipe Cleaner

authentication,demo,pipe cleaner,pybm,python · David Janes · 11:40 am ·

Here’s a neat API I completed this morning, called api_feeds. It takes a URL (or a list of them) and transforms them into:

  • the home page associated with the URL
  • the feed(s) for the URL
  • the name of the home page

If you’re following along at home, this is essentially the information needed for a single outline in an OPML subscription list.

Here’s a simple Python example:

import pprint

import api_feeds

api = api_feeds.OneFeed()
api.request = {
    "uri" : "http://code.davidjanes.com/blog/2009/01/23/transparently-working-with-oauath/",
}

pprint.pprint(api.response, width = 1)

And here’s what the output looks like:

{'link': u'http://code.davidjanes.com/blog',
 'links': [{'href': u'http://feeds.feedburner.com/DavidJanesCode',
            'rel': 'alternate',
            'type': u'application/rss+xml'}],
 'title': u"David Janes' Code Weblog"}

There’s actually quite a bit going on here behind the scenes, most of it using code I didn’t initially write but have quite heavily hacked: the Universal Feed Parser and the Feed Finder.
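To give a feel for the feed discovery step, here is a minimal sketch of feed autodiscovery using only the standard library. It is not the Feed Finder code and does much less than it: it only looks for `<link rel="alternate">` elements in a page. The names `FeedLinkFinder` and `find_feeds` and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

# MIME types that conventionally advertise a feed.
FEED_TYPES = ("application/rss+xml", "application/atom+xml")


class FeedLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> elements that advertise a feed."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrd = dict(attrs)
        rel = (attrd.get("rel") or "").lower().split()
        mime = (attrd.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES:
            self.links.append({
                "href": attrd.get("href"),
                "rel": "alternate",
                "type": attrd.get("type"),
            })


def find_feeds(html_text):
    """Return the feed links advertised in the head of an HTML page."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.links


# Hypothetical page used only to exercise the sketch.
page = """<html><head><title>Example</title>
<link rel="alternate" type="application/rss+xml" href="http://example.com/feed" />
<link rel="stylesheet" type="text/css" href="css.css" />
</head><body></body></html>"""

print(find_feeds(page))
```

A real implementation also has to resolve relative URLs, follow the link from a post to its home page, and cope with broken markup, which is where most of the work goes.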

What becomes really interesting is what happens when we combine this with other modules. Here’s an example of how we can build an OPML subscription list from all the posts I’ve tagged “python” and “django” in del.icio.us. The code looks up each link I’ve bookmarked, does the feed discovery above, filters out items that don’t have feeds, and outputs the result as OPML. Note the neat pipeline aspect of the code:

api_delicious = api_delicious.PostsList(tag = "python django")
api_many = api_feeds.ManyFeeds(require_feed = True)
api_opml = api_opml.OPMLWriter()

api_many.items = api_delicious.items
api_opml.items = api_many.items

print api_opml.Produce()

Producing the following OPML:

<opml encoding="utf-8" version="2.0">
  <head>
    <title>[Untitled]</title>
  </head>
  <body>
    <outline htmlUrl="http://push.cx"
      rssUrl="http://push.cx/feed"
      text="Push cx"
      type="rss"/>
    <outline htmlUrl="http://crankycoder.com"
      rssUrl="http://crankycoder.com/feed/"
      text="crankycoder.com"
      type="rss"/>
    <outline htmlUrl="http://blog.dowski.com"
      rssUrl="http://blog.dowski.com/feed/"
      text="the occasional occurrence"
      type="rss"/>
    <outline htmlUrl="http://www.b-list.org/feeds/entries/"
      rssUrl="http://feeds2.feedburner.com/b-list-entries"
      text="The B-List: Latest entries"
      type="rss"/>
    <outline htmlUrl="http://blog.thescoop.org"
      rssUrl="http://blog.thescoop.org/feed/"
      text="The Scoop"
      type="rss"/>
    <outline htmlUrl="http://effbot.org"
      rssUrl="http://effbot.org/zone/rss.xml"
      text="effbot.org"
      type="rss"/>
    <outline htmlUrl="http://blog.disqus.net"
      rssUrl="http://feeds.feedburner.com/BigHeadLabs"
      text="Disqus"
      type="rss"/>
    <outline htmlUrl="http://blog.ianbicking.org"
      rssUrl="http://blog.ianbicking.org/feed/atom/"
      text="Ian Bicking: a blog"
      type="rss"/>
    <outline htmlUrl="http://antoniocangiano.com"
      rssUrl="http://feeds.feedburner.com/ZenAndTheArtOfRubyProgramming"
      text="Zen and the Art of Programming"
      type="rss"/>
    <outline htmlUrl="http://www.carthage.edu/webdev"
      rssUrl="http://www.carthage.edu/webdev/?feed=rss2"
      text="carthage webdev"
      type="rss"/>
    <outline htmlUrl="http://www.eweek.com"
      rssUrl="http://www.eweek.com/rss-feeds-13.xml"
      text="Application Development - RSS Feeds"
      type="rss"/>
    <outline htmlUrl="http://jeffcroft.com/"
      rssUrl="http://feeds.feedburner.com/jeffcroft/blog"
      text="JeffCroft.com: Latest blog entries"
      type="rss"/>
  </body>
</opml>

This will be just as terse (terser, probably) when written as a Pipe Cleaner script; I’m just struggling over how to introduce the authentication code gracefully into the scripts.
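The pipeline wiring used above can be sketched with plain generators. This is only an illustration of the pattern, assuming each stage consumes and produces dictionaries shaped like the api_feeds response; `source`, `require_feed` and `to_opml` are hypothetical stand-ins, not the real api_delicious, api_feeds or api_opml classes. The attribute names are copied from the OPML output shown earlier.

```python
import xml.etree.ElementTree as ET


def source():
    """Stand-in for the bookmark list: yields WORK-style dictionaries."""
    yield {"link": "http://example.com", "title": "Example",
           "links": [{"href": "http://example.com/feed",
                      "type": "application/rss+xml"}]}
    yield {"link": "http://example.org", "title": "No feed here", "links": []}


def require_feed(items):
    """Drop items for which no feed was discovered."""
    for item in items:
        if item.get("links"):
            yield item


def to_opml(items, title="[Untitled]"):
    """Serialize the remaining items as one OPML outline each."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for item in items:
        ET.SubElement(body, "outline", {
            "htmlUrl": item["link"],
            "rssUrl": item["links"][0]["href"],
            "text": item["title"],
            "type": "rss",
        })
    return ET.tostring(opml, encoding="unicode")


# Each stage only sees the items handed to it by the previous one.
opml_text = to_opml(require_feed(source()))
print(opml_text)
```

Because every stage is lazy, nothing is fetched or filtered until the writer at the end starts pulling items through.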

January 23, 2009

Transparently working with OAuth

authentication,demo,pipe cleaner,pybm,python · David Janes · 5:03 am ·

This is part one of two posts I’m going to write about OAuth; the second will be somewhat more critical in tone. Before I criticize (and I know how hard it is to put together something as technically involved as OAuth) I want to actually accomplish something with it, so that I at least appear to have some clue about it. This is a report of what I’ve done.

bm_uri is a library and tool I’ve written for working with URIs, and in particular http:// and https:// URLs. Here are some of the advantages of using bm_uri over all the normal Python urllib and urllib2 methods:

  • downloads are cached: if a URL is temporarily unavailable, bm_uri returns the cached version; likewise, if the URL was downloaded recently, the cached version is returned rather than hitting the net again
  • downloads can be cooked, meaning converted into a more useful form such as TIDY-cleaned HTML, JSON, Unicode text and so forth
  • bm_uri handles all the protocol stuff for you (such as User-Agent, Last-Modified and so forth) so you don’t have to
  • authentication is handled as “invisibly” as possible for you … at least after the initial setup
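The caching behaviour in the first bullet can be sketched as follows. This is an assumption about the general technique, not bm_uri’s actual code: `CachedLoader`, its `max_age` parameter and the injected `fetch` and `clock` callables are all hypothetical names chosen so the example runs without a network.

```python
import time


class CachedLoader(object):
    """Hypothetical cache-then-network loader; not bm_uri's real API."""

    def __init__(self, fetch, max_age=300, clock=time.time):
        self.fetch = fetch        # callable: url -> body, may raise IOError
        self.max_age = max_age    # seconds a cached copy counts as fresh
        self.clock = clock        # injectable so the example is deterministic
        self.cache = {}           # url -> (timestamp, body)

    def load(self, url):
        now = self.clock()
        cached = self.cache.get(url)
        if cached and now - cached[0] < self.max_age:
            return cached[1]      # fresh enough: skip the network
        try:
            body = self.fetch(url)
        except IOError:
            if cached:
                return cached[1]  # network failed: fall back to the stale copy
            raise
        self.cache[url] = (now, body)
        return body


calls = []


def fake_fetch(url):
    """Pretend network: succeeds once, then fails."""
    calls.append(url)
    if len(calls) > 1:
        raise IOError("network down")
    return "body of " + url


now = [1000.0]
loader = CachedLoader(fake_fetch, max_age=60, clock=lambda: now[0])
first = loader.load("http://example.com/")    # hits the "network"
second = loader.load("http://example.com/")   # served from the cache
now[0] += 3600                                # the cached copy is now stale
third = loader.load("http://example.com/")    # fetch fails, stale copy returned
```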

Here is an example of accessing an OAuth resource using bm_uri, returning my current location from Fire Eagle as a Python object. From a programming point of view, I believe I have reduced this to close to the minimum number of steps possible. Here’s the setup phase:

import bm_cfg
import bm_uri
import bm_oauth
import pprint

bm_cfg.cfg.initialize()

bm_oauth.OAuth(service_name = "fireeagle")

Here’s using it in code – note how there’s no reference to OAuth here whatsoever.

loader = bm_uri.JSONLoader('https://fireeagle.yahooapis.com/api/0.1/user.json?format=json')
loader.Load()

pprint.pprint(loader.GetCooked())

And here’s the output of the program:

{u'stat': u'ok',
 u'user': {u'location_hierarchy': [{u'best_guess': True,
         u'geometry': {u'coordinates': [-79.418426513699998,
                   43.731891632100002],
              u'type': u'Point'},
         u'id': 572261,
         u'label': None,
         u'level': 1,
         u'level_name': u'postal',
         u'located_at': u'2008-03-19T04:09:30-07:00',
...
         u'name': u'Canada',
         u'normal_name': None,
         u'place_id': u'EESRy8qbApgaeIkbsA',
         u'woeid': 23424775}],
     u'readable': True,
     u'writable': False}}

Gather information

The devil is in the details, obviously, and with OAuth the little satan is doing the initial setup. Here’s how I did this for Fire Eagle – there’ll be something analogous for whatever service you are using:

  • Log in or sign up (obviously)
  • Go to the Developers’ Page
  • Click on Create a New App
  • Copy the “Consumer Key” and the “Consumer Secret” … these will be long-ish strings of nonsense
  • Find out the Request Token URL, the Access Token URL, and the Authorization URL. These are public knowledge and for Fire Eagle are:
    • https://fireeagle.yahooapis.com/oauth/request_token
    • https://fireeagle.yahooapis.com/oauth/access_token
    • http://fireeagle.yahoo.net/oauth/authorize

Note how Yahoo has conveniently made that last URL similar looking to the others, but not quite the same. Thanks!

However you implement OAuth, you’re probably going to need to be able to persist information to disk or database. As documented here several weeks ago, we already have that covered with our bm_cfg module. In ~/.cfg/fireeagle.json, create the following JSON format file:

{
 "fireeagle": {
  "api_uri" : "https://fireeagle.yahooapis.com/",
  "oauth_access_token_url": "https://fireeagle.yahooapis.com/oauth/access_token",
  "oauth_authorization_url": "http://fireeagle.yahoo.net/oauth/authorize",
  "oauth_consumer_key": "ABCDEFGHIJKL",
  "oauth_consumer_secret": "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345",
  "oauth_token_url": "https://fireeagle.yahooapis.com/oauth/request_token"
 }
}

The only new item here is the api_uri: that’s the prefix of URLs that bm_uri will use OAuth with.
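One way to picture how a URL prefix can switch authentication on is a simple lookup from URL to configured service. This is a guess at the idea rather than bm_uri’s implementation; `SERVICES` and `service_for` are hypothetical names, and only the `api_uri` value comes from the configuration above.

```python
# Hypothetical in-memory view of the per-service configuration.
SERVICES = {
    "fireeagle": {"api_uri": "https://fireeagle.yahooapis.com/"},
}


def service_for(url, services=SERVICES):
    """Return the name of the service whose api_uri prefixes url, or None."""
    best = None
    for name, cfg in services.items():
        prefix = cfg.get("api_uri")
        if prefix and url.startswith(prefix):
            # Prefer the longest (most specific) matching prefix.
            if best is None or len(prefix) > len(services[best]["api_uri"]):
                best = name
    return best


print(service_for("https://fireeagle.yahooapis.com/api/0.1/user.json?format=json"))
print(service_for("http://example.com/"))
```

A loader built this way only needs to ask "does any service claim this URL?" before each request, which is what lets the calling code stay free of OAuth details.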

Set it up

Next you have to do all sorts of OAuth stuff to actually work with OAuth. If the why interests you, please go read the spec! I’m more of a how person myself, and this is what we need to do:

  • run: python bm_uri.py --service fireeagle --authorize
  • this will pop up a browser window; grant your application access and then…
  • run: python bm_uri.py --service fireeagle --exchange

And that’s it – you should now be able to just work with the Fire Eagle API in bm_uri without even having to know OAuth is there!
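For the curious, here is a rough sketch of two small pieces of what the authorize step involves under OAuth 1.0: parsing the form-encoded token response, and building the URL the browser is sent to. It is not the bm_oauth code, it leaves out request signing entirely, and the token values are invented.

```python
from urllib.parse import parse_qs, urlencode


def parse_token_response(body):
    """Parse a form-encoded OAuth 1.0 token response into (token, secret)."""
    fields = parse_qs(body, strict_parsing=True)
    return fields["oauth_token"][0], fields["oauth_token_secret"][0]


def authorization_url(authorize_url, request_token):
    """Build the URL the user visits in a browser to grant access."""
    return authorize_url + "?" + urlencode({"oauth_token": request_token})


# Made-up token values, for illustration only.
token, secret = parse_token_response("oauth_token=abc123&oauth_token_secret=s3cret")
url = authorization_url("http://fireeagle.yahoo.net/oauth/authorize", token)
print(url)
```

The exchange step then trades the authorized request token for an access token, whose response has the same form-encoded shape.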

End notes

  • the current implementation only works with HTTP/REST GET; POST to come soon, DELETE and PUT as needed
  • bm_uri, bm_cfg and the rest of the code are freely licensed and available here. It is a constantly changing product, albeit converging on perfection in my own mind ;-)

December 18, 2008

Pipe Cleaner

demo,djolt,dqt,html / javascript,ideas,jd,maps,pipe cleaner,pybm,work · David Janes · 6:38 pm ·

I’ve been working (in my decreasing available spare time) on a project called “Pipe Cleaner” that pulls together all the various concepts I’ve been mentioning on this blog: Web Object Records (WORK) for API access and object manipulation, Djolt for generating text from templates, Data/Query/Transform/Template (DQT) for transforming data, and JD for scripting these elements together. The pieces came together enough this morning to put a demo together, and here it is – the Toronto Fires Pt II Demo.

How, you may ask, does this differ from the original Toronto Fires Demo? The answer is how it is put together, which we describe here.

Index.dj

This is the Djolt template that generates the output. The data fed to this template is generated by the JD script, described in the next section.

<html>
<head>
    <link rel="stylesheet" type="text/css" href="css.css" />
    {{ gmaps.js|safe }}
</head>
<body>
<div id="content_wrapper">
    <div id="map_wrapper">
        {{ gmaps.html|safe }}
    </div>
    <div id="text_wrapper">
{% for incident in incidents %}
    <div id="{{ incident.IncidentNumber }}">
        {{ incident.body_sb|safe }}
    </div>
{% endfor %}
    </div>
</div>
</body>
</html>

Quite simple … as you can see, most of the data is being pulled in from elsewhere. The elsewhere is provided by the script described in the next section.

Index.jd

This is the script that pulls all the pieces together. Note that I’m not 100% happy with the way the data is imported; I would like the geocoding to become part of this data flow too. In the next release perhaps.

First we pull in the “fire” module that we wrote in the previous Map examples. This is doing exactly what you think: importing a Python module. We may have to increase the security or restrict this to working with an API for general purpose use.

import module:"fire";

Next we define two headers – one that is going to appear in the Google Maps popup, the next that is going to appear in the sidebar. They need to be different as they refer to themselves. Note that the sidebar header “breaks” the encapsulation of Google Maps – this seems to be unavoidable. The to:"fitem.head.map" and to:"fitem.head.sb" are manipulating a WORK dictionary to store values.

Note also here that we’ve extended JD to accept Python multiline strings – this was unavoidable if JD was to be useful to me.

set to:"fitem.head.map" value:"""
<h3>
<a href="#{{ IncidentNumber }}">{{ AlarmLevel }}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

set to:"fitem.head.sb" value:"""
<h3>
{% if latitude and longitude %}
<a href="javascript:js_maps.map.panTo(new GLatLng({{ latitude }}, {{ longitude }}))">*</a>
{% endif %}
<a href="#{{ IncidentNumber }}">{{ AlarmLevel }}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";
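The dotted `to:` paths used in these `set` statements can be pictured as writes into a nested dictionary. The sketch below shows that idea only; `set_path` is a hypothetical helper, not part of JD or the WORK library, and the stored values are shortened placeholders.

```python
def set_path(work, path, value):
    """Store value under a dotted path, creating dictionaries as needed."""
    keys = path.split(".")
    node = work
    for key in keys[:-1]:
        # Walk down, creating intermediate dictionaries on the way.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return work


work = {}
set_path(work, "fitem.head.map", "<h3>map header</h3>")
set_path(work, "fitem.head.sb", "<h3>sidebar header</h3>")
set_path(work, "fitem.body", "<p>body</p>")
print(work)
```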

The next block defines the text of the body used to describe a fire incident. It follows much the same pattern as the previous block.

set to:"fitem.body" value:"""
<p>
Alarm Level: {{ AlarmLevel }}
<br />
Incident Type: {{ IncidentType }}
<br />
City: {{ City }}
<br />
Street: {{ Street }} ({{ CrossStreet }})
<br />
Units: {{ Units }}
</p>
""";

This is a map: it translates the values in fire.GetGeocodedIncidents into a new format and stores the result in incidents. The format we are storing it in is understood by the Google Maps generating module.

We may rename this translate, as the word map is somewhat overloaded.

map from:"fire.GetGeocodedIncidents" to:"incidents" map:{
    "latitude" : "{{ latitude }}",
    "longitude" : "{{ longitude }}",
    "title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
    "body" : "{{ *fitem.head.map|safe }}{{ *fitem.body|safe }}",
    "body_sb" : "{{ *fitem.head.sb|safe }}{{ *fitem.body|safe }}",
    "IncidentNumber" : "{{ IncidentNumber }}"
};
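In plain Python, the effect of a `map` statement like this is roughly "render every template in the mapping once per input record". The sketch below assumes only simple `{{ name }}` substitution, with no filters and no `*` indirection; `render` and `map_records` are hypothetical names and the incident record is invented sample data.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, record):
    """Replace {{ name }} with record[name]; unknown names become ''."""
    return VARIABLE.sub(lambda m: str(record.get(m.group(1), "")), template)


def map_records(records, mapping):
    """Build one new dictionary per record by rendering each template."""
    return [
        dict((key, render(template, record)) for key, template in mapping.items())
        for record in records
    ]


# Invented sample record, for illustration only.
incidents = map_records(
    [{"AlarmLevel": 1, "IncidentType": "Alarm", "RawStreet": "Main St",
      "IncidentNumber": "F001"}],
    {
        "title": "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
        "IncidentNumber": "{{ IncidentNumber }}",
    },
)
print(incidents)
```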

Next we set up the “meta” (see the WORK meta description if you’re not following along) for the maps. The render_value:true declaration makes PC interpret the templates in strings. We then call our Google Maps generating code (which is actually more Pipe Cleaner code) and the result gets fed to the Djolt template we first showed you. Clear? Maybe not, we’ll have more examples coming…

set to:"map_meta" render_value:true value:{
    "id" : "maps",
    "latitude" : 43.67,
    "longitude" : -79.38,
    "uzoom" : -13,
    "gzoom" : 13,
    "api_key" : "{{ cfg.gmaps.api_key|otherwise:'...mykey...' }}",
    "html" : {
        "width" : "1024px",
        "height" : "800px"
    }
};

load template:"gmaps.js" items:"incidents" meta:"map_meta";
load template:"gmaps.html" items:"incidents" meta:"map_meta";

December 8, 2008

Coding backwards for simplicity

djolt,dqt,ideas,pybm,python,work · David Janes · 4:58 pm ·

I haven’t been posting here as much as I’d like for the last three weeks, not for lack of ideas but because I haven’t been able to consolidate what I’ve been working on into a coherent thought. I’m trying to come up with an overarching concept that covers WORK, Djolt and the various API interfaces I’ve been coding. Tentatively and horribly, I’m calling this Data/Query/Transform/Template right now, though I’m expecting this to change.

The first demo of this … without further explanation … can be seen here. More details about what this is actually demonstrating (besides formatting this blog) will be forthcoming.

What I want to draw attention to in this post is how I coded this. What I’ve been doing for the last several weeks is coding backwards: I start with what I want the final code to look like and then figure out all the libraries, little languages and so forth that would be needed to code that. After several false starts, my conceptual logjam broke about a week ago and code started radically simplifying.

The ideal code, in my mind, is almost entirely static declarations: no loops, no if statements, no while statements, no goto-type statements (god help us). We simply specify how the parts are connected, and hope that we can abstract the complexity into the libraries that make this all happen. The code that you see below actually came after all my conceptualizing: I just wanted to write some code, and since I had almost all the parts together it fell into place quite nicely:

import bm_wsgi
import bm_io

import djolt
import api_feed

from bm_log import Log

class Application(bm_wsgi.SimpleWrapper):
    def __init__(self, *av, **ad):
        bm_wsgi.SimpleWrapper.__init__(self, *av, **ad)

    def CustomizeSetup(self):
        self.html_template_src = bm_io.readfile("index.dj")
        self.html_template = djolt.Template(self.html_template_src)

        self.context = djolt.Context()
        self.context["paramd"] = {
            "feed" : "http://feeds.feedburner.com/DavidJanesCode",
            "template" : """\
<ul>
{% for item in data.items %}
	<li><a href="{{ item.link }}">{{ item.title }}</a></li>
{% endfor %}
</ul>
""",
        }
        self.context.Push()
        self.context["paramd"] = self.paramd
        self.context["data"] = api_feed.RSS20(self.context.as_string("paramd.feed"))

    def CustomizeContent(self):
        yield   self.html_template.Render(self.context)

if __name__ == '__main__':
    Application.RunCGI()

There’s almost nothing there! In particular, note:

  • bm_wsgi.SimpleWrapper handles all the WSGI interface work, including determining when to output HTML headers, error trapping, and Unicode to UTF-8 encoding
  • the most complicated part of the application is setting up the Context. In particular, note that self.paramd is automatically populated by the QUERY_STRING passed to the application, and the double setting we do here allows us to have default values.
  • If you want to see the HTML template that drives the application it is here. Note two variations from Django templates: the {% asis %} block, which doesn’t interpret its content as Djolt code, and the {{ *paramd.template|safe }} variable, which interprets the variable’s contents as a template.
  • Methods called Customize-something are my convention for framework functions, i.e. methods that will be called for us rather than methods we call.
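The "defaults first, then query-string values on top" idea from the second bullet can be sketched with the standard library. This is an analogy, not how djolt.Context is implemented: `build_params` is a hypothetical helper and `ChainMap` stands in for the pushed context layers.

```python
from collections import ChainMap
from urllib.parse import parse_qs


def build_params(query_string, defaults):
    """Layer query-string values over defaults; the first map wins on lookup."""
    supplied = dict(
        (key, values[0]) for key, values in parse_qs(query_string).items()
    )
    return ChainMap(supplied, defaults)


defaults = {
    "feed": "http://feeds.feedburner.com/DavidJanesCode",
    "template": "<ul>...</ul>",
}

# The caller overrides the feed but not the template.
params = build_params("feed=http%3A%2F%2Fexample.com%2Ffeed", defaults)
print(params["feed"])
print(params["template"])
```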

November 28, 2008

Djolt – Django-like Templates

djolt,pybm,python,work · David Janes · 4:34 pm ·

Djolt is a reimplementation of Django’s template language in Python. Why do this?

  • I like the Django template language
  • I wanted something small and independent of Django
  • I wanted something that will work with WORK paths (this was the real deal breaker for using Django)
  • I wanted something that I could take and reimplement in Javascript and maybe Java too
  • Some template engines, Cheetah for example, are far too heavy for the kind of light-weight applications I have in mind; note that I’ve had great success with Cheetah in the past
  • Some template engines, such as that in Python 2.6, are far too underfeatured

However, if you’re really looking for the whole Django template experience and don’t want to use Djolt, just start here.

How do I get it?

Djolt is packaged as part of the pybm library.

How do I use it?

import djolt

t = djolt.Template("""
<ul>
{% for name in names %}
<li>{{ name }}</li>
{% endfor %}
</ul>
""")
print t.Render({
    "names" : [ "Johnny", "Jack", "Ray", "Mary & Sam", ]
})

Which gives the results:

<ul>
<li>Johnny</li>
<li>Jack</li>
<li>Ray</li>
<li>Mary &amp; Sam</li>
</ul>

Note the “autoescaping” of the & character.
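The autoescaping rule itself is easy to sketch: escape everything on output unless the value has been explicitly marked safe. The code below illustrates the rule with the standard library and is not Djolt’s implementation; `Safe` and `emit` are hypothetical names.

```python
from html import escape


class Safe(str):
    """Marker type: a string that should be emitted without escaping."""


def emit(value, autoescape=True):
    """Return the text to write into the page for one template variable."""
    text = value if isinstance(value, str) else str(value)
    if autoescape and not isinstance(value, Safe):
        return escape(text)
    return text


print(emit("Mary & Sam"))
print(emit(Safe("<b>bold</b>")))
```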

What tags does it define?

  • autoescape/endautoescape
  • if/else/endif
  • equal/endequal
  • for/endfor
  • notequal/notendequal

It does not implement blocks.

What filters does it define?

  • add
  • cut
  • default (see otherwise below)
  • default_if_none
  • divisibleby
  • first
  • join
  • last
  • length
  • length_is
  • linebreaks
  • lower
  • pluralize
  • random
  • safe (respecting all the Django autoescape rules)
  • slug
  • upper

Unimplemented filters are due to laziness and will be done “on demand”. We also introduce a few new filters:

  • jslug – like slug, but more Javascript friendly
  • otherwise – like default, except the empty string/empty values trigger the filter also

Are there differences between Djolt and Django templates?

  • Djolt tags suck up whitespace if they’re on a line by themselves
  • If Djolt cannot resolve a variable, it resolves to the appropriate “empty” value (as opposed to failing). This is keeping in line with WORK philosophy

Beyond that you should be able to use most Django template examples (that don’t use block/extends) as-is.
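The "resolve to empty instead of failing" behaviour can be sketched as a forgiving dotted-path lookup. This is a simplified illustration, assuming the empty value is always the empty string; `resolve` is a hypothetical helper, not Djolt’s resolver.

```python
def resolve(context, path, empty=""):
    """Follow a dotted path through nested dicts and lists; never raise."""
    node = context
    for key in path.split("."):
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif (isinstance(node, (list, tuple)) and key.isdigit()
              and int(key) < len(node)):
            node = node[int(key)]
        else:
            # Anything unresolvable collapses to the empty value.
            return empty
    return node


context = {"data": {"items": [{"title": "First"}, {"title": "Second"}]}}
print(resolve(context, "data.items.1.title"))
print(repr(resolve(context, "data.missing.title")))
```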

Is it extensible?

Yes. You can add your own tags and filters by following the examples in code (djolt_nodes.py and djolt_filters.py respectively).

November 11, 2008

WORK – Web Object Records

ideas,pybm,python,semantic web,work · David Janes · 11:37 am ·

Introduction

As technologists, we’re all familiar with REST – Representational State Transfer:

Representational state transfer (REST) is a style of software architecture for distributed hypermedia systems such as the World Wide Web. As such, it is not strictly a method for building what are sometimes called “web services.” The terms “representational state transfer” and “REST” were introduced in 2000 in the doctoral dissertation of Roy Fielding, one of the principal authors of the Hypertext Transfer Protocol (HTTP) specification.

REST talks about how we address and use information on the World Wide Web. I’d like to introduce the concept of WORK – Web Object Records – which defines how we think about data being transmitted across the web.

WORK is not a prescriptive standard – it is not telling you what to do, it’s describing what you are doing. The hope is that by having a delineated description of what we are doing, we can then write tools to cut through the babel of API standards currently being promulgated by a multitude of vendors; we can standardize the unstandardized.

Definition

A WORK item:

  • is conceptually a JSON-like dictionary, consisting of string keys and object values
  • each value in the dictionary is a (usually-) shallow JSON-like object, that is:
    • a dictionary, list or basic value type
  • the basic value types are Unicode strings, floating point numbers, integers and booleans
  • the difference between strings and other basic value types is fuzzy (data encoded in XML, HTML form data)
  • null/None is rarely explicitly sent, instead it is the absence of a value being defined
  • the difference between a list of objects and a single object is fuzzy and fluid (XML children)
  • the data model defined implicitly by “what you see” is as useful as formal definition elsewhere
  • there are no cycles or explicit ways of cross referencing within a WORK item
  • WORK items can be – and often are – nested within another WORK item, but only one level deep

Benefits

Because we technologists inherently use a WORK model of data, it explains:

  • why we prefer XML over CSV – because we like to store more than a single atomic value in a “cell”
  • why we prefer JSON to XML – because we think about data as JSON-like WORK objects, not as nested text constructs
  • why we don’t adopt RDF (in its variants) for transmitting data, implementing APIs and so forth – because we don’t think in graphs
  • why we find it easier to work with web data in Python and Ruby than in Java – because those languages explicitly use the same model for storing data as we think about the data

Examples

Here are a few examples of how one can view common API / feed results as WORK items.

RSS feeds

RSS is defined by a two level WORK hierarchy. The first level is:

{
  "channel" : CHANNEL-WORK,
  "item" : [ ITEM-WORK, ITEM-WORK, ... ]
}

An ITEM-WORK looks like:

{
  "title" : STRING,
  "link" : STRING,
  "description" : STRING
}

If you look at the XML for an RSS feed with only one ITEM, there’s no way to tell without reading the spec that ITEM repeats. This is what we mean by saying that the difference between a single object and a list is sometimes fuzzy.
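The fuzziness is easy to reproduce. The sketch below converts XML to nested dictionaries the naive way, which yields a single object for one child and a list for two, and then smooths that over with a small normalizing helper. Both `to_work` and `as_list` are hypothetical names, and the XML is a cut-down stand-in for a real feed.

```python
import xml.etree.ElementTree as ET


def to_work(element):
    """Convert an element to a string (leaf) or a dict of its children."""
    children = list(element)
    if not children:
        return element.text or ""
    work = {}
    for child in children:
        value = to_work(child)
        if child.tag in work:
            # Second occurrence: promote the single value to a list.
            if not isinstance(work[child.tag], list):
                work[child.tag] = [work[child.tag]]
            work[child.tag].append(value)
        else:
            work[child.tag] = value
    return work


def as_list(value):
    """Normalize 'one object or many' so callers always get a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


one = to_work(ET.fromstring(
    "<channel><item><title>A</title></item></channel>"))
two = to_work(ET.fromstring(
    "<channel><item><title>A</title></item>"
    "<item><title>B</title></item></channel>"))

print(one["item"])           # a single dictionary
print(two["item"])           # a list of dictionaries
print(as_list(one["item"]))  # always a list
```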

White Pages API

The White Pages API is also a two level WORK hierarchy (this pattern is very very common). Here’s the first level, slightly more complicated than RSS due to the XML serialization:

{
 "meta" : META-WORK,
 "listings" : {
   "listing" : [ LISTING-WORK, LISTING-WORK, ... ]
 }
}

A LISTING-WORK looks like:

{
  "geodata" : OBJECT,
  "phonenumbers" : OBJECT,
  "business" : { "businessname" : "Fred's Pizza" },
  "address" : OBJECT
}

The OBJECTs above in the White Pages API are somewhat complicated, but tractable (as we shall see in another post).

Amazon AWS API

The Amazon Associates Web Service allows one to retrieve information about Amazon products via XML responses. The response is a little convoluted but still recognizable:

{
 "Items" : {
   "RequestHeader" : REQUEST-HEADER-WORK,
   "Item" : [ ITEM-WORK, ITEM-WORK, ... ]
 },
 "OperationRequest" : { ... }
}

The individual ITEM-WORK describe products:

{
 "ASIN" : STRING,
 "ImageSets": {
   "ImageSet": {
     "LargeImage": {
       "URL": "http://ecx.images-amazon.com/images/I/31e55zf53VL.jpg",
       "Width": "300",
       "Height": "300"
     }
   }
 },
 "ItemAttributes": {
   "Title": "Under a Blood Red Sky - Deluxe Edition CD/DVD",
   "Manufacturer": "Island",
   "ProductGroup": "Music",
   "Artist": "U2"
 }
}

Google search result

We can also look at HTML pages as if they’re returning data as WORK items. This could be explicit if rules such as microformats or RDFa were used, or once again it could be just a convenient way of modeling the data. Here’s a hypothetical WORK item for a single result returned from a Google search:

{
 "title" : "Bombardier Inc. - Bombardier - Home",
 "url" : "http://www.bombardier.com/",
 "description" : "Manufacturers of a large range of regional...",
 "links" : [
  {
   "title" : "Careers",
   "url" : "...",
  },
  {
   "title" : "Business Aircraft",
   "url" : "...",
  },
  ...
 ]
}

Conclusion

WORK gives us a powerful way of looking at – and simplifying – data that’s retrieved over the Internet via REST calls. If we can view API results as being made up of standardized components – WORK items – then the amount of work we need to do to work with new APIs can be absolutely minimized.

Designing and writing some of these tools is my next task.
