David Janes' Code Weblog

February 21, 2009

Using Pipe Cleaner to convert CSV list of Science Journals to an OPML subscription list

demo,pipe cleaner · David Janes · 3:43 pm ·

Here’s a Pipe Cleaner script that takes this text list of Science Journals and converts it into an OPML subscription list (here):

import module:api_csv.CSV;

CSV uri:'http://www.tictocs.ac.uk/text.php' delimeter:'\t';

items := map value:$items map:{
    "title" : "{{ C1 }}",
    "links" : {
        "href" : "{{ C2 }}",
        "type" : "text/xml",
        "rel" : "alternate",
    }
};

I decided not to use the “header name” feature of the CSV command because I had to remap anyway to create the links object. This has to be run with the following command (or from the web UI):

pc --format opml science-journals
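If you don’t have Pipe Cleaner at hand, here’s a rough Python 3 sketch of what the map step does: read tab-delimited rows and remap the first two columns into a title plus a links object. The sample rows and the function name rows_to_items are made up for illustration; this is not the real tictocs data or Pipe Cleaner’s internals.

```python
import csv
import io

# Hypothetical sample standing in for the tab-delimited journal list.
SAMPLE = (
    "Journal of Examples\thttp://example.org/joe/feed.xml\n"
    "Annals of Testing\thttp://example.org/aot/feed.xml\n"
)

def rows_to_items(text):
    """Remap each tab-delimited row (C1 = title, C2 = feed URL) into an item dict."""
    items = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        items.append({
            "title": row[0],
            "links": {
                "href": row[1],
                "type": "text/xml",
                "rel": "alternate",
            },
        })
    return items

items = rows_to_items(SAMPLE)
for item in items:
    print(item["title"], "->", item["links"]["href"])
```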

Of course, this is a little unwieldy in size so maybe you only want journals with “Astrophysics” in their title:

import module:api_csv.CSV;

CSV uri:'http://www.tictocs.ac.uk/text.php' delimeter:'\t';

items := map value:$items map:{
    "title" : "{{ C1 }}",
    "links" : {
        "href" : "{{ C2 }}",
        "type" : "text/xml",
        "rel" : "alternate",
    }
};

items := search value:$items for:"Astrophysics";

Cool, eh? Not only that, but this can be run entirely from the Web Interface with selectable strings, so (theoretically) a Pipe Cleaner user would have an API to this data.
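For comparison, the search step can be approximated in plain Python 3. I’m assuming here that search does a case-sensitive substring match on the title; the real command may well look at other fields or match differently. The journal entries are invented.

```python
def search(items, needle):
    """Keep only the items whose title contains the given text (case-sensitive)."""
    return [item for item in items if needle in item.get("title", "")]

# Hypothetical items in the shape produced by the map step.
journals = [
    {"title": "Astrophysics and Space Science", "links": {"href": "http://example.org/a.xml"}},
    {"title": "Journal of Botany", "links": {"href": "http://example.org/b.xml"}},
    {"title": "Reviews in Astrophysics", "links": {"href": "http://example.org/c.xml"}},
]

matches = search(journals, "Astrophysics")
print([item["title"] for item in matches])
```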

January 26, 2009

Pipe Cleaner – a delicious example

demo,pipe cleaner · David Janes · 7:08 pm ·

This is going to be a very brief post: here’s how you use Pipe Cleaner to download every post in your delicious account tagged “python” – outputting it as OPML, RSS, or Atom comes for free:

import module:api_delicious;
api_delicious.PostsList to:items tag:"python" authenticate:delicious;

January 25, 2009

Creating OPML subscription lists using Pipe Cleaner

authentication,demo,pipe cleaner,pybm,python · David Janes · 11:40 am ·

Here’s a neat API I completed this morning, called api_feeds. It takes a URL (or a list of them) and transforms them into:

  • the home page associated with the URL
  • the feed(s) for the URL
  • the name of the home page

If you’re following along at home, this is essentially the information needed for a single outline in an OPML subscription list.

Here’s a simple Python example:

import pprint

import api_feeds

api = api_feeds.OneFeed()
api.request = {
    "uri" : "http://code.davidjanes.com/blog/2009/01/23/transparently-working-with-oauath/",
}

pprint.pprint(api.response, width = 1)

And here’s what the output looks like:

{'link': u'http://code.davidjanes.com/blog',
 'links': [{'href': u'http://feeds.feedburner.com/DavidJanesCode',
            'rel': 'alternate',
            'type': u'application/rss+xml'}],
 'title': u"David Janes' Code Weblog"}

There’s actually quite a bit going on here behind the scenes, most of it using code I didn’t initially write but have quite heavily hacked: the Universal Feed Parser and the Feed Finder.
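To make the “single outline” idea concrete, here’s a small Python 3 sketch that turns a response shaped like the one above into one outline element. The function name to_outline is hypothetical, and the attribute names (htmlUrl, rssUrl, text, type) simply mirror the OPML output shown further down this page rather than any particular reading of the OPML spec.

```python
import xml.etree.ElementTree as ET

# A response in the shape printed above.
response = {
    "link": "http://code.davidjanes.com/blog",
    "links": [{
        "href": "http://feeds.feedburner.com/DavidJanesCode",
        "rel": "alternate",
        "type": "application/rss+xml",
    }],
    "title": "David Janes' Code Weblog",
}

def to_outline(resp):
    """Build one <outline> element from a OneFeed-style response dict."""
    outline = ET.Element("outline")
    outline.set("text", resp["title"])
    outline.set("htmlUrl", resp["link"])
    if resp.get("links"):
        # Use the first discovered feed; a page may advertise several.
        outline.set("rssUrl", resp["links"][0]["href"])
        outline.set("type", "rss")
    return outline

outline = to_outline(response)
print(ET.tostring(outline, encoding="unicode"))
```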

What becomes really interesting is what happens when we combine this with other modules. Here’s an example of how we can build an OPML subscription list from all the posts I’ve tagged “python” and “django” in del.icio.us. The code looks up each link I’ve bookmarked, does the feed discovery above, filters out items that don’t have feeds, and outputs as OPML. Note the neat pipeline-type aspect to the code:

api_delicious = api_delicious.PostsList(tag = "python django")
api_many = api_feeds.ManyFeeds(require_feed = True)
api_opml = api_opml.OPMLWriter()

api_many.items = api_delicious.items
api_opml.items = api_many.items

print api_opml.Produce()

Producing the following OPML:

<opml encoding="utf-8" version="2.0">
  <head>
    <title>[Untitled]</title>
  </head>
  <body>
    <outline htmlUrl="http://push.cx"
      rssUrl="http://push.cx/feed"
      text="Push cx"
      type="rss"/>
    <outline htmlUrl="http://crankycoder.com"
      rssUrl="http://crankycoder.com/feed/"
      text="crankycoder.com"
      type="rss"/>
    <outline htmlUrl="http://blog.dowski.com"
      rssUrl="http://blog.dowski.com/feed/"
      text="the occasional occurrence"
      type="rss"/>
    <outline htmlUrl="http://www.b-list.org/feeds/entries/"
      rssUrl="http://feeds2.feedburner.com/b-list-entries"
      text="The B-List: Latest entries"
      type="rss"/>
    <outline htmlUrl="http://blog.thescoop.org"
      rssUrl="http://blog.thescoop.org/feed/"
      text="The Scoop"
      type="rss"/>
    <outline htmlUrl="http://effbot.org"
      rssUrl="http://effbot.org/zone/rss.xml"
      text="effbot.org"
      type="rss"/>
    <outline htmlUrl="http://blog.disqus.net"
      rssUrl="http://feeds.feedburner.com/BigHeadLabs"
      text="Disqus"
      type="rss"/>
    <outline htmlUrl="http://blog.ianbicking.org"
      rssUrl="http://blog.ianbicking.org/feed/atom/"
      text="Ian Bicking: a blog"
      type="rss"/>
    <outline htmlUrl="http://antoniocangiano.com"
      rssUrl="http://feeds.feedburner.com/ZenAndTheArtOfRubyProgramming"
      text="Zen and the Art of Programming"
      type="rss"/>
    <outline htmlUrl="http://www.carthage.edu/webdev"
      rssUrl="http://www.carthage.edu/webdev/?feed=rss2"
      text="carthage webdev"
      type="rss"/>
    <outline htmlUrl="http://www.eweek.com"
      rssUrl="http://www.eweek.com/rss-feeds-13.xml"
      text="Application Development - RSS Feeds"
      type="rss"/>
    <outline htmlUrl="http://jeffcroft.com/"
      rssUrl="http://feeds.feedburner.com/jeffcroft/blog"
      text="JeffCroft.com: Latest blog entries"
      type="rss"/>
  </body>
</opml>

This will be just as terse (terser, probably) when written as a Pipe Cleaner script; I’m just struggling over how to introduce the authentication code gracefully into the scripts.
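The pipeline shape of the Python example (source, then feed discovery with filtering, then output) can be imitated offline with plain generators. Everything in this Python 3 sketch is a stand-in: the bookmark list, the KNOWN_FEEDS lookup table and the text output are invented, and none of it is how api_delicious, api_feeds or api_opml are actually written.

```python
def posts_list():
    """Stand-in for a bookmark source; yields hypothetical bookmarked links."""
    yield {"link": "http://example.org/blog", "title": "Example blog"}
    yield {"link": "http://example.org/static", "title": "No feed here"}

# Hypothetical lookup table standing in for real feed discovery.
KNOWN_FEEDS = {"http://example.org/blog": "http://example.org/blog/feed"}

def many_feeds(items, require_feed=True):
    """Attach discovered feeds to each item, optionally dropping items with none."""
    for item in items:
        feed = KNOWN_FEEDS.get(item["link"])
        if feed is None and require_feed:
            continue
        links = [{"href": feed, "rel": "alternate"}] if feed else []
        yield dict(item, links=links)

def outline_lines(items):
    """Render each item as one outline-like text line."""
    for item in items:
        hrefs = ", ".join(link["href"] for link in item["links"])
        yield "%s <%s> feeds: %s" % (item["title"], item["link"], hrefs)

# Each stage consumes the items of the previous one.
result = list(outline_lines(many_feeds(posts_list())))
print("\n".join(result))
```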

Pipe Cleaner Progress

pipe cleaner · David Janes · 11:09 am ·

I’ve made substantial progress in the spare hours I have on Pipe Cleaner recently. The current plan is to spend February documenting and packaging and start selling it to whoever needs it. In your case, the price is almost certainly free, so no worries ;-)

One discovery I’ve made this morning is that the command line application is going to be as important as the web application. This is because certain scripts – as you’ll see in the next post – inherently take a long time to run and they’ll almost certainly cause timeouts on an HTTP interface. I’ve thought about a few ways to work around this, and may yet implement them, but there is almost certainly going to be a command line set of tools.

January 23, 2009

Transparently working with OAuth

authentication,demo,pipe cleaner,pybm,python · David Janes · 5:03 am ·

This is part one of two posts I’m going to write about OAuth; the second will be somewhat more critical in tone. Before I criticize – and I know it’s technologically hard to put together things like OAuth – I want to actually accomplish something with it, so that I at least appear to have somewhat of a clue about it. This is a report of what I’ve done.

bm_uri is a library and tool I’ve written for working with URIs, and in particular http:// and https:// URLs. Here are some of the advantages of using bm_uri over all the normal Python urllib and urllib2 methods:

  • downloads are cached: if a URL is temporarily unavailable, bm_uri will return the cached version; likewise, if it has been downloaded in the recent past, the cached version will be returned rather than hitting the net again
  • downloads can be cooked, meaning converted into a more useful form such as TIDY-cleaned up HTML, JSON, Unicode text and so forth
  • bm_uri handles all the protocol stuff for you (such as User-Agent, Last-Modified and so forth) so you don’t have to
  • authentication is handled as “invisibly” as possible for you … at least after the initial setup
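The caching behaviour in the first bullet can be sketched in a few lines of Python 3. This is only an illustration of the idea, not bm_uri’s code: the class name CachingLoader, the max_age parameter and the fake fetcher are all assumptions.

```python
import time

class CachingLoader:
    """Toy cache: reuse fresh copies, fall back to stale ones on failure."""

    def __init__(self, fetch, max_age=300, clock=time.time):
        self.fetch = fetch        # callable(url) -> content, may raise IOError
        self.max_age = max_age    # seconds a cached copy counts as fresh
        self.clock = clock
        self.cache = {}           # url -> (timestamp, content)

    def load(self, url):
        now = self.clock()
        cached = self.cache.get(url)
        if cached and now - cached[0] < self.max_age:
            return cached[1]      # downloaded recently: skip the network
        try:
            content = self.fetch(url)
        except IOError:
            if cached:
                return cached[1]  # temporarily unavailable: serve the stale copy
            raise
        self.cache[url] = (now, content)
        return content

# Usage with a fake fetcher so the example runs offline.
calls = []

def fake_fetch(url):
    calls.append(url)
    if len(calls) > 1:
        raise IOError("network down")
    return "<html>hello</html>"

loader = CachingLoader(fake_fetch, max_age=0)
first = loader.load("http://example.org/")
second = loader.load("http://example.org/")  # fetch fails, cached copy is returned
print(first == second, len(calls))
```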

Here is an example of accessing an OAuth resource using bm_uri, returning my current location from Fire Eagle as a Python object. From a programming point of view, I believe I have reduced this to close to the minimum number of steps possible. Here’s the setup phase:

import bm_cfg
import bm_uri
import bm_oauth
import pprint

bm_cfg.cfg.initialize()

bm_oauth.OAuth(service_name = "fireeagle")

Here’s using it in code – note how there’s no reference to OAuth here whatsoever.

loader = bm_uri.JSONLoader('https://fireeagle.yahooapis.com/api/0.1/user.json?format=json')
loader.Load()

pprint.pprint(loader.GetCooked())

And here’s the output of the program:

{u'stat': u'ok',
 u'user': {u'location_hierarchy': [{u'best_guess': True,
         u'geometry': {u'coordinates': [-79.418426513699998,
                   43.731891632100002],
              u'type': u'Point'},
         u'id': 572261,
         u'label': None,
         u'level': 1,
         u'level_name': u'postal',
         u'located_at': u'2008-03-19T04:09:30-07:00',
...
         u'name': u'Canada',
         u'normal_name': None,
         u'place_id': u'EESRy8qbApgaeIkbsA',
         u'woeid': 23424775}],
     u'readable': True,
     u'writable': False}}

Gather information

The devil is in the details, obviously, and with OAuth the little satan is doing the initial setup. Here’s how I did this for Fire Eagle – there’ll be something analogous for whatever service you are using:

  • Log in or sign up (obviously)
  • Go to the Developers’ Page
  • Click on Create a New App
  • Copy the “Consumer Key” and the “Consumer Secret” … these will be long-ish strings of nonsense
  • Find out the Request Token URL, the Access Token URL, and the Authorization URL. These are public knowledge and for Fire Eagle are:
    • https://fireeagle.yahooapis.com/oauth/request_token
    • https://fireeagle.yahooapis.com/oauth/access_token
    • http://fireeagle.yahoo.net/oauth/authorize

Note how Yahoo has conveniently made that last URL similar looking to the others, but not quite the same. Thanks!

However you implement OAuth, you’re probably going to need to be able to persist information to disk or database. As documented here several weeks ago, we already have that covered with our bm_cfg module. In ~/.cfg/fireeagle.json, create the following JSON format file:

{
 "fireeagle": {
  "api_uri" : "https://fireeagle.yahooapis.com/",
  "oauth_access_token_url": "https://fireeagle.yahooapis.com/oauth/access_token",
  "oauth_authorization_url": "http://fireeagle.yahoo.net/oauth/authorize",
  "oauth_consumer_key": "ABCDEFGHIJKL",
  "oauth_consumer_secret": "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345",
  "oauth_token_url": "https://fireeagle.yahooapis.com/oauth/request_token"
 }
}

The only new item here is the api_uri: that’s the prefix of URLs that bm_uri will use OAuth with.
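As a rough Python 3 illustration of how a prefix like api_uri could be used to decide which requests get OAuth treatment: the function name service_for is hypothetical, the config below is a trimmed copy with placeholder credentials, and bm_uri may make this decision quite differently.

```python
import json

# Same shape as the configuration file above, trimmed, with placeholder credentials.
CONFIG_TEXT = """
{
 "fireeagle": {
  "api_uri": "https://fireeagle.yahooapis.com/",
  "oauth_consumer_key": "ABCDEFGHIJKL"
 }
}
"""

def service_for(url, config):
    """Return the name of the service whose api_uri prefixes the URL, if any."""
    for name, settings in config.items():
        prefix = settings.get("api_uri")
        if prefix and url.startswith(prefix):
            return name
    return None

config = json.loads(CONFIG_TEXT)
print(service_for("https://fireeagle.yahooapis.com/api/0.1/user.json", config))
print(service_for("http://example.org/", config))
```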

Set it up

Next you have to do all sorts of OAuth stuff to actually work with OAuth. If the why interests you, please go read the spec! I’m more of a how person myself, and this is what we need to do:

  • run: python bm_uri.py --service fireeagle --authorize
  • this will pop up a browser window; grant your application access and then…
  • run: python bm_uri.py --service fireeagle --exchange

And that’s it – you should now be able to just work with the Fire Eagle API in bm_uri without even having to know OAuth is there!

End notes

  • the current implementation only works with HTTP/REST GET; POST to come soon, DELETE and PUT as needed
  • bm_uri, bm_cfg and the rest of the code are freely licensed and available here. It is a constantly changing product, albeit converging on perfection in my own mind ;-)

January 19, 2009

Atom as a Rosetta Stone for WORK objects

demo,pipe cleaner,work · David Janes · 6:31 am ·

WORK – Web Object Records – is a way of describing messages we pass over the web: a single header object called the “meta” and zero or more objects called “items”. Each object can be encoded as a JSON record, though we can access individual items within each WORK object using a WORK Path, which allows quite a bit of latitude for type coercion and vagaries in packaging.
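To give a feel for path-style access, here’s a minimal Python 3 sketch that stores and fetches values in nested dictionaries using dotted names such as fitem.head.map (the form used in the scripts elsewhere on this page). The helper names path_set and path_get are my own, and the sketch leaves out the type coercion a real WORK Path allows.

```python
def path_set(work, path, value):
    """Store value at a dotted path, creating intermediate dictionaries as needed."""
    parts = path.split(".")
    node = work
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value

def path_get(work, path, default=None):
    """Fetch the value at a dotted path, or default if any step is missing."""
    node = work
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

work = {}
path_set(work, "fitem.head.map", "<h3>popup header</h3>")
path_set(work, "fitem.head.sb", "<h3>sidebar header</h3>")
print(work)
print(path_get(work, "fitem.head.sb"))
```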

Pipe Cleaner is a project I’ve been working on for the last two months that allows one to script data using WORK, to accomplish tasks such as remixing and filtering RSS feeds, read or produce OPML, make JSON interfaces and so forth. I actually have one live deployment which I will blog about soon and hope to have it beta productized for March.

Atom is a standard for syndicating feeds, not dissimilar to RSS but with a richer, better-described vocabulary. I already have one major “project” built around Atom: the hAtom microformat for describing microcontent and information that can be syndicated. hAtom has also been morphed by Microsoft to produce the Web Slice format, so you may be seeing that about. Atom conforms to WORK: there’s a “feed” meta header and zero or more “entry” items.

With Pipe Cleaner I’m trying not only to make a way where feeds and other data can be remixed, but also to make it easy to do so! To do that, I’ve decided that by default, even though you are working with (say) OPML or RSS, we’ll translate all the terms to their Atom equivalents as best as possible. You’ll have to read the spec yourselves, but here’s a quick rundown of common elements, not all required by any means:

  • author, with possible sub-fields uri and email
  • content – the body
  • summary – a summary of the body; currently my feeling is that content & summary must always be HTML
  • updated – when last updated
  • created – when created, assumed to be the same as updated if not present
  • link – the main URI
  • links – for alternate URIs (this is a variance from the Atom spec; it should be easy to find the main URI for an element; I may reconsider this before release)
  • id – a unique identifier
  • category – tags, encoded in a sub-field term

Note that I’m not slavish about making the output conformant to all the SHOULDs, MUSTs, etc. that are in the Atom spec: my pragmatic programming approach says “do the best we can” and if the user needs better, they can walk the extra mile.
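Here’s a small Python 3 sketch of what “translate the terms to their Atom equivalents” might look like for one RSS item. The RSS_TO_ATOM table and the function name are assumptions made for illustration; Pipe Cleaner’s actual mapping may cover more fields and handle them differently.

```python
# Hypothetical mapping from common RSS 2.0 item fields to Atom-style names.
RSS_TO_ATOM = {
    "description": "summary",
    "pubDate": "updated",
    "guid": "id",
    "link": "link",
    "title": "title",
}

def rss_item_to_atom(rss_item):
    """Rename RSS fields to Atom-style terms; unknown fields pass through unchanged."""
    entry = {}
    for key, value in rss_item.items():
        if key == "category":
            # Tags are encoded in a "term" sub-field.
            tags = value if isinstance(value, list) else [value]
            entry["category"] = [{"term": tag} for tag in tags]
        else:
            entry[RSS_TO_ATOM.get(key, key)] = value
    # "created" is assumed to equal "updated" when absent.
    if "updated" in entry and "created" not in entry:
        entry["created"] = entry["updated"]
    return entry

entry = rss_item_to_atom({
    "title": "Hello",
    "link": "http://example.org/hello",
    "description": "<p>First post</p>",
    "pubDate": "Mon, 19 Jan 2009 06:31:00 GMT",
    "category": ["demo", "work"],
})
print(entry)
```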

Here are some examples of data that’s been run through Pipe Cleaner, translating to Atom upon input and translating back to whatever is needed upon output. The JSON (actually pretty-printed JSON) output is the most instructive for what’s going on inside Pipe Cleaner.

RSS Feed

OPML Data

Note how the OPML is “flattened”, with hierarchy being encoded into the Category. This can be turned off if needed.
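A minimal Python 3 sketch of this kind of flattening, assuming the OPML has already been parsed into nested dictionaries with a children key. That structure, the sample tree and the “/” separator in the category term are my assumptions, not Pipe Cleaner’s actual representation.

```python
def flatten(outlines, path=()):
    """Yield leaf outlines as flat items, recording parent folder names as a category."""
    for outline in outlines:
        children = outline.get("children")
        if children:
            # A folder: descend, extending the category path with its name.
            yield from flatten(children, path + (outline["text"],))
        else:
            item = {"title": outline["text"]}
            if path:
                item["category"] = [{"term": "/".join(path)}]
            yield item

# Hypothetical nested outline structure, already parsed from OPML.
tree = [
    {"text": "Python", "children": [
        {"text": "effbot.org"},
        {"text": "Web", "children": [{"text": "The B-List"}]},
    ]},
    {"text": "Uncategorised feed"},
]

flat = list(flatten(tree))
for item in flat:
    print(item)
```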

hCard microformat (in HTML)

Note the neat namespacing in the RSS output. The OPML is almost devoid of useful information; further consideration is needed.

hCalendar microformat (in HTML)

Similar to hCard. We’ll probably also (or exclusively) encode the hCalendar data in an xCal extension.

hAtom microformat (in HTML)

hAtom -> RSS is basically turning an hAtom page into a feed!

Source example

Since no blog post is complete without a little source code, here’s a Pipe Cleaner script to parse the hCard document. If you’re following closely, the output format is selected by the user at runtime. All the other scripts are of similar terseness.

import module:api_microformat;
api_microformat.HCard uri:"http://tantek.com/" to:items meta:meta;

December 23, 2008

Pipe Cleaner (II)

demo,maps,pipe cleaner · David Janes · 6:37 am ·

Here’s the latest evolution of Pipe Cleaner, mainly recorded here for historical interest. The big change is that there isn’t a separate outside template – everything is in the one index.jd file. The new directive is template, which can read and execute an outside module or actually produce the final output (as we see in the very last directive). I have not put this up as an independent demo.

#
#	Import the Python fire module
#	- used in: map from:"fire.GetGeocodedIncidents" to:"incidents"
#
import module:"fire";

#
#	Header for Google Maps popup
#	- used in: map from:"fire.GetGeocodedIncidents" to:"incidents"
#
#
set to:"fitem.head.map" value:"""
<h3>
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

#
#	Header for the sidebar
#	- used in: map from:"fire.GetGeocodedIncidents" to:"incidents"
#
set to:"fitem.head.sb" value:"""
<h3>
{% if latitude and longitude %}
<a href="javascript:js_maps.map.panTo(new GLatLng({{ latitude }}, {{ longitude }}))">*</a>
{% endif %}
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

#
#	Body for the Google Maps pop and the sidebar
#	- used in: map from:"fire.GetGeocodedIncidents" to:"incidents"
#
set to:"fitem.body" value:"""
<p>
Alarm Level: {{ AlarmLevel }}
<br />
Incident Type: {{ IncidentType }}
<br />
City: {{ City }}
<br />
Street: {{ Street }} ({{ CrossStreet }})
<br />
Units: {{ Units }}
</p>
""";

#
#	Convert all the incidents from the fire module
#	to the path 'incidents' using the mapping rules defined above
#
#	- incidents are used in "gmaps.js" and "gmaps.html"
#
map from:"fire.GetGeocodedIncidents" to:"incidents" map:{
	"latitude" : "{{ latitude }}",
	"longitude" : "{{ longitude }}",
	"title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
	"uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
	"body" : "{{ *fitem.head.map|safe }}{{ *fitem.body|safe }}",
	"body_sb" : "{{ *fitem.head.sb|safe }}{{ *fitem.body|safe }}",
	"IncidentNumber" : "{{ IncidentNumber }}"
};

#
#	Load the 'gmaps' templates (for arbitrary geo-mapping),
#	using the 'incidents' for its items and the specified meta.
#
#	- used in "gmaps.js" and "gmaps.html"
#
set to:"map_meta" value_render:true value:{
	"id" : "maps",
	"latitude" : 43.67,
	"longitude" : -79.38,
	"uzoom" : -13,
	"gzoom" : 13,
	"api_key" : "{{ cfg.gmaps.api_key|otherwise:'ABQIAAA...pIxzZQ' }}",
	"html" : {
		"width" : "1024px",
		"height" : "800px"
	}
};

#
#	Produce GMaps
#
template script:"gmaps" items:"incidents" meta:"map_meta";

#
#	Produce the final output
#
template value:"""
<html>
<head>
    <link rel="stylesheet" type="text/css" href="css.css" />
	{{ gmaps.js|safe }}
</head>
<body>
<div id="content_wrapper">
	<div id="map_wrapper">
		{{ gmaps.html|safe }}
	</div>
	<div id="text_wrapper">
{% for incident in incidents %}
	<div id="{{ incident.IncidentNumber }}">
		{{ incident.body_sb|safe }}
	</div>
{% endfor %}
</div>
</body>
</html>
""";

The gmaps.jd (imported in the second last directive) looks as follows (there will not be a test). It’s designed to be a universal “show a map and plot points on it” inclusion. I’ve added a few line breaks so the PRE box doesn’t break.

#
#
#
template to:"html" value:"""
<div id="id_{{ meta.id|jslug }}"
style="{% if meta.html.width %}width: {{ meta.html.width }};{% endif %}
{% if meta.html.height %} height: {{ meta.html.height }};{% endif %}
{% if meta.html.style %} style: {{ meta.html.style }};{% endif %}"
{% if meta.html.class %} class="{{ meta.html.class }}"{% endif %}
></div>
<script type="text/javascript">
js_{{ meta.id|jslug }}.onload();
</script>
""";

#
#
#
template to:"js" value:"""
<script
 type="text/javascript"
 src="http://maps.google.com/maps?file=api&v=2&key={{meta.api_key}}">
</script>
<script type="text/javascript">
js_{{ meta.id|jslug }} = {
 onload : function() {
  js_{{ meta.id|jslug }}.map = new GMap2(document.getElementById("id_{{ meta.id|jslug }}"));
  m = js_{{ meta.id|jslug }}.map;
  m.setCenter(new GLatLng({{ meta.latitude }}, {{ meta.longitude }}), {{ meta.gzoom }});

  // {{ items|length }} items follow
{% for itemd in items %}
{% if itemd.latitude and itemd.longitude %}

  // {{ itemd.title }}
  var ll = new GLatLng({{ itemd.latitude }}, {{ itemd.longitude }});
  var marker = js_{{ meta.id|jslug }}.make_marker(m, ll, "{{ itemd.body|safe|escapejs }}");
  m.addOverlay(marker);
{% else %}
  // an item is missing latitude or longitude
{% endif %}
{% endfor %}
 },

 make_marker : function(m, ll, html) {
  var marker = new GMarker(ll);
  GEvent.addListener(marker, "click", function() {
   m.openInfoWindowHtml(ll, html);
  });

  return marker;
 },

 end : 0
}
</script>
""";

December 18, 2008

Pipe Cleaner

demo,djolt,dqt,html / javascript,ideas,jd,maps,pipe cleaner,pybm,work · David Janes · 6:38 pm ·

I’ve been working (in my decreasingly available spare time) to pull together, into a project called “Pipe Cleaner”, all the various concepts I’ve been mentioning on this blog: Web Object Records (WORK) for API access and object manipulation, Djolt for generating text from templates, Data/Query/Transform/Template (DQT) for transforming data, and JD for scripting these elements together. The pieces came together enough this morning to put a demo together, and here it is – the Toronto Fires Pt II Demo.

How, you may ask, does this differ from the original Toronto Fires Demo? The answer is how it is put together, which we describe here.

Index.dj

This is the Djolt template that generates the output. The data fed to this template is generated by the JD script, described in the next section.

<html>
<head>
    <link rel="stylesheet" type="text/css" href="css.css" />
    {{ gmaps.js|safe }}
</head>
<body>
<div id="content_wrapper">
    <div id="map_wrapper">
        {{ gmaps.html|safe }}
    </div>
    <div id="text_wrapper">
{% for incident in incidents %}
    <div id="{{ incident.IncidentNumber }}">
        {{ incident.body_sb|safe }}
    </div>
{% endfor %}
</div>
</body>
</html>

Quite simple … as you can see, most of the data is being pulled in from elsewhere. The elsewhere is provided by the script described in the next section.

Index.jd

This is the script that pulls all the pieces together. Note that I’m not 100% happy with the way the data is imported; I would like the geocoding to become part of this data flow too. In the next release perhaps.

First we pull in the “fire” module that we wrote in the previous Map examples. This is doing exactly what you think: importing a Python module. We may have to increase the security or restrict this to working with an API for general purpose use.

import module:"fire";

Next we define two headers – one that is going to appear in the Google Maps popup, the next that is going to appear in the sidebar. They need to be different as they refer to themselves. Note that the sidebar header “breaks” the encapsulation of Google Maps – this seems to be unavoidable. The to:"fitem.head.map" and to:"fitem.head.sb" are manipulating a WORK dictionary to store values.

Note also here that we’ve extended JD to accept Python multiline strings – this was unavoidable if JD was to be useful to me.

set to:"fitem.head.map" value:"""
<h3>
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

set to:"fitem.head.sb" value:"""
<h3>
{% if latitude and longitude %}
<a href="javascript:js_maps.map.panTo(new GLatLng({{ latitude }}, {{ longitude }}))">*</a>
{% endif %}
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

The next block defines the text of the body used to describe a fire incident. It follows much the same pattern as the previous block.

set to:"fitem.body" value:"""
<p>
Alarm Level: {{ AlarmLevel }}
<br />
Incident Type: {{ IncidentType }}
<br />
City: {{ City }}
<br />
Street: {{ Street }} ({{ CrossStreet }})
<br />
Units: {{ Units }}
</p>
""";

This is a map: it translates the values in fire.GetGeocodedIncidents into a new format and stores the result in incidents. The format we are storing it in is understood by the Google Maps generating module.

We may rename this translate, as the word map is somewhat overloaded.

map from:"fire.GetGeocodedIncidents" to:"incidents" map:{
    "latitude" : "{{ latitude }}",
    "longitude" : "{{ longitude }}",
    "title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
    "body" : "{{ *fitem.head.map|safe }}{{ *fitem.body|safe }}",
    "body_sb" : "{{ *fitem.head.sb|safe }}{{ *fitem.body|safe }}",
    "IncidentNumber" : "{{ IncidentNumber }}"
};
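To show the idea behind map outside of Pipe Cleaner, here’s a rough Python 3 sketch that applies a dictionary of template strings to each record. It only understands plain {{ name }} placeholders; Djolt filters such as |safe and the *fitem references are deliberately left out, and the incident record is invented.

```python
import re

# Matches simple placeholders such as {{ name }} or {{ name}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template, record):
    """Replace each {{ name }} with the matching field; missing fields become ''."""
    return PLACEHOLDER.sub(lambda m: str(record.get(m.group(1), "")), template)

def map_items(records, mapping):
    """Apply every template in the mapping to every record."""
    return [
        {key: render(template, record) for key, template in mapping.items()}
        for record in records
    ]

# Hypothetical incident record using the field names from the script above.
records = [{
    "AlarmLevel": 2,
    "IncidentType": "Fire",
    "RawStreet": "Yonge St",
    "IncidentNumber": "F0001",
    "latitude": 43.67,
    "longitude": -79.38,
}]

incidents = map_items(records, {
    "latitude": "{{ latitude }}",
    "longitude": "{{ longitude }}",
    "title": "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "uri": "#{{ IncidentNumber }}",
})
print(incidents[0]["title"])
```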

Next we set up the “meta” (see WORK meta description if you’re not following along) for the maps. (The render_value:true declaration makes PC interpret the templates in strings.) We then call our Google Maps generating code (which is actually more Pipe Cleaner) and that gets fed to the Djolt template we first showed you. Clear? Maybe not, we’ll have more examples coming…

set to:"map_meta" render_value:true value:{
    "id" : "maps",
    "latitude" : 43.67,
    "longitude" : -79.38,
    "uzoom" : -13,
    "gzoom" : 13,
    "api_key" : "{{ cfg.gmaps.api_key|otherwise:'...mykey...' }}",
    "html" : {
        "width" : "1024px",
        "height" : "800px"
    }
};

load template:"gmaps.js" items:"incidents" meta:"map_meta";
load template:"gmaps.html" items:"incidents" meta:"map_meta";
