David Janes' Code Weblog

January 25, 2009

ChangeCamp – notes from redesigning www.toronto.ca/opendata

ideas · David Janes · 5:13 pm ·

This is a transcription-with-license of a session I attended yesterday at ChangeCamp entitled “Designing www.toronto.ca/opendata”. You can read more about ChangeCamp here; my primary reason for attending was an interest in promoting and helping our governments share the information they are gathering, to make them more transparent, accountable and even potentially useful. The session was run by some folks from the City of Toronto and its 311 initiative (I think). These notes are based on memory and the photos I took, here on Flickr. Note that this was only one of several (3?) parallel discussions happening during this session, and this is only the “data” section of the session; I’m afraid I wandered off during the “tools” part as I thought it might be a little premature.

Tags, Geocoding, Ontologies

A common theme of discussion was providing ways to organize the data. In particular:

  • geocode,
  • geocode by tag: e.g. “the annex”
  • tagged by topic; i.e. that this is an orthogonal axis to geocoding
  • metadata
  • we should look to how other cities have organized data to find a common vocabulary

A common concept was that we could use tags / geocodes to:

  • spontaneously form communities around ideas
  • track related issues
  • form feeds on related information
  • use to query related information that would generally be spread widely

Information Dissemination

  • be able to track issues through the system; this is a very common theme
  • have access to historical information
  • feeds for everything

Crime and Public Safety

  • Police reports (geocoded)
  • Public health services
  • Emergency information
    • Info (like about SARS)
    • Releases
    • Public health

Scheduling

  • Pools
  • Skating rinks
  • Ferries
  • Public meetings

This is related to the concept discussed that any information that goes into a PDF should be available in raw form.

Politician Information

  • Voting records
  • Expenses
  • Finances

Service Information

  • Power grid disruptions
  • Critical incident
  • Zoning (i.e. what’s the zoning information for this location)
  • Density / population / demographics
  • Parking information (e.g. what’s the parking rules; how do I get a parking permit, etc.)
  • Real-time pollution:
    • Water quality
    • Air quality
    • Tagged, available as feeds
    • Historic information
  • Traffic
    • Roads
      • street conditions
    • Trains
    • Utilization rates
  • Sewer / waterflow data; i.e. that apparently sensors are already in place for

Complaints

  • be able to add data into the system
  • be able to track that information

Tourism Information

  • event dates, locations, price; e.g. Nuit Blanche
  • standard information for tour operators

Budget Information

  • spreadsheets
  • all the raw data in PDFs should be available as XLS/CSV
  • be able to trace the evolution of data from its source; follow back up the chain

Tendering Information

  • What is up for tender
  • What tenders have been awarded
  • Make interaction with city more efficient and open

311 Information

  • Track whether services were successful
  • Raw feeds
  • Ticketing system (i.e. issue tracking)
  • Turnaround time

Community Group Information

Information to empower groups, enable spontaneous community formation…

  • Mayor’s initiatives
  • Bike lanes
  • Deal with language issues
  • Schools
    • What assets are available (pools, gyms)
  • Parks & rec
    • Open spaces
  • Community centres
  • Community health centres

My issues with OAuth

authentication,ideas · David Janes · 2:44 pm ·

The other day I twittered Chris Messina about OAuth:

@factoryjoe #OAuth is an incomprehensible mess. Programming a Python client to connect to a service has never been so hard

These are the details of my experience, plus suggestions about how to fix the problems I’ve encountered.

The username/password gold standard

To interact with a service like Twitter’s API, I need three pieces of information: a username, a password and the API endpoint I want to use. Once I have these I can use a standard library in almost any language to start using the Twitter API – I am up and running within 45 seconds. Now, don’t mistake this for me thinking that giving up your username and password to a third-party service is a good idea: it’s horrible. However, the other part – being up and running within a few minutes – is critical from a programming usability point of view. No bucks, no Buck Rogers; no API users, no API usage.
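To make those three pieces of information concrete, here is a minimal sketch in present-day Python 3 of what a Basic-authenticated request amounts to. The endpoint URL and the credentials are placeholders I made up, not a real service, and the request is only built, never sent.

```python
import base64
import urllib.request


def basic_auth_request(endpoint, username, password):
    """Build (but do not send) a request carrying an HTTP Basic Authorization header."""
    credentials = "%s:%s" % (username, password)
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    request = urllib.request.Request(endpoint)
    request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical endpoint and credentials, for illustration only.
request = basic_auth_request("https://api.example.com/statuses.json", "alice", "secret")
print(request.get_header("Authorization"))
```

That is the whole of the client-side work, which is why the scheme is so quick to get going with, and also why it is so dangerous: the password travels with every request, only base64-encoded.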

It took me a day and a half (albeit of scattered hours) to get OAuth to work for me. To put this in perspective, I had Google’s authentication system – including recoding urllib2 to deal with PUT and 301/302 responses – usable in about 2 hours.

What I’ve discovered is that OAuth is almost as easy to use as HTTP Basic Authentication (the username / password scheme above); the issue is the confusing way OAuth currently presents information to developers. I have documented my coding experiences here and the code is freely available for use and perusal here (though you’ll really be buying into a web resources model that might not be your thing if you do).

The informational issues I had – with suggested fixes – are documented below.

Critical OAuth information is poorly packaged

This is what you need to know to access an OAuth API:

  • A “consumer key”
  • A “consumer secret”
  • An “authorization URL”
  • A “token URL”
  • An “access token URL”

Not to mention a list of API URLs that are the API end points. Note that all of these items are defined by terminology unique to OAuth and thus unfamiliar to the new developer. Now, try going to Fire Eagle and getting all of that information, and when you’re finished with that, head on over to PostRank to do the same. If you’re clever, it’ll take you 5 minutes, but more likely (especially if you’ve never seen OAuth before) it’ll take you about 10 minutes and you’re likely to have got something wrong. Did you catch that Fire Eagle has similar looking but not quite the same URLs? Does PostRank use “http://” or “https://” for “standard request paths”? Did you know that PostRank also has two entirely different hostnames for URLs in its APIs? If you didn’t, well, you’ll probably be revisiting your lists.

Here’s what I suggest: OAuth should recommend that every OAuth Service Provider return the following JSON dictionary (in a TEXTAREA or PRE) in the place where they’re currently returning the consumer key & secret:

{
    "api_uri": [
        "http://www.postrank.com/myfeeds/",
        "http://www.postrank.com/user/",
        "http://api.postrank.com/"
    ],
    "oauth_access_token_url": "http://www.postrank.com/oauth/access_token",
    "oauth_authorization_url": "http://www.postrank.com/oauth/authorize",
    "oauth_token_url": "http://www.postrank.com/oauth/request_token",
    "oauth_consumer_key": "XXXXXXXXXXXXXXXXXXXXXX",
    "oauth_consumer_secret": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0"
}

That’s it: one single piece of information that can be read without difficulty by every single language modern developers use, and that can be copied and pasted in a single operation by the developer.
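As a rough illustration of how little code the consumer side would then need, here is a sketch in present-day Python 3 that reads such a dictionary and checks that nothing is missing. The key names are the ones proposed above; the function name, the list of required keys and the dummy key and secret are my own assumptions, not part of any OAuth library.

```python
import json

# The keys proposed in the JSON dictionary above.
REQUIRED_KEYS = (
    "api_uri",
    "oauth_access_token_url",
    "oauth_authorization_url",
    "oauth_token_url",
    "oauth_consumer_key",
    "oauth_consumer_secret",
    "oauth_signature_method",
    "oauth_version",
)


def load_oauth_settings(text):
    """Parse the pasted JSON and fail loudly if a required key is missing."""
    settings = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("missing OAuth settings: " + ", ".join(missing))
    return settings


# Stand-in for what the developer would paste; the key and secret are dummies.
pasted = """
{
    "api_uri": ["http://api.postrank.com/"],
    "oauth_access_token_url": "http://www.postrank.com/oauth/access_token",
    "oauth_authorization_url": "http://www.postrank.com/oauth/authorize",
    "oauth_token_url": "http://www.postrank.com/oauth/request_token",
    "oauth_consumer_key": "XXXX",
    "oauth_consumer_secret": "XXXX",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0"
}
"""

settings = load_oauth_settings(pasted)
print(settings["oauth_token_url"])
```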

I’d also like to note that Fire Eagle has the most useful OAuth developer documentation I’ve seen, and the OAuth team should consider adopting it wholesale as their own.

The OAuth website is confusing

The front page of the OAuth website is very promising. There’s a big round button-like area that says “For Consumer developers…” and another that says “For Service Provider developers…”. From there things go rapidly downhill. Neither of these items is a button. Instead, one clicks on the “Get Started…” link, examines a list of other links and then starts reading about how OAuth got its name after the sound Blaine Cook’s college roommate’s brother’s cat used to make horking up furballs or whatever. Honestly: no one cares at this stage. What I’d like to do is click on a “Consumer developers” link and start seeing a concrete example of what I need to do to interact with an OAuth-enabled service. All the other stuff is filler.

One further point: it’d be nice to see a proper logo.

The OAuth website needs an API playground

My final recommendation is that OAuth provide on their website a live sample API Service Provider that all the libraries interact with out of the box. It’s difficult enough to get an API working without wondering whether the problem is on my side or on their side.

January 9, 2009

Thinking about Configuration

ideas,python · David Janes · 7:20 am ·

Happy New Year, everyone. I’ve been busy at paying work recently, plus cleaning up and testing existing code I’ve been discussing here over the last few months. At work I’ve been developing in WebObjects, which though a lovely platform is not the way of the future so I’m not documenting many of my experiences here.

The applications I’ve been working on recently, Pipe Cleaner and GenX, need – like most applications – configuration. This will store information which can be safely exposed to the public, such as my Google Maps API key, and information that I need to keep private within the application, such as my Freebase username and password (cf. however the password anti-pattern). Furthermore, though the code I’m writing is in Python it is possible that the code that provides the UI will be written in another language, such as PHP inside of WordPress.

Given these considerations, here are my design choices:

  • configuration files are stored as multiple individual files inside a directory (or directories)
  • configuration files are in JSON, and contain a dictionary of dictionaries (see below)
  • configuration files can be marked as private or public
  • the same logical configuration (say for Amazon, which has both public and private information) can be in a public and private file
  • the configuration is global, but is accessed through setter/getter properties
  • non-global versions of the configuration can be made

That all said, here’s what I’ve written. First, the setters and getters:

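import json
import os
import types

import bm_io
from bm_log import Log

class Cfg:
    def __init__(self):
        # per-instance dictionaries, so non-global configurations don't share state
        self._cfg_private = {}
        self._cfg_public = {}

    # read-only property: the configuration that is safe to expose
    @apply
    def public():
        def fget(self):
            return  self._cfg_public

        return property(**locals())

    # read-only property: everything, including private settings
    @apply
    def private():
        def fget(self):
            return  self._cfg_private

        return property(**locals())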

As an aside, I’m not 100% sure about Python decorators and wonder if my favorite language is being turned into a C++ like mess.

Next, the ‘add’ function that adds information to the configuration ensuring private and public are handled correctly. Note that there can be multiple dictionaries inside of ‘d’, but ‘d’ is either all Public or not.

    def add(self, d):
        if type(d) != types.DictType:
            raise TypeError("only dictionaries can be added")

        if d.get('@Public'):
            #
            #   Public definitions never overwrite private definitions
            #
            for key, value in d.iteritems():
                if type(value) != types.DictType:
                    continue

                if not self._cfg_private.has_key(key):
                    self._cfg_private[key] = value

                self._cfg_public[key] = value
        else:
            self._cfg_private.update(d)
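For readers on present-day Python 3, where @apply, iteritems and has_key no longer exist, the same public/private rules can be sketched as below. This is my restatement of the logic above rather than the bm_cfg module itself, and the sample dictionaries are cut-down versions of the Amazon files shown later in this post.

```python
class Cfg:
    """Holds a private view (everything) and a public view (safe to expose)."""

    def __init__(self):
        self._cfg_private = {}
        self._cfg_public = {}

    @property
    def public(self):
        return self._cfg_public

    @property
    def private(self):
        return self._cfg_private

    def add(self, d):
        if not isinstance(d, dict):
            raise TypeError("only dictionaries can be added")

        if d.get("@Public"):
            # Public definitions never overwrite private definitions.
            for key, value in d.items():
                if not isinstance(value, dict):
                    continue  # skips the "@Public" marker itself
                self._cfg_private.setdefault(key, value)
                self._cfg_public[key] = value
        else:
            self._cfg_private.update(d)


cfg = Cfg()
cfg.add({"amazon": {"Locale": "us", "Private": "Don't See"}})
cfg.add({"@Public": 1, "amazon": {"Locale": "us"}})
print(cfg.private["amazon"])
print(cfg.public["amazon"])
```

Note that the result still depends on load order: a private file added after a public one replaces the private entry wholesale, exactly as in the original.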

And finally the loader, which gets everything in a directory or one level down. Note the ‘exception’ parameter which makes me a bad person, but I don’t like code failing unless I tell it to.

    def load(self, path, exception = False, depth = 0):
        try:
            if os.path.isdir(path) and depth < 2:
                for file in os.listdir(path):
                    # pass the settings down, and count the level we descend
                    self.load(os.path.join(path, file), exception = exception, depth = depth + 1)
            elif os.path.isfile(path):
                if path.endswith(".json"):
                    self.add(json.loads(bm_io.readfile(path)))

        except Exception:
            if exception:
                raise

            Log("ignoring exception", exception = True, path = path)
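The depth limit is the fiddly part, so here is a self-contained sketch of just that in present-day Python 3. The function name, the max_depth parameter and the throwaway directory layout are all made up for the illustration; it yields the parsed dictionaries instead of calling add, and it leaves out the exception handling.

```python
import json
import os
import tempfile


def iter_json_configs(path, depth=0, max_depth=2):
    """Yield parsed JSON from *.json files in `path` and one level below it."""
    if os.path.isdir(path) and depth < max_depth:
        for name in sorted(os.listdir(path)):
            # Pass depth + 1 down so the recursion stops one level below the top.
            yield from iter_json_configs(os.path.join(path, name), depth + 1, max_depth)
    elif os.path.isfile(path) and path.endswith(".json"):
        with open(path, encoding="utf-8") as handle:
            yield json.load(handle)


# Build a throwaway directory tree to exercise the depth limit.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deeper"))
    layout = {
        "top.json": {"top": {}},
        os.path.join("sub", "nested.json"): {"nested": {}},
        os.path.join("sub", "deeper", "ignored.json"): {"ignored": {}},
    }
    for relative, content in layout.items():
        with open(os.path.join(root, relative), "w", encoding="utf-8") as handle:
            json.dump(content, handle)

    found = sorted(key for d in iter_json_configs(root) for key in d)

print(found)  # ['nested', 'top']
```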

And one more thing: make the global configuration:

cfg = Cfg()

Here’s how you use it:

import pprint
import sys

import bm_cfg

# setup ... on a per-file or directory basis
for file in sys.argv[1:]:
    bm_cfg.cfg.load(file)

# use it
pprint.pprint({
    "private" : bm_cfg.cfg.private,
    "public" : bm_cfg.cfg.public,
}, width = 1)

Here’s what my configuration directory looks like:

$ pwd
/Users/davidjanes/Sites/pc/cfg
$ ls
amazon.json		freebase.json		praized.json
amazon.public.json	gmaps.json		yahoo.json

Here’s the (private) amazon.json:

{
    "amazon" : {
        "Locale" : "us",
        "AccessKeyID" : "0......",
        "AssociateTag" : "ona-20",
        "Private" : "Don't See"
    }
}

And here’s the (public) amazon.public.json:

{
    "@Public" : 1,
    "amazon" : {
        "Locale" : "us",
        "AccessKeyID" : "0......",
        "AssociateTag" : "ona-20"
    }
}

Note that if the private version of the Amazon file wasn’t available, the public version would also be in the private one. I.e. the private configuration basically is “everything” (noting possible exceptions above in the code).

December 29, 2008

Interesting links from the last month

db,ideas,semantic web · David Janes · 9:29 am ·
  • Aspen – a web server for highly extensible Python-based publication, application, and hybrid websites. A potential alternative to Python’s builtin HTTPServer. MIT license.
  • V8 – Google’s open source JavaScript engine; written in C++; can run standalone, or can be embedded into any C++ application. I am very excited by this, as allowing users to send Javascript code to the server to execute is an amazingly powerful idea. If anyone knows of a Python wrapper, let me know please. New BSD license.
  • KomodoEdit (a testimonial) – I am going to try this out, though vi/vim will always be my first love (JJ also has an article on using ctags).
  • Virtuoso – an innovative Universal Server platform that delivers an enterprise level Data Integration and Management solution for SQL, RDF, XML, Web Services, and Business Processes. There’s way too much bla bla bla in that sentence, but apparently this is really sweet at handling SPARQL/RDF triples. Kingsley Idehen writes extensively about this on his blog (e.g.).
  • Drizzle – a database optimized for Cloud and Net applications. Way too early to commit to this yet. See The New MySQL Landscape for more interesting goings-on.
  • AuthKit – authentication and authorization toolkit for WSGI applications and frameworks.
  • Geodjango – a world-class geographic web framework. Lots of great ideas and pointers to libraries in here, even if you’re not planning to use this itself.
  • Disco – an open-source implementation of the Map-Reduce framework for distributed computing. The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks often only in tens of lines of code. Here’s a blog post about the same, with a comparison to Hadoop.
  • On (Python) packaging. Debating distutils, easy_install and pip.

December 18, 2008

Pipe Cleaner

demo,djolt,dqt,html / javascript,ideas,jd,maps,pipe cleaner,pybm,work · David Janes · 6:38 pm ·

I’ve been working (in my decreasingly available spare time) on a project called “Pipe Cleaner” that pulls together all the various concepts I’ve been mentioning on this blog: Web Object Records (WORK) for API access and object manipulation, Djolt for generating text from templates, Data/Query/Transform/Template (DQT) for transforming data and JD for scripting these elements together. The pieces came together this morning enough to put a demo together and here it is – the Toronto Fires Pt II Demo.

How, you may ask, does this differ from the original Toronto Fires Demo? The answer is how it is put together, which we describe here.

Index.dj

This is the Djolt template that generates the output. The data fed to this template is generated by the JD script, described in the next section.

<html>
<head>
    <link rel="stylesheet" type="text/css" href="css.css" />
    {{ gmaps.js|safe }}
</head>
<body>
<div id="content_wrapper">
    <div id="map_wrapper">
        {{ gmaps.html|safe }}
    </div>
    <div id="text_wrapper">
{% for incident in incidents %}
    <div id="{{ incident.IncidentNumber }}">
        {{ incident.body_sb|safe }}
    </div>
{% endfor %}
    </div>
</div>
</body>
</html>

Quite simple … as you can see, most of the data is being pulled in from elsewhere. The elsewhere is provided by the script described in the next section.

Index.jd

This is the script that pulls all the pieces together. Note that I’m not 100% happy with the way the data is imported; I would like the geocoding to become part of this data flow too. In the next release perhaps.

First we pull in the “fire” module that we wrote in the previous Map examples. This is doing exactly what you think: importing a Python module. We may have to increase the security or restrict this to working with an API for general purpose use.

import module:"fire";

Next we define two headers – one that is going to appear in the Google Maps popup, the next that is going to appear in the sidebar. They need to be different as they refer to themselves. Note that the sidebar header “breaks” the encapsulation of Google Maps – this seems to be unavoidable. The to:"fitem.head.map" and to:"fitem.head.sb" are manipulating a WORK dictionary to store values.

Note also here that we’ve extended JD to accept Python multiline strings – this was unavoidable if JD was to be useful to me.

set to:"fitem.head.map" value:"""
<h3>
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

set to:"fitem.head.sb" value:"""
<h3>
{% if latitude and longitude %}
<a href="javascript:js_maps.map.panTo(new GLatLng({{ latitude }}, {{ longitude }}))">*</a>
{% endif %}
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

The next block defines the text of the body used to describe a fire incident. It follows much the same pattern as the previous block.

set to:"fitem.body" value:"""
<p>
Alarm Level: {{ AlarmLevel }}
<br />
Incident Type: {{ IncidentType }}
<br />
City: {{ City }}
<br />
Street: {{ Street }} ({{ CrossStreet }})
<br />
Units: {{ Units }}
</p>
""";

This is a map: it translates the values in fire.GetGeocodedIncidents into a new format and stores that in incidents. The format that we are storing it in is understood by the Google Maps generating module.

We may rename this translate, as the word map is somewhat overloaded.

map from:"fire.GetGeocodedIncidents" to:"incidents" map:{
    "latitude" : "{{ latitude }}",
    "longitude" : "{{ longitude }}",
    "title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
    "body" : "{{ *fitem.head.map|safe }}{{ *fitem.body|safe }}",
    "body_sb" : "{{ *fitem.head.sb|safe }}{{ *fitem.body|safe }}",
    "IncidentNumber" : "{{ IncidentNumber }}"
};
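To show what this kind of map amounts to, here is a rough sketch in present-day Python 3. It is not Pipe Cleaner’s implementation: the render and map_items helpers are made up for the illustration, the renderer only understands plain {{ name }} variables (no filters, no *indirection), and the incident record is invented.

```python
import re

_VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, item):
    """Replace each {{ name }} with item[name]; unknown names render as empty."""
    return _VARIABLE.sub(lambda match: str(item.get(match.group(1), "")), template)


def map_items(items, mapping):
    """Lazily translate each source item into a new dict, one template per key."""
    for item in items:
        yield {key: render(template, item) for key, template in mapping.items()}


# Made-up incident record, shaped like the fields used above.
source = [
    {"AlarmLevel": 2, "IncidentType": "Fire", "RawStreet": "Example St",
     "IncidentNumber": "F0001", "latitude": 43.67, "longitude": -79.38},
]
mapping = {
    "latitude": "{{ latitude }}",
    "longitude": "{{ longitude }}",
    "title": "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "IncidentNumber": "{{ IncidentNumber }}",
}

incidents = list(map_items(source, mapping))
print(incidents[0]["title"])  # 2: Fire on Example St
```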

Next we set up the “meta” (see WORK meta description if you’re not following along) for the maps. (The render_value:true declaration makes PC interpret the templates in strings.) We then call our Google Maps generating code (which is actually more Pipe Cleaners) and that gets fed to the Djolt template we first showed you. Clear? Maybe not, we’ll have more examples coming…

set to:"map_meta" render_value:true value:{
    "id" : "maps",
    "latitude" : 43.67,
    "longitude" : -79.38,
    "uzoom" : -13,
    "gzoom" : 13,
    "api_key" : "{{ cfg.gmaps.api_key|otherwise:'...mykey...' }}",
    "html" : {
        "width" : "1024px",
        "height" : "800px"
    }
};

load template:"gmaps.js" items:"incidents" meta:"map_meta";
load template:"gmaps.html" items:"incidents" meta:"map_meta";

December 12, 2008

JD – JSON Declaration Language

ideas,jd · David Janes · 5:55 pm ·

I’ve just added a new module called bm_jd to the pybm project. It implements a “little language” for declaring information, like a configuration file, when the details are all specified in JSON.

The language is very simple, consisting of semi-colon terminated statements; each statement having a command and zero or more arguments. Each argument may or may not have JSON data – if it does, it will be set off with a colon.

The BNF looks like this:

    <document> ::= <statement>*
    <statement> ::= <command> ( <word> | <word>:<json> )* ";"
    <command>|<word> ::= [a-zA-Z0-9_]+
    <json> ::= ... any valid JSON data ...

You can use the pybm JD parser in several ways:

  • implement a subclass of JDParser, defining CustomizeProduce; or
  • implement a subclass of DispatchJDParser, defining a call_<command> method for each command you plan to allow

In either case, you call a method FeedString to get the parser rolling.

There’s also a LogJDParser, which just dumps parsing results. Here’s an example of a JD document. Don’t worry about Djolt code in the JSON, that’s just text as far as this example is concerned:

read_template from:"fire_body" render:false;
map from:"fire.GetGeocodeIncidents" to:"incidents" map:{
    "latitude" : "{{ latitude }}",
    "longitude" : "{{ longitude }}",
    "title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
    "body" : "{{ *fire_body|safe }}",
    "IncidentNumber" : "{{ IncidentNumber }}"
};
read_template from:"gmaps" items:"incidents" meta:{
    "id" : "maps",
    "latitude" : 43.67,
    "longitude" : -79.38,
    "uzoom" : -13,
    "gzoom" : 13,
    "api_key" : "{{ cfg.gmaps.api_key|otherwise:'_' }}",
    "html" : {
        "width" : "1024px",
        "height" : "800px"
    }
};
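For the curious, the grammar is small enough that a parser fits in a few dozen lines. The sketch below is in present-day Python 3 and is not the bm_jd code: parse_jd is a made-up function that leans on the standard library’s json.JSONDecoder.raw_decode to consume one JSON value at a time, and the sample document is a shortened stand-in for the one above.

```python
import json
import re

_WORD = re.compile(r"[A-Za-z0-9_]+")
_SPACE = re.compile(r"\s*")
_decoder = json.JSONDecoder()


def parse_jd(text):
    """Yield (command, [(word, value), ...]) for each ';'-terminated statement.

    Arguments without JSON data get the value None.
    """
    pos = _SPACE.match(text, 0).end()
    while pos < len(text):
        match = _WORD.match(text, pos)
        if not match:
            raise ValueError("expected a command at offset %d" % pos)
        command, pos = match.group(), match.end()
        arguments = []
        while True:
            pos = _SPACE.match(text, pos).end()
            if pos >= len(text):
                raise ValueError("statement %r is missing its ';'" % command)
            if text[pos] == ";":
                pos += 1
                break
            match = _WORD.match(text, pos)
            if not match:
                raise ValueError("expected an argument at offset %d" % pos)
            word, pos = match.group(), match.end()
            value = None
            if text.startswith(":", pos):
                # raw_decode parses one JSON value and reports where it ended.
                start = _SPACE.match(text, pos + 1).end()
                value, pos = _decoder.raw_decode(text, start)
            arguments.append((word, value))
        yield command, arguments
        pos = _SPACE.match(text, pos).end()


document = """
read_template from:"fire_body" render:false;
map from:"incidents_in" to:"incidents" map:{"title": "{{ IncidentType }}"};
"""

for command, arguments in parse_jd(document):
    print(command, arguments)
```

A dispatching parser in the style of DispatchJDParser could sit on top of this by looking up a call_<command> method with getattr for each statement it yields.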

December 11, 2008

A brief survey of Yahoo Pipes as a DQT

demo,djolt,dqt,ideas,semantic web,work · David Janes · 7:19 am ·

Yahoo Pipes is a visual editor of mashups, allowing you to take data from sources on the net, transform them in various interesting ways and output the result as Atom, RSS or JSON. The primary downside of Pipes, of course, is that you’re totally dependent on Yahoo for the infrastructure: it runs at Yahoo, pulling feeds that have to be accessible through the public Internet.

It’s easy to use Pipes: just go to this page and start working with the sample Pipe. You’ll need a Yahoo login ID, but most of us have that anyway. I’ve created an example that uses Yahoo Pipes to feed a Djolt template which you can see here.

We can analyze Pipes in the terms of the DQT paradigm we’ve outlined in the previous post.

Data Sources and Queries

Sources and Queries are merged (quite logically) in the Pipes interface. You can read in depth documentation here.

  • Fetch CSV
  • Feed Autodiscovery – outputs syndication feeds found on a page (RSS feeds on a CBC page)
  • Fetch Feed
  • Fetch Page – will read a page and parse the contents with a reg
  • Fetch Site Feed – this is the logical combination of Fetch Feed and Feed Autodiscovery
  • Flickr – find images by tag near a location (photos of cats in Toronto)
  • Google Base – look up information in Google Base
  • Item Builder – a way of building new items from existing items
  • Yahoo Local
  • Yahoo Search

Transforms

The operator documentation can be read here.

  • Count
  • Filter
  • Location Extractor – a geocoder that magically looks for locations
  • Loop
  • Regex
  • Rename
  • Reverse
  • Sort
  • Split
  • Sub-element – pulls a particular sub-element of an item and makes that the item. This is very much like WORK path manipulation
  • Tail
  • Truncate
  • Union
  • Unique
  • Web Service

Plus a number of specialized data services, for dealing with elements such as dates.

Templates

Pipes does not provide an arbitrary Djolt-like template producing HTML. Instead, they provide a number of pre-made code templates that output well known data types, including RSS, JSON and Atom (and some stranger choices, like PHP).

December 9, 2008

Introducing DQT – Data/Query/Transform/Template

dqt,ideas · David Janes · 4:05 pm ·

Data/Query/Transform/Template – DQT, dropping the final T – is a commonly used pattern for displaying data on a website. The elements of this pattern are:

  • a Data source, such as a blog database, an e-mail store, the Internet as a whole, a MySQL database and, in particular, the results of an API call.
  • a Query, which is a way of selecting a particular subset or slice of the data (typically homogeneous)
  • Transform rules, which can make the data look different by renaming fields, enhancing data using tools such as geolocation, filtering records out, merging multiple data sources and so forth.
  • a Template, which is a way of converting to a useful end-user format, such as HTML, JSON or XML

In the particular context of what I’m writing about, we can assume that we’re manipulating WORK items – that is, an API returns a “Meta” block of information and a stream of “Items”, each of which is in turn a WORK item. By identifying common patterns of dynamic page construction, my hope is that we can simplify page and mashup creation.

You’ve seen this pattern many times. My plan is that by describing it properly, we can make it easier to do.
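Stripped to its bones, the pattern is just four stages chained together. Here is a toy sketch in present-day Python 3; the POSTS data and the query, transform and template functions are all invented for the illustration and have nothing to do with WORK or Djolt.

```python
from html import escape

# Data: a stand-in for rows from a database or items from an API call.
POSTS = [
    {"post_title": "Thinking about Configuration", "category": "python", "path": "/cfg"},
    {"post_title": "My issues with OAuth", "category": "authentication", "path": "/oauth"},
    {"post_title": "How to JSON encode iterators", "category": "python", "path": "/iter"},
]


def query(data, category):
    """Query: select the slice of the data we care about."""
    return (row for row in data if row["category"] == category)


def transform(rows):
    """Transform: rename fields into the shape the template expects."""
    for row in rows:
        yield {"title": row["post_title"], "link": row["path"]}


def template(items):
    """Template: convert the items to an end-user format, here an HTML list."""
    lines = ["<ul>"]
    for item in items:
        lines.append('<li><a href="%s">%s</a></li>' % (escape(item["link"]), escape(item["title"])))
    lines.append("</ul>")
    return "\n".join(lines)


page = template(transform(query(POSTS, "python")))
print(page)
```

Each stage only knows about the shape of the items it receives, which is what makes the stages swappable in the examples that follow.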

WordPress Blogs

  • The Data source is the MySQL table with blog posts, plus ancillary information pulled from other tables.
  • The Query is some combination of ( page number, post path, category, tag ). Not all combinations are legal obviously, but this is the information that can be encoded in a URL request. The Data source and the Query result in a number of posts being made available for further processing
  • The Template is the PHP code that converts the individual database items into HTML for display

There is no Transform in this example. See it here ;-).

BTW: Don’t take this as a how-to guide for WordPress. I’m trying to look at this from a high-level conceptual point-of-view.

Google Mail

  • the Data source is all the e-mails in the Google database, probably billions or trillions of messages
  • the Query is some combination of ( userid, page number, search ). The userid is not encoded in the URL, it is known be
  • the Template is the Javascript code that

That’s a really high level view: in fact, Google Mail does this DQT twice: the first time around to select JSON or XML data to be transmitted to the user’s browser; the second time around, locally in the user’s browser, to select and display items.

Yahoo News

  • the Data source is Yahoo’s news database
  • the Query is a category, or no category at all (poorly encoded in the URL, I may add)
  • the Transform groups news into like categories (play along with me here)
  • the Template is Yahoo’s HTML generator, whatever that may be

See it here.

An RSS feed

The Data and the Query are substantially similar to the WordPress Blog example, but:

  • we Transform the fields into a format that can be understood by an RSS generator
  • the Template is a specialized object that converts WORK items into RSS entries, that is, we don’t (or shouldn’t) use a Djolt-like template to generate XML.

This is obviously somewhat of a hypothetical example, but it reflects my recent ideas about how machine readable data should be generated.
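One way to picture such a “specialized object” is a function that builds the XML tree directly, leaving the escaping to the XML library. The sketch below uses Python 3’s standard xml.etree.ElementTree; items_to_rss, the field names and the sample data are assumptions of mine, and plain dicts stand in for WORK items.

```python
import xml.etree.ElementTree as ET


def items_to_rss(meta, items):
    """Build an RSS 2.0 document from a meta dict and a list of item dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for name in ("title", "link", "description"):
        ET.SubElement(channel, name).text = meta[name]
    for item in items:
        entry = ET.SubElement(channel, "item")
        for name in ("title", "link"):
            ET.SubElement(entry, name).text = item[name]
    # ElementTree escapes &, < and > for us, which a text template would not.
    return ET.tostring(rss, encoding="unicode")


meta = {"title": "Example feed", "link": "http://example.com/", "description": "Made-up data"}
items = [{"title": "Fish & chips <today>", "link": "http://example.com/1"}]
xml_text = items_to_rss(meta, items)
print(xml_text)
```

The awkward title survives the round trip intact, which is exactly the kind of thing a hand-written text template tends to get wrong.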

December 8, 2008

Coding backwards for simplicity

djolt,dqt,ideas,pybm,python,work · David Janes · 4:58 pm ·

I haven’t been posting as much as I’d like here for the last three weeks, not because of a lack of ideas but because I haven’t been able to consolidate what I’ve been working on into a coherent thought. I’m trying to come up with an overarching conceptual arch that covers WORK, Djolt and the various API interfaces I’ve been coding. Tentatively and horribly, I’m calling this Data/Query/Transform/Template right now, though I’m expecting this to change.

The first demo of this … without further explanation … can be seen here. More details about what this is actually demonstrating (besides formatting this blog) will be forthcoming.

What I want to draw attention to in this post is how I coded this. What I’ve been doing for the last several weeks is coding backwards: I start with what I want the final code to look like and then figure out all the libraries, little languages and so forth that would be needed to code that. After several false starts, my conceptual logjam broke about a week ago and code started radically simplifying.

The ideal code, in my mind, is almost entirely static declarations: no loops, no if statements, no while statements, no goto-type statements (god help us). We simply specify how the parts are connected, and hope that we can abstract the complexity into the libraries that make this all happen. The code that you see below is actually post all my conceptualizing: I just wanted to write some code and since I had almost all the parts together it fell together quite nicely:

import bm_wsgi
import bm_io

import djolt
import api_feed

from bm_log import Log

class Application(bm_wsgi.SimpleWrapper):
    def __init__(self, *av, **ad):
        bm_wsgi.SimpleWrapper.__init__(self, *av, **ad)

    def CustomizeSetup(self):
        self.html_template_src = bm_io.readfile("index.dj")
        self.html_template = djolt.Template(self.html_template_src)

        self.context = djolt.Context()
        self.context["paramd"] = {
            "feed" : "http://feeds.feedburner.com/DavidJanesCode",
            "template" : """\
<ul>
{% for item in data.items %}
	<li><a href="{{ item.link }}">{{ item.title }}</a></li>
{% endfor %}
</ul>
""",
        }
        self.context.Push()
        self.context["paramd"] = self.paramd
        self.context["data"] = api_feed.RSS20(self.context.as_string("paramd.feed"))

    def CustomizeContent(self):
        yield   self.html_template.Render(self.context)

if __name__ == '__main__':
    Application.RunCGI()

There’s almost nothing there! In particular, note:

  • bm_wsgi.SimpleWrapper handles all the WSGI interface work, including determining when to output HTML headers, error trapping, and Unicode to UTF-8 encoding
  • the most complicated part of the application is setting up the Context. In particular, note that self.paramd is automatically populated by the QUERY_STRING passed to the application, and the double setting we do here allows us to have default values.
  • If you want to see the HTML template that drives the application it is here. Note two variations from Django templates: the {% asis %} block, which doesn’t interpret its content as Djolt code, and the {{ *paramd.template|safe }} variable, which interprets the variable’s contents as a template.
  • Methods called Customize-something are my convention for framework functions, i.e. methods that will be called for us rather than methods we call.
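The “double setting” of paramd is the least obvious part, so here is the same idea sketched with nothing but the Python 3 standard library. I am assuming the Context behaves like a stack of scopes where the inner one shadows the outer, which the code above implies but doesn’t spell out; build_paramd and the example.com feed URL are made up.

```python
from collections import ChainMap
from urllib.parse import parse_qsl

DEFAULTS = {
    "feed": "http://feeds.feedburner.com/DavidJanesCode",
    "template": "<ul>...</ul>",
}


def build_paramd(query_string, defaults=DEFAULTS):
    """Layer request parameters over defaults; lookups fall through to defaults."""
    request_params = dict(parse_qsl(query_string))
    return ChainMap(request_params, defaults)


paramd = build_paramd("feed=http://example.com/rss.xml")
print(paramd["feed"])      # the value from the query string
print(paramd["template"])  # falls back to the default
```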

How to JSON encode iterators

ideas,python · David Janes · 2:32 pm ·

As part of my recent explorations, I’ve been playing a lot with Python iterators/generators. The key efficiency of iterators is that when working with lengthy list-like objects, you need only create the part that’s being looked at. It’s just-in-time objects.

If you attempt to JSON serialize an object with an iterator/generator object in it, the json module throws a cog: it doesn’t know how to serialize these types of objects. The json module is extensible and the documentation suggests how to do this:

import json

class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        return json.JSONEncoder.default(self, o)

print json.dumps(xrange(4), cls = IterEncoder)

This seems somewhat ugly to me. In particular, lots of objects can be wrapped by the iter function that don’t need to be, plus lots of objects will cause that TypeError to be thrown, which seems to be rather a waste. Here’s the solution I came up with:

class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            return  json.JSONEncoder.default(self, o)
        except TypeError, x:
            try:
                return  list(o)
            except:
                # not convertible to a list: re-raise the original error
                raise x

This tries to encode the object the normal way. Only if that doesn’t work do we try to turn the object into a list. If that’s not convertible (i.e. the list object constructor fails) we go back and throw the original exception provided by JSONEncoder – we’ve really failed.

You use this as follows:

class X:
    def Iter(self):
        yield 1
        yield 2
        yield 3
        yield 4

xi = X().Iter()

print json.dumps(xi, cls = IterEncoder)
print json.dumps(xrange(4), cls = IterEncoder)

Which yields the expected:

[1, 2, 3, 4]
[0, 1, 2, 3]

Don’t be overly tempted to check the type of o: it may be types.GeneratorType or types.XRangeType or perhaps even something else that I haven’t found out yet.
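For anyone reading this on present-day Python 3, where print is a function and xrange is gone, a rough equivalent looks like the sketch below. It is a restatement rather than the code above: it tries the list conversion first and falls back to the base class, which is equivalent here because the encoder only calls default for objects it could not serialize itself. The countdown generator is just sample data.

```python
import json


class IterEncoder(json.JSONEncoder):
    """Serialize otherwise-unknown iterables (generators, ranges, ...) as lists."""

    def default(self, o):
        try:
            return list(o)
        except TypeError:
            # Not iterable either: let the base class raise its usual TypeError.
            return super().default(o)


def countdown():
    yield 3
    yield 2
    yield 1


print(json.dumps(countdown(), cls=IterEncoder))      # [3, 2, 1]
print(json.dumps({"r": range(4)}, cls=IterEncoder))  # {"r": [0, 1, 2, 3]}
```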
