David Janes' Code Weblog

November 20, 2009

How to use XCode for Android Projects

android, code fragments, ideas, macintosh · David Janes · 6:07 pm ·

Let’s assume you already have an Android project on your Mac.

Create the XCode Project

  • start XCode
  • select File > New Project…
  • select External Build System
  • go to the parent directory of your Android Project
  • in the Save As: field, enter the directory name of your Android Project
  • select the scarily-misnamed Replace option

Add Files

In your new XCode project:

  • select first item in the left hand column, which is the name of your project
  • right-click, select Add > Existing Files…
  • select add files (don’t select the Copy option)
  • organize as desired (I like to do a lot of grouping). You should probably be adding at least your Java files and your Layout resources.

Configure your Build Target

In your new XCode project:

  • look for Targets
  • inside will be a target for your project’s name
  • double click on it
    • change Build Tool to ant
    • change Arguments to install

Pressing ⌘B should now compile your project.

Note: if you figure out how to have a Build vs. Build & Install (e.g. ⌘ENTER) please let me know!

Getting XCode to Recognize Java errors

  • Reconfigure the Build Target, changing ant to ./xant
  • Make a file xant in the project’s home directory, using the code below
  • do (from a Terminal) chmod a+x xant
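#!/usr/bin/env python
# xant: run ant and rewrite javac output so XCode recognizes the errors

import sys
import re
import subprocess

# run ant with the same arguments this script was given
av = list(sys.argv)
av[0] = "ant"

# universal_newlines makes the pipe return text rather than bytes on newer Pythons
p = subprocess.Popen(av, stdout = subprocess.PIPE, universal_newlines = True)

# strip ant's "[javac]" prefix; tag "Name.java:NN:" so XCode sees an error
javac_rex = re.compile(r" +\[javac\] +")
line_rex = re.compile(r"\.java:\d+:")

pending = ""
while True:
    d = p.stdout.read(128)
    if not d:
        break

    d = pending + d

    # only process complete lines; keep any partial line for the next pass
    nx = d.rfind('\n')
    if nx == -1:
        pending = d
        continue
    else:
        d, pending = d[:nx + 1], d[nx + 1:]

        d = javac_rex.sub("", d)
        d = line_rex.sub(r"\g<0> error: ", d)
        sys.stdout.write(d)
        sys.stdout.flush()

# flush whatever is left, then exit with ant's own status
sys.stdout.write(pending)
p.wait()
sys.exit(p.returncode)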

Note: this code has been updated from the original post. It now reads little chunks and outputs them immediately rather than post-processing the ant output.
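
To see what the two substitutions in xant do, here is a small sketch that applies equivalent patterns to a single line. The sample line and the rewrite helper are made up for illustration; real ant output may be laid out differently.

```python
import re
import sys

# equivalent to the two patterns used in xant
javac_rex = re.compile(r" +\[javac\] +")
line_rex = re.compile(r"\.java:\d+:")

def rewrite(line):
    """Drop ant's [javac] prefix and tag the file:line marker as an error."""
    line = javac_rex.sub("", line)
    return line_rex.sub(r"\g<0> error: ", line)

# hypothetical line of ant output
sample = "    [javac] /src/Foo.java:12: cannot find symbol\n"
sys.stdout.write(rewrite(sample))
```

Lines that don't mention a .java file and line number pass through unchanged.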

November 14, 2009

Twitter uses WOEIDs – why you should care

ideas, maps · David Janes · 5:43 am ·

Because I don’t think this announcement got sufficient attention: Twitter is using Yahoo’s WOEIDs to identify locations in its Geolocation API.

WOEIDs are:

  • a numeric identifier for geographical places or regions, from a “place of interest” all the way up to continent
  • uniquely defined, for each place
  • hierarchically nested
  • related to each other by natural concepts, such as neighbor-of, sibling-of, etc.
  • managed, by Yahoo
  • a language independent way of talking about place
  • not necessarily tied to political boundaries, i.e. there’s a WOEID for the Bay Area

This announcement is important because:

  • it orders tweets into “containers” that can be used to find those tweets easily. In particular, there are now codes that a machine can easily work with to find human concepts. Previous attempts at identification depended upon things such as user-entered hash tags for a nearby airport, postal code, zip code, etc., and other concepts that don’t necessarily reflect what’s really going on
  • it encourages others to start building on WOEIDs, a non-Google way of identifying places

Note that I have some concerns about how well WOEIDs will work when we start wanting them to be dynamically or crowd defined down to the business level (e.g. sort of like what foursquare is doing; I don’t think they use WOEIDs though).

November 8, 2009

Next Generation iPhone to include RFID reader?

ideas, iphone · David Janes · 5:38 am ·

This was something I was hoping would be in the 3GS. Rumor: Next generation iPhone to be RFID enabled:

A highly reliable source has informed me that Apple has built some prototypes of the next gen iPhone with an RFID reader built in and they have seen it in action. So its not full NFC but its a start for real service discovery and I’m told that the reaction was very positive that we can expect this in the next gen iPhone.

If Apple does it, expect every phone manufacturer and their sister to begin pumping out NFC enabled phones, at least for service discovery and sync.

This just reinforces what we knew based on the two separate patents Apple submitted that had the iPhone enabled to read RFID tags. I’m told that the touch project video and the BT SIG’s specs were all driving forces to push this forward as well as other factors.

Guess I’ll be touching my iPhone to my Mac to link them together to sync iTunes by next year.

My thoughts:

  • this will be a game changer for RFID, bringing applications down to the small business and hobbyist layer
  • RFID + applications is a wicked combination, making the mobile phone an Anything Device; Microsoft, Motorola and RIM should all try to get the jump on this
  • RFID + in app purchase is a wicked combination; put your thinking cap on for this one
  • RFID + Augmented Reality makes context sensitive layers start popping up
  • I’m not a game person, but you know there’s got to be gaming implications for this
  • what are the implications for big chains – groceries, consumer electronics – when anyone can walk in the door and get a better price by waving their phone at a shelf?

August 10, 2009

Travel Websites & Web 3.0

Discover Anywhere Mobile, ideas, semantic web · David Janes · 2:53 pm ·

On my Discover Anywhere Mobile blog, I’ve posted a list of recommendations about how travel websites can use information to extend their reach.

March 27, 2009

The Three20 Project: Photo viewer and more for the iPhone

ideas, iphone · David Janes · 4:59 am ·

This code could be very useful for iPhone developers:

The name of the new project is Three20, after the 320-pixel-wide screen of the iPhone. The code is all hosted on github for your cloning pleasure. There is an excellent sample app called TTCatalog which lets you play with all of the various UI components. Documentation? Well… there are instructions for how to add Three20 to your project, but I am still working on comprehensive documentation for each of the classes. For now, the sample app and the code itself are your documentation.

The projects are:

  • Photo viewer
  • Message composer
  • Web image views
  • Internet-aware table view
  • Better text fields (including type-ahead)
  • HTTP disk cache
  • URL-based navigation (this could be interesting)

The source base is under the Apache license.

March 1, 2009

AUAPI: JSON to XML serialization

auapi, ideas, tips · David Janes · 8:40 am ·

Here is a brief outline of how one would “naively” transform Almost Universal API’s (AUAPI) JSON into XML. We say “naive” because in general one wants to make a transformation into a specific XML application: Atom, RSS, OPML, KML, etc. In those cases, one has to rename and rework certain elements first for standards compliance, then complete the naive transformation for remaining elements.

Walking

Walking JSON objects is done depth first. Most of the complexity involved is in handling dictionaries, which can be viewed as being composed of (key, value) pairs. For each dictionary, we create an XML node whose properties are defined as follows:

  • keys beginning with @@ are ignored
  • the key @ means “the text” of the node (the examples will make this more clear)
  • other keys beginning with @ are attributes of the node
  • all other keys are defining children of the node

There are a number of complexities that have to be addressed; for this I suggest looking at the examples or source code.

Namespace handling

  • collect all the namespaces used in the JSON and add to the root XML node
  • if any JSON element has a namespace, assume that namespace is inherited by its children

Code

You can see the code for this in the AUAPI source base in api.py in XMLAPIWriter.TranscribeNode.

Example 1

{
    "numbers" : [ 1, -0.23, ],
    "strings" : [ "bob", "caf\xe9", ],
    "booleanish" : [ True, False, None, ],
}
<root>
    <numbers>1</numbers>
    <numbers>-0.23</numbers>
    <booleanish>True</booleanish>
    <booleanish>False</booleanish>
    <booleanish />
    <strings>bob</strings>
    <strings>caf\xc3\xa9</strings>
</root>

Example 2

{
    "a1" : {
        "b1" : 1,
        "b2" : 2,
    },
    "a2" : {
        "b3" : "hi",
        "b4" : "there",
    },
}
<root>
    <a1>
        <b1>1</b1>
        <b2>2</b2>
    </a1>
    <a2>
        <b4>there</b4>
        <b3>hi</b3>
    </a2>
</root>

Example 3

{
    "@attribute" : "hello",
    "@bttribute" : "there",
    "a" : "some string",
}
<root attribute="hello" bttribute="there">
    <a>some string</a>
</root>
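
The walking rules and examples above can be sketched in a few lines of Python. This is only an illustration, assuming the standard library's xml.etree.ElementTree; it is not the XMLAPIWriter.TranscribeNode code, the transcribe name is made up, and namespace handling is left out entirely.

```python
import xml.etree.ElementTree as ET

def transcribe(tag, value, parent=None):
    """Naively turn a JSON-like value into one or more XML nodes named tag."""
    # a list repeats the same tag once per item
    if isinstance(value, list):
        return [node for item in value for node in transcribe(tag, item, parent)]

    node = ET.Element(tag) if parent is None else ET.SubElement(parent, tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith("@@"):
                continue                        # keys beginning with @@ are ignored
            elif key == "@":
                node.text = str(child)          # "@" is the text of the node
            elif key.startswith("@"):
                node.set(key[1:], str(child))   # other @ keys are attributes
            else:
                transcribe(key, child, node)    # everything else is a child
    elif value is not None:
        node.text = str(value)                  # None leaves an empty node
    return [node]

# the dictionary from Example 3
root = transcribe("root", {
    "@attribute": "hello",
    "@bttribute": "there",
    "a": "some string",
})[0]
print(ET.tostring(root, encoding="unicode"))
```

The order of the children follows the dictionary's iteration order, which is why the examples above don't always come out in the order they went in.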

February 22, 2009

What is the framework for public APIs?

ideas, semantic web · David Janes · 3:55 pm ·

This post was originally sent to the ChangeCamp mailing list in response to a question about “what framework should we use for public APIs?”.

The core “frameworks” are POSH, REST and JSON. POSH is “Plain Old Semantic HTML”, meaning websites should be developed using modern web standards, pages should validate and use HTML elements correctly, and presentation is coded using CSS. REST can have deeper implications, but among the simplest is that pages can be returned using simple GET requests against well-known URLs. JSON has emerged as the de facto standard for returning API results; among the reasons are the simplicity of creating mashups and its embeddability.

Atom and/or RSS provide the framework for update notifications. There are emerging technologies for real-time delivery, but it’s too early to worry about that.

Microformats provide a framework for embedding well-understood objects in HTML, are based on popular and well-understood standards, are easy(-ish) to implement, and a “consumer” ecosystem exists. In particular, people can be represented by hCard, events by hCalendar, tagged data by rel-tag and microcontent (articles within a page) by hAtom. Note that no parallel infrastructure need exist to do microformats: they are served within HTML pages.

Identity should use OAuth and OpenID; pragmatism says Facebook Connect and Google Friend Connect should be in the mix too, though I have a number of reservations about those.

I am very non-bullish about RDF, particularly as a model for delivering data of well-defined formats. IMHO it has almost entirely missed the mashup wave of the last few years, and successes seem to be scattered at best. RDFa is competing in the microformats “space” and may see success yet if it starts providing concrete solutions rather than “here’s a format that can do anything”, especially given the microformats process issues.

January 25, 2009

ChangeCamp – notes from redesigning www.toronto.ca/opendata

ideas · David Janes · 5:13 pm ·

This is a transcription-with-license from a session I attended yesterday at ChangeCamp entitled “Designing www.toronto.ca/opendata”. You can read more about ChangeCamp here; my primary reason for attending was interest in promoting and helping our governments share information which they are gathering, to make them more transparent, accountable and even potentially useful. The session I attended was run by some folks from the City of Toronto and its 311 initiative (I think). These notes are based on memory and the photos I took, here on Flickr. Note that this is only one of several (3?) parallel discussions that were happening during this session, and this is only the “data” section of the session; I’m afraid I wandered off during the “tools” parts as I thought it might be a little premature.

Tags, Geocoding, Ontologies

A common theme of discussion was providing ways to organize the data. In particular:

  • geocode,
  • geocode by tag: e.g. “the annex”
  • tagged by topic; i.e. that this is an orthogonal axis to geocoding
  • metadata
  • we should look to how other cities have organized data to find a common vocabulary

A common concept was that we could use tags / geocodes to:

  • spontaneously form communities around ideas
  • track related issues
  • form feeds on related information
  • use to query related information that would generally be spread widely

Information Dissemination

  • be able to track issues through the system; this is a very common theme
  • have access to historical information
  • feeds for everything

Crime and Public Safety

  • Police reports (geocoded)
  • Public health services
  • Emergency information
    • Info (like about SARS)
    • Releases
    • Public health

Scheduling

  • Pools
  • Skating rinks
  • Ferries
  • Public meetings

This is related to the concept discussed that any information that goes into a PDF should be available in raw form.

Politician Information

  • Voting records
  • Expenses
  • Finances

Service Information

  • Power grid disruptions
  • Critical incident
  • Zoning (i.e. what’s the zoning information for this location)
  • Density / population / demographics
  • Parking information (e.g. what are the parking rules; how do I get a parking permit, etc.)
  • Real-time pollution:
    • Water quality
    • Air quality
    • Tagged, available as feeds
    • Historic information
  • Traffic
    • Roads
      • street conditions
    • Trains
    • Utilization rates
  • Sewer / waterflow data; apparently sensors are already in place for this

Complaints

  • be able to add data into the system
  • be able to track that information

Tourism Information

  • event dates, locations, price; e.g. Nuit Blanche
  • standard information for tour operators

Budget Information

  • spreadsheets
  • all the raw data in PDFs should be available as XLS/CSV
  • be able to trace evolution of data from its source; follow back up the chain

Tendering Information

  • What is up for tender
  • What tenders have been awarded
  • Make interaction with city more efficient and open

311 Information

  • Track whether services were successful
  • Raw feeds
  • Ticketing system (i.e. issue tracking)
  • Turnaround time

Community Group Information

Information to empower groups, enable spontaneous community formation…

  • Mayor’s initiatives
  • Bike lanes
  • Deal with language issues
  • Schools
    • What assets are available (pools, gyms)
  • Parks & rec
    • Open spaces
  • Community centres
  • Community health centres

My issues with OAuth

authentication, ideas · David Janes · 2:44 pm ·

The other day I twittered Chris Messina about OAuth:

@factoryjoe #OAuth is an incomprehensible mess. Programming a Python client to connect to a service has never been so hard

These are the details of my experience, plus suggestions about how to fix the problems I’ve encountered.

The username/password gold standard

To interact with a service like Twitter’s API, I need three pieces of information: a username, a password and the API endpoint I want to use. Once I have this information I can use a standard library in almost any language to start using the Twitter API – I am up and running within 45 seconds. Now, don’t mistake this for thinking that giving up your username and password to a third party service is a good idea: it’s horrible. However, the other part – being up and running within a few minutes – is critical from a programming usability point of view. No bucks, no Buck Rogers; no API users, no API usage.
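
For comparison, here is roughly all the client-side work that HTTP Basic Authentication asks for: build one header from the username and password. This is a sketch using only Python's standard library; the helper name and the credentials are made up for illustration.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of the HTTP Authorization header for Basic auth."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

# hypothetical credentials; send these headers along with any API request
headers = {"Authorization": basic_auth_header("alice", "secret")}
```

That's the whole scheme, which is both why it's so quick to get going and why handing the password to a third party is such a bad idea.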

It took me a day and a half (albeit of scattered hours) to get OAuth to work for me. To put this in perspective, I had Google’s authentication system – including recoding urllib2 to deal with PUT and 301/302 errors – usable in about 2 hours.

What I’ve discovered is that OAuth is almost as easy to use as HTTP Basic Authentication (the username / password scheme above); the issue is the confusing way OAuth currently presents information to developers. I have documented my coding experiences here and the code is freely available for use and perusal here (though you’ll really be buying into a web resources model that might not be your thing if you do).

The informational issues I had – with suggested fixes – are documented below.

Critical OAuth information is poorly packaged

This is what you need to know to access an OAuth API:

  • A “consumer key”
  • A “consumer secret”
  • An “authorization URL”
  • A “token URL”
  • An “access token URL”

Not to mention a list of API URLs that are the API end points. Note that all of these items are defined by terminology unique to OAuth and thus unfamiliar to the new developer. Now, try going to Fire Eagle and getting all of that information, and when you’re finished with that head on over to PostRank to do the same. If you’re clever, it’ll take you 5 minutes but more likely (especially if you’ve never seen OAuth before) it’ll take you about 10 minutes and you’re likely to have got something wrong. Did you catch that Fire Eagle has similar looking but not quite the same URLs? Does PostRank use “http://” or “https://” for “standard request paths”? Did you know that PostRank also has two entirely different hostnames for URLs in its APIs? If you didn’t, well, you’ll probably be revisiting your lists.

Here’s what I suggest: OAuth should recommend that every OAuth Service Provider return the following JSON dictionary (in a TEXTAREA or PRE) in the place where they’re currently returning the consumer key & secret:

{
    "api_uri": [
        "http://www.postrank.com/myfeeds/",
        "http://www.postrank.com/user/",
        "http://api.postrank.com/"
    ],
    "oauth_access_token_url": "http://www.postrank.com/oauth/access_token",
    "oauth_authorization_url": "http://www.postrank.com/oauth/authorize",
    "oauth_token_url": "http://www.postrank.com/oauth/request_token",
    "oauth_consumer_key": "XXXXXXXXXXXXXXXXXXXXXX",
    "oauth_consumer_secret": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0"
}

That’s it: one single piece of information that can be read without difficulty by every single language modern developers use and can be copied and pasted in a single operation by the developer.
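
To show how little a client would need to do with such a dictionary, here is a sketch in Python. The pasted values and the example.com URLs are placeholders, and load_oauth_settings is a made-up helper, not part of any OAuth library.

```python
import json

# hypothetical blob pasted from a provider's key page (placeholder values)
pasted = """
{
    "api_uri": ["http://api.example.com/"],
    "oauth_access_token_url": "http://www.example.com/oauth/access_token",
    "oauth_authorization_url": "http://www.example.com/oauth/authorize",
    "oauth_token_url": "http://www.example.com/oauth/request_token",
    "oauth_consumer_key": "XXXX",
    "oauth_consumer_secret": "XXXX",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0"
}
"""

REQUIRED = [
    "oauth_access_token_url",
    "oauth_authorization_url",
    "oauth_token_url",
    "oauth_consumer_key",
    "oauth_consumer_secret",
]

def load_oauth_settings(text):
    """Parse the pasted dictionary and complain early if anything is missing."""
    settings = json.loads(text)
    missing = [key for key in REQUIRED if key not in settings]
    if missing:
        raise ValueError("missing OAuth settings: " + ", ".join(missing))
    return settings

settings = load_oauth_settings(pasted)
```

Everything a client library needs is then one dictionary lookup away, with no hunting through documentation pages.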

I’d also like to note that Fire Eagle has the most useful OAuth developer documentation I’ve seen and the OAuth team should consider adopting it wholesale as their own.

The OAuth website is confusing

The front page of the OAuth website is very promising. There’s a big round button-like area that says “For Consumer developers…” and another that says “For Service Provider developers…”. From there things go rapidly downhill. Neither of these items is a button. Instead, one clicks on the “Get Started…” link; from there you examine a list of other links and then start reading about how OAuth got its name after the sound Blaine Cook’s college roommate’s brother’s cat used to make horking up furballs or whatever. Honestly: no one cares at this stage. What I’d like to do is click on the “Consumer developers” link and start seeing a concrete example of what I need to do to interact with an OAuth-enabled service. All the other stuff is filler.

One further point: it’d be nice to see a proper logo.

The OAuth website needs an API playground

My final recommendation is that OAuth provide on their website a live sample API Service Provider that all the libraries interact with out of the box. It’s difficult enough to get an API working without wondering whether the problem is on my side or on their side.

January 9, 2009

Thinking about Configuration

ideas, python · David Janes · 7:20 am ·

Happy New Year, everyone. I’ve been busy at paying work recently, plus cleaning up and testing existing code I’ve been discussing here over the last few months. At work I’ve been developing in WebObjects, which though a lovely platform is not the way of the future so I’m not documenting many of my experiences here.

The applications I’ve been working on recently, Pipe Cleaner and GenX, need – like most applications – configuration. This will store information which can be safely exposed to the public, such as my Google Maps API key, and information that I need to keep private within the application, such as my Freebase username and password (cf. however the password anti-pattern). Furthermore, though the code I’m writing is in Python it is possible that the code that provides the UI will be written in another language, such as PHP inside of WordPress.

Given these considerations, here are my design choices:

  • configuration files are stored as multiple individual files inside a directory (or directories)
  • configuration files are in JSON, and contain a dictionary of dictionaries (see below)
  • configuration files can be marked as private or public
  • the same logical configuration (say for Amazon, which has both public and private information) can be in a public and private file
  • the configuration is global, but is accessed through setter/getter properties
  • non-global versions of the configuration can be made

That all said, here’s what I’ve written. First, the setters and getters:

class Cfg(object):
    def __init__(self):
        # per-instance dictionaries, so non-global configurations stay independent
        self._cfg_private = {}
        self._cfg_public = {}

    @property
    def public(self):
        return self._cfg_public

    @property
    def private(self):
        return self._cfg_private

As an aside, I’m not 100% sure about Python decorators and wonder if my favorite language is being turned into a C++ like mess.

Next, the ‘add’ function that adds information to the configuration ensuring private and public are handled correctly. Note that there can be multiple dictionaries inside of ‘d’, but ‘d’ is either all Public or not.

    def add(self, d):
        if not isinstance(d, dict):
            raise TypeError("only dictionaries can be added")

        if d.get('@Public'):
            #
            #   Public definitions never overwrite private definitions
            #
            for key, value in d.items():
                if not isinstance(value, dict):
                    continue

                if key not in self._cfg_private:
                    self._cfg_private[key] = value

                self._cfg_public[key] = value
        else:
            self._cfg_private.update(d)

And finally the loader, which gets everything in a directory or one level down. Note the ‘exception’ parameter which makes me a bad person, but I don’t like code failing unless I tell it to.

    def load(self, path, exception = False, depth = 0):
        try:
            if os.path.isdir(path) and depth < 2:
                for file in os.listdir(path):
                    # pass depth along so the recursion stops one level down
                    self.load(os.path.join(path, file), exception = exception, depth = depth + 1)
            elif os.path.isfile(path):
                if path.endswith(".json"):
                    self.add(json.loads(bm_io.readfile(path)))

        except Exception:
            if exception:
                raise

            Log("ignoring exception", exception = True, path = path)

And one more thing: make the global configuration:

cfg = Cfg()

Here’s how you use it:

import sys
import pprint

import bm_cfg

# setup ... on a per-file or directory basis
for file in sys.argv[1:]:
    bm_cfg.cfg.load(file)

# use it
pprint.pprint({
    "private" : bm_cfg.cfg.private,
    "public" : bm_cfg.cfg.public,
}, width = 1)

Here’s what my configuration directory looks like:

$ pwd
/Users/davidjanes/Sites/pc/cfg
$ ls
amazon.json		freebase.json		praized.json
amazon.public.json	gmaps.json		yahoo.json

Here’s the (private) amazon.json:

{
    "amazon" : {
        "Locale" : "us",
        "AccessKeyID" : "0......",
        "AssociateTag" : "ona-20",
        "Private" : "Don't See"
    }
}

And here’s the (public) amazon.public.json:

{
    "@Public" : 1,
    "amazon" : {
        "Locale" : "us",
        "AccessKeyID" : "0......",
        "AssociateTag" : "ona-20"
    }
}

Note that if the private version of the Amazon file wasn’t available, the public version would also be in the private one. I.e. the private configuration basically is “everything” (noting possible exceptions in the code above).
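
The public/private rule can be tried out without any files on disk. This is a stand-alone sketch of the same merge logic, not the bm_cfg module itself; merge_configs and the trimmed-down Amazon dictionaries are made up for illustration.

```python
def merge_configs(files):
    """Return (private, public) after adding each parsed file in order."""
    private, public = {}, {}
    for d in files:
        if d.get("@Public"):
            for key, value in d.items():
                if not isinstance(value, dict):
                    continue                    # skips the "@Public" marker itself
                private.setdefault(key, value)  # never overwrite a private entry
                public[key] = value
        else:
            private.update(d)
    return private, public

# trimmed-down stand-ins for amazon.json and amazon.public.json
amazon_private = {"amazon": {"Locale": "us", "Private": "Don't See"}}
amazon_public = {"@Public": 1, "amazon": {"Locale": "us"}}

private, public = merge_configs([amazon_private, amazon_public])
only_private, only_public = merge_configs([amazon_public])
```

With both files loaded, the "Private" key shows up only on the private side; with just the public file, both sides hold the same Amazon entry.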

December 29, 2008

Interesting links from the last month

db, ideas, semantic web · David Janes · 9:29 am ·
  • Aspen – a web server for highly extensible Python-based publication, application, and hybrid websites. A potential alternative to Python’s builtin HTTPServer. MIT license.
  • V8 – V8 is Google’s open source JavaScript engine; written in C++; can run standalone, or can be embedded into any C++ application. I am very excited by this, as allowing users to send JavaScript code to the server to execute is an amazingly powerful idea. If anyone knows of a Python wrapper, please let me know. New BSD license.
  • KomodoEdit (a testimonial) – I am going to try this out, though vi/vim will always be my first love (JJ also has an article on using ctags).
  • Virtuoso - an innovative Universal Server platform that delivers an enterprise level Data Integration and Management solution for SQL, RDF, XML, Web Services, and Business Processes. There’s way too much bla bla bla in that sentence, but apparently this is really sweet at handling SPARQL/RDF triples. Kingsley Idehen writes extensively about this on his blog (e.g.).
  • Drizzle – a database optimized for Cloud and Net applications. Way too early to commit to this yet. See The New MySQL Landscape for more interesting goings-on.
  • AuthKit – authentication and authorization toolkit for WSGI applications and frameworks.
  • GeoDjango – a world-class geographic web framework. Lots of great ideas and pointers to libraries in here, even if you’re not planning to use this itself.
  • Disco – an open-source implementation of the Map-Reduce framework for distributed computing. The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks often only in tens of lines of code. Here’s a blog post about the same, with comparisons to Hadoop.
  • On (Python) packaging. Debating distutils, easy_install and pip.

December 18, 2008

Pipe Cleaner

demo, djolt, dqt, html / javascript, ideas, jd, maps, pipe cleaner, pybm, work · David Janes · 6:38 pm ·

I’ve been working (in my decreasingly available spare time) on a project called “Pipe Cleaner” that pulls together all the various concepts I’ve been mentioning on this blog: Web Object Records (WORK) for API access and object manipulation, Djolt for generating text from templates, Data/Query/Transform/Template (DQT) for transforming data, and JD for scripting these elements together. The pieces came together this morning well enough to put a demo together, and here it is: the Toronto Fires Pt II Demo.

How, you may ask, does this differ from the original Toronto Fires Demo? The answer is how it is put together, which we describe here.

Index.dj

This is the Djolt template that generates the output. The data fed to this template is generated by the JD script, described in the next section.

<html>
<head>
    <link rel="stylesheet" type="text/css" href="css.css" />
    {{ gmaps.js|safe }}
</head>
<body>
<div id="content_wrapper">
    <div id="map_wrapper">
        {{ gmaps.html|safe }}
    </div>
    <div id="text_wrapper">
{% for incident in incidents %}
    <div id="{{ incident.IncidentNumber }}">
        {{ incident.body_sb|safe }}
    </div>
{% endfor %}
    </div>
</div>
</body>
</html>

Quite simple … as you can see, most of the data is being pulled in from elsewhere. The elsewhere is provided by the script described in the next section.

Index.jd

This is the script that pulls all the pieces together. Note that I’m not 100% happy with the way the data is imported; I would like the geocoding to become part of this data flow too. In the next release, perhaps.

First we pull in the “fire” module that we wrote in the previous Map examples. This is doing exactly what you think: importing a Python module. We may have to increase the security or restrict this to working with an API for general purpose use.

import module:"fire";

Next we define two headers – one that is going to appear in the Google Maps popup, the next that is going to appear in the sidebar. They need to be different as they refer to themselves. Note that the sidebar header “breaks” the encapsulation of Google Maps – this seems to be unavoidable. The to:"fitem.head.map" and to:"fitem.head.sb" are manipulating a WORK dictionary to store values.

Note also here that we’ve extended JD to accept Python multiline strings – this was unavoidable if JD was to be useful to me.

set to:"fitem.head.map" value:"""
<h3>
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

set to:"fitem.head.sb" value:"""
<h3>
{% if latitude and longitude %}
<a href="javascript:js_maps.map.panTo(new GLatLng({{ latitude }}, {{ longitude }}))">*</a>
{% endif %}
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

The next block defines the text of the body used to describe a fire incident. It follows much the same pattern as the previous block.

set to:"fitem.body" value:"""
<p>
Alarm Level: {{ AlarmLevel }}
<br />
Incident Type: {{ IncidentType }}
<br />
City: {{ City }}
<br />
Street: {{ Street }} ({{ CrossStreet }})
<br />
Units: {{ Units }}
</p>
""";

This is a map: it is translating the values in fire.GetGeocodedIncidents into a new format and storing that in incidents. The format we are storing it in is understood by the Google Maps generating module.

We may rename this translate, as the word map is somewhat overloaded.

map from:"fire.GetGeocodedIncidents" to:"incidents" map:{
    "latitude" : "{{ latitude }}",
    "longitude" : "{{ longitude }}",
    "title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
    "body" : "{{ *fitem.head.map|safe }}{{ *fitem.body|safe }}",
    "body_sb" : "{{ *fitem.head.sb|safe }}{{ *fitem.body|safe }}",
    "IncidentNumber" : "{{ IncidentNumber }}"
};
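
As a rough idea of what a map step like this does, here is a sketch in plain Python: every template string in the mapping is rendered once per input record. It is not how Pipe Cleaner or Djolt are implemented; render and map_records are made-up names, only simple {{ name }} variables are handled (no filters, no *fitem references), and the incident record is invented.

```python
import re

# matches {{ name }} with optional spaces inside the braces
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template, record):
    """Replace each {{ name }} with the matching value from record."""
    return VARIABLE.sub(lambda m: str(record.get(m.group(1), "")), template)

def map_records(records, mapping):
    """Build one new dictionary per record by rendering every template."""
    return [
        dict((key, render(template, record)) for key, template in mapping.items())
        for record in records
    ]

# made-up incident, for illustration only
records = [{
    "AlarmLevel": 1,
    "IncidentType": "Alarm",
    "RawStreet": "Example St",
    "IncidentNumber": "F001",
}]
mapping = {
    "title": "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "IncidentNumber": "{{ IncidentNumber }}",
}
incidents = map_records(records, mapping)
```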

Next we set up the “meta” (see WORK meta description if you’re not following along) for the maps. The render_value:true declaration makes PC interpret the templates in strings. We then call our Google Maps generating code (which is actually more Pipe Cleaners) and that gets fed to the Djolt template we first showed you. Clear? Maybe not, we’ll have more examples coming…

set to:"map_meta" render_value:true value:{
    "id" : "maps",
    "latitude" : 43.67,
    "longitude" : -79.38,
    "uzoom" : -13,
    "gzoom" : 13,
    "api_key" : "{{ cfg.gmaps.api_key|otherwise:'...mykey...' }}",
    "html" : {
        "width" : "1024px",
        "height" : "800px"
    }
};

load template:"gmaps.js" items:"incidents" meta:"map_meta";
load template:"gmaps.html" items:"incidents" meta:"map_meta";

December 12, 2008

JD – JSON Declaration Language

ideas, jd · David Janes · 5:55 pm ·

I’ve just added a new module called bm_jd to the pybm project. It implements a “little language” for declaring information, like a configuration file, when the details are all specified in JSON.

The language is very simple, consisting of semi-colon terminated statements; each statement having a command and zero or more arguments. Each argument may or may not have JSON data – if it does, it will be set off with a colon.

The BNF looks like this:

    <document> ::= <statement>*
    <statement> ::= <command> ( <word> | <word>:<json> )* ";"
    <command>|<word> ::= [a-zA-Z0-9_]+
    <json> ::= ... any valid JSON data ...
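
To make the grammar concrete, here is a minimal stand-alone parser sketch built on the standard json module. It is not the bm_jd implementation: parse_jd is a made-up name, the JSON value is assumed to start immediately after the colon, and there is no support for comments or other extensions.

```python
import json
import re

WORD = re.compile(r"\s*([A-Za-z0-9_]+)")
decoder = json.JSONDecoder()

def parse_jd(text):
    """Yield (command, [(word, value_or_None), ...]) for each statement."""
    pos = 0
    while True:
        m = WORD.match(text, pos)
        if not m:
            break                     # nothing left that looks like a command
        command, pos = m.group(1), m.end()
        args = []
        while True:
            # skip whitespace, then look for the terminating semicolon
            while pos < len(text) and text[pos].isspace():
                pos += 1
            if pos >= len(text):
                raise ValueError("statement not terminated by ';'")
            if text[pos] == ";":
                pos += 1
                break
            m = WORD.match(text, pos)
            if not m:
                raise ValueError("expected a word at offset %d" % pos)
            word, pos = m.group(1), m.end()
            value = None
            if text[pos:pos + 1] == ":":
                # raw_decode parses one JSON value and reports where it ended
                value, pos = decoder.raw_decode(text, pos + 1)
            args.append((word, value))
        yield command, args

statements = list(parse_jd(
    'read_template from:"fire_body" render:false;\n'
    'load template:"gmaps.js" meta:{"gzoom": 13};'
))
```

Each statement comes back as a command name plus a list of (word, value) pairs, with None standing in for arguments that carry no JSON.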

You can use the pybm JD parser in several ways:

  • implement a subclass of JDParser, defining CustomizeProduce; or
  • implement a subclass of DispatchJDParser, defining a call_<command> method for each command you plan to allow

In either case, you call a method FeedString to get the parser rolling.
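
The call_<command> idea is easy to picture with a small Python 3 sketch. This is only an illustration of the dispatch pattern, not the DispatchJDParser source: the class names, the feed method and the already-parsed (command, arguments) pairs are all assumptions, and parsing the JD text itself is left out.

```python
class DispatchSketch:
    """Route each parsed statement to a call_<command> method."""

    def feed(self, statements):
        for command, args in statements:
            handler = getattr(self, "call_" + command, None)
            if handler is None:
                raise ValueError("unknown command: %s" % command)
            # Arguments are passed as one dict because JD words such as
            # "from" are Python keywords and cannot be parameter names.
            handler(args)


class Recorder(DispatchSketch):
    """A hypothetical subclass that allows exactly two commands."""

    def __init__(self):
        self.seen = []

    def call_set(self, args):
        self.seen.append(("set", args["to"]))

    def call_load(self, args):
        self.seen.append(("load", args["template"]))


recorder = Recorder()
recorder.feed([
    ("set", {"to": "map_meta", "value": {"id": "maps"}}),
    ("load", {"template": "gmaps.js", "items": "incidents", "meta": "map_meta"}),
])
```

A command with no matching method is rejected, which is what makes the subclass double as a whitelist of allowed commands.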

There’s also a LogJDParser, which just dumps parsing results. Here’s an example of a JD document. Don’t worry about Djolt code in the JSON, that’s just text as far as this example is concerned:

read_template from:"fire_body" render:false;
map from:"fire.GetGeocodeIndidents" to:"incidents" map:{
    "latitude" : "{{ latitude }}",
    "longitude" : "{{ longitude }}",
    "title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
    "body" : "{{ *fire_body|safe }}",
    "IncidentNumber" : "{{ IncidentNumber }}"
};
read_template from:"gmaps" items:"incidents" meta:{
    "id" : "maps",
    "latitude" : 43.67,
    "longitude" : -79.38,
    "uzoom" : -13,
    "gzoom" : 13,
    "api_key" : "{{ cfg.gmaps.api_key|otherwise:'_' }}",
    "html" : {
        "width" : "1024px",
        "height" : "800px"
    }
};

December 11, 2008

A brief survey of Yahoo Pipes as a DQT

demo, djolt, dqt, ideas, semantic web, work · David Janes · 7:19 am ·

Yahoo Pipes is a visual editor of mashups, allowing you to take data from sources on the net, transform them in various interesting ways and output the result as Atom, RSS or JSON. The primary downside of Pipes, of course, is that you’re totally dependent on Yahoo for the infrastructure: it runs at Yahoo, pulling feeds that have to be accessible through the public Internet.

It’s easy to use Pipes: just go to this page and start working with the sample Pipe. You’ll need a Yahoo login ID, but most of us have that anyway. I’ve created an example that uses Yahoo Pipes to feed a Djolt template which you can see here.

We can analyze Pipes in terms of the DQT paradigm we outlined in the previous post.

Data Sources and Queries

Sources and Queries are merged (quite logically) in the Pipes interface. You can read in depth documentation here.

  • Fetch CSV
  • Feed Autodiscovery – outputs syndication feeds found on a page (RSS feeds on a CBC page)
  • Fetch Feed
  • Fetch Page – will read a page and parse the contents with a regular expression
  • Fetch Site Feed – this is the logical combination of Fetch Feed and Feed Autodiscovery
  • Flickr – find images by tag near a location (photos of cats in Toronto)
  • Google Base – look up information in Google Base
  • Item Builder – a way of building new items from existing items
  • Yahoo Local
  • Yahoo Search

Transforms

The operator documentation can be read here.

  • Count
  • Filter
  • Location Extractor – a geocoder that magically looks for locations
  • Loop
  • Regex
  • Rename
  • Reverse
  • Sort
  • Split
  • Sub-element – pulls a particular sub-element of an item and makes that the item. This is very much like WORK path manipulation
  • Tail
  • Truncate
  • Union
  • Unique
  • Web Service

Plus a number of specialized data services, for dealing with elements such as dates.

Templates

Pipes does not provide an arbitrary Djolt-like template producing HTML. Instead, it provides a number of pre-made code templates that output well-known data types, including RSS, JSON and Atom (and some stranger choices, like PHP).

December 9, 2008

Introducing DQT – Data/Query/Transform/Template

dqt, ideas · David Janes · 4:05 pm ·

Data/Query/Transform/Template – DQT, dropping the final T – is a commonly used pattern for displaying data on a website. The elements of this pattern are:

  • a Data source, such as a blog database, an e-mail store, the Internet as a whole, a MySQL database, and in particular the results of an API call.
  • a Query, which is a way of selecting a particular subset or slice of the data (typically homogeneous)
  • Transform rules, which can make the data look different by renaming fields, enhancing data using tools such as geolocation, filtering records out, merging multiple data sources and so forth.
  • a Template, which is a way of converting to a useful end-user format, such as HTML, JSON or XML
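
As a rough illustration of the four elements, here is a Python 3 sketch with one small function per stage. Everything in it is made up for the example (the POSTS list, the field names, the function names); it only shows how the stages chain together, not how any particular site does it.

```python
from html import escape

# Data: a hypothetical in-memory stand-in for a database or API result.
POSTS = [
    {"title": "Djolt Indirection", "tags": ["djolt", "python"], "day": 4},
    {"title": "Database roundup", "tags": ["db", "python"], "day": 24},
    {"title": "Toronto Fires", "tags": ["maps", "djolt"], "day": 22},
]


def query(data, tag):
    """Query: select the slice of the data carrying a given tag."""
    return (item for item in data if tag in item["tags"])


def transform(items):
    """Transform: rename a field, drop the rest, and sort by day."""
    renamed = ({"heading": i["title"], "day": i["day"]} for i in items)
    return sorted(renamed, key=lambda i: i["day"])


def template(items):
    """Template: convert the items to an end-user format (HTML here)."""
    rows = "".join("<li>%s</li>" % escape(i["heading"]) for i in items)
    return "<ul>%s</ul>" % rows


page = template(transform(query(POSTS, "djolt")))
```

Note that the page itself is one line of plumbing; all the real work is pushed into the stages.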

In the particular context of what I’m writing about, we can assume that we’re manipulating WORK items – that is, an API returns a “Meta” block of information and a stream of “Items”, each of which is in turn a WORK item. By identifying common patterns of dynamic page construction, my hope is that we can simplify page and mashup creation.

You’ve seen this pattern many times. My plan is that by describing it properly, we can make it easier to do.

Wordpress Blogs

  • The Data source is the MySQL table with blog posts, plus ancillary information pulled from other tables.
  • The Query is some combination of ( page number, post path, category, tag ). Not all combinations are legal, obviously, but this is the information that can be encoded in a URL request. The Data source and the Query result in a number of posts being made available for further processing
  • The Template is the PHP code that converts the individual database items into HTML for display

There is no Transform in this example. See it here ;-).

BTW: Don’t take this as a how-to guide for Wordpress. I’m trying to look at this from a high-level conceptual point-of-view.

Google Mail

  • the Data source is all the e-mails in the Google database, probably billions or trillions of messages
  • the Query is some combination of ( userid, page number, search ). The userid is not encoded in the URL, it is known be
  • the Template is the Javascript code that

That’s a really high level view: in fact, Google Mail does this DQT twice: the first time around to select JSON or XML data to be transmitted to the user’s browser; the second time around to select and display items locally in the user’s browser.

Yahoo News

  • the Data source is Yahoo’s news database
  • the Query is a category, or no category at all (poorly encoded in the URL, I may add)
  • the Transform groups news into like categories (play along with me here)
  • the Template is Yahoo’s HTML generator, whatever that may be

See it here.

An RSS feed

The Data and the Query are substantially similar to the Wordpress Blog example, but:

  • we Transform the fields into a format that can be understood by an RSS generator
  • the Template is a specialized object that converts WORK items into RSS entries, that is, we don’t (or shouldn’t) use a Djolt-like template to generate XML.
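
To show what a “specialized object” might look like, here is a Python 3 sketch that builds RSS 2.0 with the standard library’s xml.etree.ElementTree rather than a text template. The items_to_rss function and the plain dicts standing in for WORK items are assumptions for illustration only. The point is that the XML library does the escaping, so a title containing & or < cannot produce a broken feed.

```python
import xml.etree.ElementTree as ET


def items_to_rss(meta, items):
    """Build an RSS 2.0 document from a meta dict and a list of item dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for key in ("title", "link", "description"):
        ET.SubElement(channel, key).text = meta.get(key, "")
    for item in items:
        entry = ET.SubElement(channel, "item")
        for key in ("title", "link", "description"):
            if key in item:
                ET.SubElement(entry, key).text = item[key]
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(rss, encoding="unicode")


xml_text = items_to_rss(
    {"title": "Example feed", "link": "http://example.com/", "description": "Demo"},
    [{"title": "Fish & Chips <fresh>", "link": "http://example.com/1"}],
)
```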

This is obviously somewhat of a hypothetical example, but it reflects my recent ideas about how machine-readable data should be generated.

December 8, 2008

Coding backwards for simplicity

djolt, dqt, ideas, pybm, python, work · David Janes · 4:58 pm ·

I haven’t been posting as much as I’d like here for the last three weeks, not because of a lack of ideas but because I haven’t been able to consolidate what I’ve been working on into a coherent thought. I’m trying to come up with an overarching conceptual arch that covers WORK, Djolt and the various API interfaces I’ve been coding. Tentatively and horribly, I’m calling this Data/Query/Transform/Template right now, though I’m expecting this to change.

The first demo of this … without further explanation … can be seen here. More details about what this is actually demonstrating (besides formatting this blog) will be forthcoming.

What I want to draw attention to in this post is how I coded this. What I’ve been doing for the last several weeks is coding backwards: I start with what I want the final code to look like and then figure out all the libraries, little languages and so forth that would be needed to code that. After several false starts, my conceptual logjam broke about a week ago and code started radically simplifying.

The ideal code, in my mind, is almost entirely static declarations: no loops, no if statements, no while statements, no goto-type statements (god help us). We simply specify how the parts are connected, and hope that we can abstract the complexity into the libraries that make this all happen. The code that you see below is actually post all my conceptualizing: I just wanted to write some code and since I had almost all the parts together it fell together quite nicely:

import bm_wsgi
import bm_io

import djolt
import api_feed

from bm_log import Log

class Application(bm_wsgi.SimpleWrapper):
    def __init__(self, *av, **ad):
        bm_wsgi.SimpleWrapper.__init__(self, *av, **ad)

    def CustomizeSetup(self):
        self.html_template_src = bm_io.readfile("index.dj")
        self.html_template = djolt.Template(self.html_template_src)

        self.context = djolt.Context()
        self.context["paramd"] = {
            "feed" : "http://feeds.feedburner.com/DavidJanesCode",
            "template" : """\
<ul>
{% for item in data.items %}
	<li><a href="{{ item.link }}">{{ item.title }}</a></li>
{% endfor %}
</ul>
""",
        }
        self.context.Push()
        self.context["paramd"] = self.paramd
        self.context["data"] = api_feed.RSS20(self.context.as_string("paramd.feed"))

    def CustomizeContent(self):
        yield   self.html_template.Render(self.context)

if __name__ == '__main__':
    Application.RunCGI()

There’s almost nothing there! In particular, note:

  • bm_wsgi.SimpleWrapper handles all the WSGI interface work, including determining when to output HTML headers, error trapping, and Unicode to UTF-8 encoding
  • the most complicated part of the application is setting up the Context. In particular, note that self.paramd is automatically populated by the QUERY_STRING passed to the application, and the double setting we do here allows us to have default values.
  • If you want to see the HTML template that drives the application it is here. Note two variations from Django templates: the {% asis %} block, which doesn’t interpret its content as Djolt code, and the {{ *paramd.template|safe }} variable, which interprets the variable’s contents as a template.
  • Methods called Customize-something are my convention for framework functions, i.e. methods that will be called for us rather than methods we call.
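
The “double setting” for defaults can be pictured as two layers of parameters, with lookups falling through from the request layer to the default layer. I’m not reproducing how djolt.Context does it here; this Python 3 sketch uses collections.ChainMap as a stand-in, and the layered and params_from_query names are made up for the example.

```python
from collections import ChainMap
from urllib.parse import parse_qs

DEFAULTS = {
    "feed": "http://feeds.feedburner.com/DavidJanesCode",
    "template": "<ul>...</ul>",
}


def params_from_query(query_string):
    """Turn a QUERY_STRING into a flat dict, keeping the first value per key."""
    return {key: values[0] for key, values in parse_qs(query_string).items()}


def layered(query_string):
    """Request parameters on top, defaults underneath."""
    return ChainMap(params_from_query(query_string), DEFAULTS)


no_query = layered("")
with_query = layered("feed=http%3A%2F%2Fexample.com%2Frss")
```

Anything the request supplies wins; anything it leaves out comes from the defaults.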

How to JSON encode iterators

ideas, python · David Janes · 2:32 pm ·

As part of my recent explorations, I’ve been playing a lot with Python iterators/generators. The key efficiency of iterators is that when working with lengthy list-like objects, you need only create the part that’s being looked at. It’s just-in-time objects.
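
A tiny Python 3 sketch of what “just-in-time” means in practice: the generator below could run forever, but only the items actually asked for are ever produced. The squares generator and the produced list are just illustration.

```python
from itertools import islice

produced = []  # records which values the generator actually computed


def squares():
    """An endless generator of square numbers."""
    n = 0
    while True:
        produced.append(n)
        yield n * n
        n += 1


# Ask for three items; the generator does exactly three steps of work.
first_three = list(islice(squares(), 3))
```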

If you attempt to JSON serialize an object with an iterator/generator object in it, the json module throws a cog: it doesn’t know how to serialize these types of objects. The json module is extensible and the documentation makes a suggestion how to do this:

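import json

class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            # anything iter() accepts can be serialized as a list
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # not iterable: let the base class raise the usual TypeError
        return json.JSONEncoder.default(self, o)

print json.dumps(xrange(4), cls = IterEncoder)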

This seems somewhat ugly to me. In particular, lots of objects can be wrapped by the iter function that don’t need to be, plus lots of objects will cause that TypeError to be thrown, which seems rather a waste. Here’s the solution I came up with:

class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            return  json.JSONEncoder.default(self, o)
        except TypeError, x:
            try:
                return  list(o)
            except:
                # not convertible to a list either: re-raise the original error
                raise x

This tries to encode the object the normal way. Only if that doesn’t work do we try to turn the object into a list. If that’s not convertible (i.e. the list object constructor fails) we go back and throw the original exception provided by JSONEncoder – we’ve really failed.

You use this as follows:

class X:
    def Iter(self):
        yield 1
        yield 2
        yield 3
        yield 4

xi = X().Iter()

print json.dumps(xi, cls = IterEncoder)
print json.dumps(xrange(4), cls = IterEncoder)

Which yields the expected:

[1, 2, 3, 4]
[0, 1, 2, 3]

Don’t be overly tempted to check the type of o: it may be types.GeneratorType or types.XRangeType or perhaps even something else that I haven’t found out yet.
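
For anyone on Python 3, where xrange and the print statement are gone, the same idea can be sketched as below. This is an adaptation rather than the code above: it uses super(), range, and re-raises the original TypeError explicitly; the counter generator is only there for the demonstration.

```python
import json


class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            # Try the normal route first; for unknown types this raises TypeError.
            return super().default(o)
        except TypeError as original:
            try:
                # Fall back to materializing the object as a list.
                return list(o)
            except TypeError:
                # Not iterable either: report the encoder's original complaint.
                raise original from None


def counter():
    yield 1
    yield 2
    yield 3
    yield 4


from_generator = json.dumps(counter(), cls=IterEncoder)
from_range = json.dumps(range(4), cls=IterEncoder)
```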

December 4, 2008

Djolt Indirection

demo, djolt, ideas, python · David Janes · 6:05 am ·

I’ve been working through a sticky problem with Djolt, trying to implement my Toronto Fires example in as few lines as possible. As part of this, I’ve come up with the idea of adding indirection to Djolt templates:

import djolt

d = {
    "a" : "It says: {{ b }}",
    "b" : "Hello, World"
}

t = djolt.Template("""
a: {{ a }}
b: {{ b }}
*a: {{ *a }}
""")

print t.Render(d)

Which yields:

a: It says: {{ b }}
b: Hello, World
*a: It says: Hello, World

This is significantly updated from the original version I posted here an hour ago. The indirection now makes the variable read as a template. This is a much more powerful concept.
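
If you want to play with the idea without Djolt, here is a toy Python 3 sketch of indirection. It is not how Djolt is implemented: the render function, the TOKEN pattern and the depth limit are my assumptions, and it supports nothing beyond {{ name }} and {{ *name }}.

```python
import re

# {{ name }} inserts a value; {{ *name }} renders that value as a template first.
TOKEN = re.compile(r"\{\{\s*(\*?)([A-Za-z0-9_]+)\s*\}\}")


def render(template, context, depth=0):
    """Render a template string against a dict of variables."""
    if depth > 10:
        # Guard against a variable whose template refers back to itself.
        raise RecursionError("template indirection nested too deeply")

    def replace(match):
        star, name = match.groups()
        value = str(context.get(name, ""))
        if star:
            return render(value, context, depth + 1)
        return value

    return TOKEN.sub(replace, template)


d = {"a": "It says: {{ b }}", "b": "Hello, World"}
out = render("a: {{ a }}\nb: {{ b }}\n*a: {{ *a }}", d)
```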

November 24, 2008

Database roundup

db, ideas, python, semantic web · David Janes · 7:27 am ·

Here’s a few things I was reading about over the weekend.

SQLAlchemy

SQLAlchemy is a full-featured Design Pattern-heavy pythonic database ORM. I am totally going to use this for my next Python SQL database project and may even do some playing with old datasets (using the reflection features, yum) soon. If you are considering doing SQL work on your next Python project, don’t even bother with the usual PEP 249 stuff, start with this.

Note that if you’re working with Django it handles the DB in its own way so SQLAlchemy may be of limited utility.

CouchDB

CouchDB “is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”. I couldn’t have written that more succinctly myself, so I didn’t. I qualified the paragraph above on SQLAlchemy by saying I’m going to use it for my next SQL project because I’m really champing at the bit to try CouchDB out. The CouchDB design philosophy – a REST API returning lists of JSON objects – reflects my current design paradigm very closely, and the only question I have is whether it scales in practice to millions of rows.

One caveat is that it’s written in the-cool-nerds-are-doing-it language Erlang, but because you don’t have to interact with that it should be OK for us mortals.

CouchDB is about to officially become a “top level” Apache project, though none of the documentation on the Apache.org site reflects this yet.

Virtuoso

Virtuoso is a “high-performance object-relational SQL database”. It apparently can perform well. I came across it through the Planet RDF aggregator, so this may be something you want to look into if you’re working on an RDF/SPARQL project.

Amazon Web Services Hosted Data Sets

That’s a mouthful, isn’t it? Amazon is offering to host public datasets on EC2 for free. What’s the catch? It will host the data, but you have to pay for the computing resources to use that data in the normal EC2 manner. Still, if you’re using a large public dataset and you’re already EC2-friendly, you might want to consider this program. An even more interesting thought occurs (though I’m not sure if it will fly): if you’re using large amounts of your own data on EC2, you may want to offer it up as a free resource.

There’s more on this by Lidija Davis on Read/Write Web.

November 22, 2008

Toronto Fires

demo, djolt, ideas, maps, work · David Janes · 4:19 pm ·

Here’s a little mashup I’ve been putting together for the last few days: Toronto Fires.

It’s taking the data listed here on the City of Toronto’s Fire Services “Active Accidents”, scraping it (by pretending HTML is XHTML and treating it as WORK objects), geocoding it (using our WORK Google API) and mapping it (using this information).

This is very much a work in progress, but here’s a few more things that are involved:

  • we read body.table.tr.td[1].table.tr[1].td.table.tr as a list to get the rows in the table
  • we map those rows into the Geocoder using a new magic technology we’ll be explaining in the next few days: Djolt, Django-like templates
  • the output program is just one big Djolt template
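
Reading a path like body.table.tr.td[1] out of a parsed document can be sketched in a few lines of Python 3. This is not the WORK implementation: get_path is a hypothetical helper, the indexes are 0-based as in Python, and stepping into a list without an explicit index is assumed to mean “take the first element”.

```python
import re

# One path step: a name, optionally followed by [index].
STEP = re.compile(r"([A-Za-z0-9_]+)(?:\[(\d+)\])?$")


def get_path(node, path):
    """Follow a dotted path such as 'body.table.tr[1].td' through dicts and lists."""
    for step in path.split("."):
        m = STEP.match(step)
        if not m:
            raise ValueError("bad path step: %r" % step)
        name, index = m.group(1), m.group(2)
        if isinstance(node, list):
            node = node[0]  # assumption: no explicit index means the first element
        node = node[name]
        if index is not None:
            node = node[int(index)]
    return node


# A made-up, heavily trimmed stand-in for a scraped page.
doc = {"body": {"table": {"tr": [{"td": ["x", {"table": {"tr": ["r0", "r1"]}}]}]}}}
rows = get_path(doc, "body.table.tr.td[1].table.tr")
```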

I’m not quite satisfied with how the current page is constructed: I want the final result to be much simpler.
