David Janes’ Code Weblog

December 29, 2008

Interesting links from the last month

db, ideas, semantic web · David Janes · 9:29 am ·
  • Aspen – a web server for highly extensible Python-based publication, application, and hybrid websites. A potential alternative to Python’s builtin HTTPServer. MIT license.
  • V8 – Google’s open source JavaScript engine; written in C++; can run standalone, or can be embedded into any C++ application. I am very excited by this, as allowing users to send Javascript code to the server to execute is an amazingly powerful idea. If anyone knows of a Python wrapper, let me know please. New BSD license.
  • KomodoEdit (a testimonial) – I am going to try this out, though vi/vim will always be my first love (JJ also has an article on using ctags).
  • Virtuoso – an innovative Universal Server platform that delivers an enterprise level Data Integration and Management solution for SQL, RDF, XML, Web Services, and Business Processes. There’s way too much bla bla bla in that sentence, but apparently this is really sweet at handling SPARQL/RDF triples. Kingsley Idehen writes extensively about this on his blog (e.g.).
  • Drizzle – a database optimized for Cloud and Net applications. Way too early to commit to this yet. See The New MySQL Landscape for more interesting goings-on.
  • AuthKit – authentication and authorization toolkit for WSGI applications and frameworks.
  • GeoDjango – a world-class geographic web framework. Lots of great ideas and pointers to libraries in here, even if you’re not planning to use this itself.
  • Disco – an open-source implementation of the Map-Reduce framework for distributed computing. The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks often only in tens of lines of code. Here’s a blog post about the same, with references to vs. Hadoop.
  • On (Python) packaging. Debating distutils, easy_install and pip.

December 23, 2008

Interact with applications from the command line on the Mac

macintosh, tips · David Janes · 6:50 am ·

I’m a command line guy – I spend 90% of my non-blog reading day in Terminal, working on Python apps on my Mac or SSHed into work and working on Java and Javascript applications. I do realize the benefit of “real” applications, for image editing, for advanced text processing and so forth. On the Mac you can send files to the default application easily:

open "Madeline Doll House.jpg"

(Don’t ask). If the Mac doesn’t know how to deal with the file type, or you want to specify a particular app, that’s cool too:

open -a smultron index.jd

Note that it doesn’t matter that I’m SSHed into a work computer – we got around that issue last week using MacFUSE.

Pipe Cleaner (II)

demo, maps, pipe cleaner · David Janes · 6:37 am ·

Here’s the latest evolution of Pipe Cleaner, mainly recorded here for historical interest. The big change is that there isn’t a separate outside template – everything is in the one index.jd file. The new directive is template, which can read and execute an outside module or actually produce the final output (as we see in the very last directive). I have not put this up as an independent demo.

#
#	Import the Python fire module
#	- used in: map from:"fire.GetGeocodedIncidents" to:"incidents"
#
import module:"fire";

#
#	Header for Google Maps popup
#	- used in: map from:"fire.GetGeocodedIncidents" to:"incidents"
#
#
set to:"fitem.head.map" value:"""
<h3>
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

#
#	Header for the sidebar
#	- used in: map from:"fire.GetGeocodedIncidents" to:"incidents"
#
set to:"fitem.head.sb" value:"""
<h3>
{% if latitude and longitude %}
<a href="javascript:js_maps.map.panTo(new GLatLng({{ latitude }}, {{ longitude }}))">*</a>
{% endif %}
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

#
#	Body for the Google Maps pop and the sidebar
#	- used in: map from:"fire.GetGeocodedIncidents" to:"incidents"
#
set to:"fitem.body" value:"""
<p>
Alarm Level: {{ AlarmLevel }}
<br />
Incident Type: {{ IncidentType }}
<br />
City: {{ City }}
<br />
Street: {{ Street }} ({{ CrossStreet }})
<br />
Units: {{ Units }}
</p>
""";

#
#	Convert all the incidents from the fire module
#	to the path 'incidents' using the mapping rules defined above
#
#	- incidents are used in "gmaps.js" and "gmaps.html"
#
map from:"fire.GetGeocodedIncidents" to:"incidents" map:{
	"latitude" : "{{ latitude }}",
	"longitude" : "{{ longitude }}",
	"title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
	"uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
	"body" : "{{ *fitem.head.map|safe }}{{ *fitem.body|safe }}",
	"body_sb" : "{{ *fitem.head.sb|safe }}{{ *fitem.body|safe }}",
	"IncidentNumber" : "{{ IncidentNumber }}"
};

#
#	Load the 'gmaps' templates (for arbitrary geo-mapping),
#	using the 'incidents' for its items and the specified meta.
#
#	- used in "gmaps.js" and "gmaps.html"
#
set to:"map_meta" value_render:true value:{
	"id" : "maps",
	"latitude" : 43.67,
	"longitude" : -79.38,
	"uzoom" : -13,
	"gzoom" : 13,
	"api_key" : "{{ cfg.gmaps.api_key|otherwise:'ABQIAAA...pIxzZQ' }}",
	"html" : {
		"width" : "1024px",
		"height" : "800px"
	}
};

#
#	Produce GMaps
#
template script:"gmaps" items:"incidents" meta:"map_meta";

#
#	Produce the final output
#
template value:"""
<html>
<head>
    <link rel="stylesheet" type="text/css" href="css.css" />
	{{ gmaps.js|safe }}
</head>
<body>
<div id="content_wrapper">
	<div id="map_wrapper">
		{{ gmaps.html|safe }}
	</div>
	<div id="text_wrapper">
{% for incident in incidents %}
	<div id="{{ incident.IncidentNumber }}">
		{{ incident.body_sb|safe }}
	</div>
{% endfor %}
</div>
</body>
</html>
""";

The gmaps.jd (imported in the second-last directive) looks as follows (there will not be a test). It’s designed to be a universal “show a map and plot points on it” inclusion. I’ve added a few line breaks so the PRE box doesn’t break.

#
#
#
template to:"html" value:"""
<div id="id_{{ meta.id|jslug }}"
style="{% if meta.html.width %}width: {{ meta.html.width }};{% endif %}
{% if meta.html.height %} height: {{ meta.html.height }};{% endif %}
{% if meta.html.style %} style: {{ meta.html.style }};{% endif %}"
{% if meta.html.class %} class="{{ meta.html.class }}"{% endif %}
></div>
<script type="text/javascript">
js_{{ meta.id|jslug }}.onload();
</script>
""";

#
#
#
template to:"js" value:"""
<script
 type="text/javascript"
 src="http://maps.google.com/maps?file=api&v=2&key={{meta.api_key}}">
</script>
<script type="text/javascript">
js_{{ meta.id|jslug }} = {
 onload : function() {
  js_{{ meta.id|jslug }}.map = new GMap2(document.getElementById("id_{{ meta.id|jslug }}"));
  m = js_{{ meta.id|jslug }}.map;
  m.setCenter(new GLatLng({{ meta.latitude }}, {{ meta.longitude }}), {{ meta.gzoom }});

  // {{ items|length }} items follow
{% for itemd in items %}
{% if itemd.latitude and itemd.longitude %}

  // {{ itemd.title }}
  var ll = new GLatLng({{ itemd.latitude }}, {{ itemd.longitude }});
  var marker = js_{{ meta.id|jslug }}.make_marker(m, ll, "{{ itemd.body|safe|escapejs }}");
  m.addOverlay(marker);
{% else %}
  // an item is missing latitude or longitude
{% endif %}
{% endfor %}
 },

 make_marker : function(m, ll, html) {
  var marker = new GMarker(ll);
  GEvent.addListener(marker, "click", function() {
   m.openInfoWindowHtml(ll, html);
  });

  return marker;
 },

 end : 0
}
</script>
""";

December 22, 2008

Issues with utcoffset and pytz

demo, python · David Janes · 10:14 am ·

In the previous entry, we talked about the difficulty in finding out the delta from UTC for a timezone returned from the pytz module. In particular, consider the offset for St. John’s, Newfoundland, which should be at -3:30.

dt_now = datetime.datetime.now()
tz = pytz.timezone('America/St_Johns')

offset = tz.utcoffset(dt_now)

Log(
    "using datetime.utcoffset",
    offset = format(offset),
)

With the unexpected result:

  message: using datetime.utcoffset
  offset: -4:29 (-12660)

I did a fair bit of Google searching for an answer without finding a satisfactory result, so I did further research on my own. To find the correct offset value, I found that this works:

dt_sj = tz.localize(dt_now)
offset = dt_sj - pytz.UTC.localize(dt_now)

Log(
    "using delta to UTC",
    offset = format(offset),
)

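Which yields the correct magnitude (the sign comes out positive only because the St. John’s value is the left-hand side of the subtraction; the offset from UTC itself is -3:30):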

  message: using delta to UTC
  offset: 03:30 (12600)

Note that if you’re going to use the above method for finding deltas, you’re going to have to take Daylight Saving Time into consideration also. I have not done this here, as I’m a little pressed for time and just want to illustrate the problem.

The issue seems to be with the way that pytz uses the Olson database entry (from here) for St. John’s – and all other locations. It appears that pytz is using the first rule it sees, from 1884, rather than the rule for the date that was passed in. I think this is a bug.

#
# St John's has an apostrophe, but Posix file names can't have apostrophes.
# Zone  NAME        GMTOFF  RULES   FORMAT  [UNTIL]
Zone America/St_Johns   -3:30:52 -  LMT 1884
            -3:30:52 StJohns N%sT   1918
            -3:30:52 Canada N%sT    1919
            -3:30:52 StJohns N%sT   1935 Mar 30
            -3:30   StJohns N%sT    1942 May 11
            -3:30   Canada  N%sT    1946
            -3:30   StJohns N%sT

The setup code for the examples above is:

from bm_log import Log
import dateutil.parser
import pytz
import datetime

def format(td):
    seconds = td.seconds + td.days * ( 24 * 3600 )
    return  "%02d:%02d (%s)" % ( seconds // 3600, seconds % 3600 // 60, seconds, )
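One caveat about the format helper above: it uses floor division, so negative deltas print oddly. That is why -12660 seconds (that is, -3:31) shows up as -4:29 in the first result. Here is a minimal sign-aware sketch using only the standard library; the name format_offset is mine and is not part of pybm.

```python
import datetime

def format_offset(td):
    """Format a timedelta as +HH:MM (seconds), handling negative values."""
    # total whole seconds, same arithmetic as the original helper
    seconds = td.seconds + td.days * 24 * 3600
    sign = "-" if seconds < 0 else "+"
    # split the absolute value so floor division cannot round the hours down
    hours, remainder = divmod(abs(seconds), 3600)
    return "%s%02d:%02d (%s)" % (sign, hours, remainder // 60, seconds)

print(format_offset(datetime.timedelta(seconds=-12660)))
print(format_offset(datetime.timedelta(seconds=12600)))
```

With this variant the first result would read -03:31, which makes it easier to see that the value is the 1884 local mean time (-3:30:52) rounded to the minute.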

Update 2010-03-09: This has been fixed in the code base and (presumably) will be in the next upcoming release.

Working with dates, times and timezones in Python

demo, python · David Janes · 7:37 am ·

Here’s a few examples of working with dates, times and timezones in Python. We are using the following packages:

  • datetime (part of the standard Python distribution)
  • dateutil – for date parsing, though there’s a lot more depth to this package that I’m not touching here
  • pytz – for timezone handling, and specifically making available the Olson timezone database to Python

There’s a lot of complexity to working with datetimes in any language; I’m not going to get into that but would prefer instead to show a few practical examples. Keep the following in mind:

  • datetimes may or may not have timezones associated with them. If they do not, they are called “naive” and their meaning is effectively defined by the program. In general, you want to work with non-naive datetimes. Generally the assumption would be that the naive datetime is in the application’s current timezone or the user’s preferred timezone
  • when working with datetimes, consider the strategy of converting everything to the universal UTC timezone, then converting back to the user’s timezone only when you need to display that to the user
  • if you are rolling your own code for handling dates, times and timezones and you haven’t done a lot of research, your implementation is garbage. Do yourself and everyone else a favor and use a library.
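To make the naive/aware distinction concrete, here is a small standard-library sketch. The helper names is_naive and assume_utc are illustrative only, and treating a naive value as UTC is a policy choice made for the example, not something the libraries decide for you.

```python
import datetime

def is_naive(dt):
    """True when dt carries no usable timezone information."""
    return dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None

def assume_utc(dt):
    """Attach UTC to a naive datetime; leave aware datetimes untouched."""
    if is_naive(dt):
        # safe with the fixed-offset datetime.timezone.utc object
        return dt.replace(tzinfo=datetime.timezone.utc)
    return dt

naive = datetime.datetime(2008, 11, 13, 5, 41, 35)
aware = assume_utc(naive)
print(is_naive(naive), is_naive(aware))
print(aware.isoformat())
```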

Our standard imports. Log is from the pybm library and its purpose is rather obvious.

from bm_log import Log
import dateutil.parser
import pytz
import datetime

Here’s an example of parsing an e-mail or RSS type date using dateutil.

dts = "Thu, 13 Nov 2008 05:41:35 +0000"
dt = dateutil.parser.parse(dts)

Log(
    "Parsing an RFC type date",
    src = dts,
    dt = dt,
    iso = dt.isoformat(),
)
  message: Parsing an RFC type date
  dt: 2008-11-13 05:41:35+00:00
  iso: 2008-11-13T05:41:35+00:00
  src: Thu, 13 Nov 2008 05:41:35 +0000

Here’s an example of parsing an ISO Datetime

dts = '2008-11-13T05:41:35-0400'
dt = dateutil.parser.parse(dts)

Log(
    "Parsing an ISO Date with Timezone",
    src = dts,
    dt = dt,
    iso = dt.isoformat(),
)
  message: Parsing an ISO Date with Timezone
  dt: 2008-11-13 05:41:35-04:00
  iso: 2008-11-13T05:41:35-04:00
  src: 2008-11-13T05:41:35-0400

Here’s an example of parsing a naive datetime, that is, one without a timezone.

dts = '2008-11-13T05:41:35'
dt = dateutil.parser.parse(dts)

Log(
    "Parsing an ISO Date without a Timezone",
    src = dts,
    dt = dt,
    iso = dt.isoformat(),
)
  message: Parsing an ISO Date without a Timezone
  dt: 2008-11-13 05:41:35
  iso: 2008-11-13T05:41:35
  src: 2008-11-13T05:41:35

Here are two similar examples, showing how to force the timezone if it’s not present. This will happen in the first part, but not the second.

tz = pytz.timezone('America/Toronto')
dts = '2008-11-13T05:41:35'
dt = dateutil.parser.parse(dts)
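if dt.tzinfo is None:
    # pytz zones must be attached with localize(); replace(tzinfo=...)
    # can pick up the zone's historical local mean time instead
    dt = tz.localize(dt)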

Log(
    "Parsing an ISO Date without a Timezone BUT specifying default TZ",
    src = dts,
    dt = dt,
    iso = dt.isoformat(),
    tz = tz,
)

tz = pytz.timezone('America/Toronto')
dts = '2008-11-13T05:41:35-0400'
dt = dateutil.parser.parse(dts)
if dt.tzinfo is None:
    # not reached here: the parsed string already carries an offset
    dt = tz.localize(dt)

Log(
    "Parsing an ISO Date with a Timezone AND specifying default TZ",
    src = dts,
    dt = dt,
    iso = dt.isoformat(),
    tz = tz,
)
  message: Parsing an ISO Date without a Timezone BUT specifying default TZ
  dt: 2008-11-13 05:41:35-05:00
  iso: 2008-11-13T05:41:35-05:00
  src: 2008-11-13T05:41:35
  tz: America/Toronto

  message: Parsing an ISO Date with a Timezone AND specifying default TZ
  dt: 2008-11-13 05:41:35-04:00
  iso: 2008-11-13T05:41:35-04:00
  src: 2008-11-13T05:41:35-0400
  tz: America/Toronto

Update: here’s an example of moving datetimes to UTC and then to a different Timezone. Remember: you want your backend code to work with UTC datetimes for simplicity and correctness:

dts = '2008-11-13T05:41:35-0400'
dt_orig = dateutil.parser.parse(dts)
dt_utc = dt_orig.astimezone(pytz.UTC)

Log(
    "Changing a datetime to UTC",
    src = dts,
    dt_orig = dt_orig,
    dt_utc = dt_utc,
)

tz_vancouver = pytz.timezone('America/Vancouver')
dt_vancouver = dt_utc.astimezone(tz_vancouver)

Log(
    "Changing UTC datetime to a different timezone",
    dt_vancouver = dt_vancouver,
    dt_utc = dt_utc,
)
  message: Changing a datetime to UTC
  dt_orig: 2008-11-13 05:41:35-04:00
  dt_utc: 2008-11-13 09:41:35+00:00
  src: 2008-11-13T05:41:35-0400

  message: Changing UTC datetime to a different timezone
  dt_utc: 2008-11-13 09:41:35+00:00
  dt_vancouver: 2008-11-13 01:41:35-08:00
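The same round trip can be sketched with only the standard library, assuming Python 3 and a fixed -08:00 offset standing in for America/Vancouver. A fixed offset knows nothing about daylight saving rules, so this is an illustration of the UTC-in-the-middle strategy rather than a replacement for pytz.

```python
import datetime

dts = "2008-11-13T05:41:35-0400"

# %z accepts a numeric offset such as -0400 and yields an aware datetime
dt_orig = datetime.datetime.strptime(dts, "%Y-%m-%dT%H:%M:%S%z")

# normalize to UTC for backend work
dt_utc = dt_orig.astimezone(datetime.timezone.utc)

# convert for display; fixed offset only, no DST handling
pacific_standard = datetime.timezone(datetime.timedelta(hours=-8))
dt_vancouver = dt_utc.astimezone(pacific_standard)

print(dt_utc.isoformat())
print(dt_vancouver.isoformat())
```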

Here is an example of listing all “common” timezones using pytz. Note that “America” refers to the two continents, not the Irish word for the United States. Printing the actual timezone offset turned out to be a surprisingly complex task, which I will outline in a different blog post. For now let it suffice that with pytz try not to depend on utcoffset.

dt_now = datetime.datetime.now()

def tzname2offset(tzname):
    dt_in_utc = pytz.UTC.localize(dt_now)
    dt_in_tz = pytz.timezone(tzname).localize(dt_now)

    offset = dt_in_utc - dt_in_tz
    seconds = offset.seconds + offset.days * ( 24 * 3600 )

    return  "%02d:%02d" % ( seconds // 3600, seconds % 3600 // 60, )

Log(
    "Olson (pytz) common timezones and their UTC offsets",
    timezones = [
        ( tzname, tzname2offset(tzname), )
        for tzname in pytz.common_timezones
    ],
)
  message: Olson (pytz) common timezones and their UTC offsets
  timezones:
    [('Africa/Abidjan', '00:00'),
     ('Africa/Accra', '00:00'),
     ('Africa/Addis_Ababa', '03:00'),
     ('Africa/Algiers', '01:00'),
     ('Africa/Asmara', '03:00'),
...
     ('Pacific/Wake', '12:00'),
     ('Pacific/Wallis', '12:00'),
     ('US/Alaska', '-9:00'),
     ('US/Arizona', '-7:00'),
     ('US/Central', '-6:00'),
     ('US/Eastern', '-5:00'),
     ('US/Hawaii', '-10:00'),
     ('US/Mountain', '-7:00'),
     ('US/Pacific', '-8:00'),
     ('UTC', '00:00')]

December 18, 2008

Pipe Cleaner

demo, djolt, dqt, html / javascript, ideas, jd, maps, pipe cleaner, pybm, work · David Janes · 6:38 pm ·

I’ve been working (in my decreasingly available spare time) on a project called “Pipe Cleaner” that pulls together all the various concepts I’ve been mentioning on this blog: Web Object Records (WORK) for API access and object manipulation, Djolt for generating text from templates, Data/Query/Transform/Template (DQT) for transforming data, and JD for scripting these elements together. The pieces came together enough this morning to put a demo together and here it is – the Toronto Fires Pt II Demo.

How, you may ask, does this differ from the original Toronto Fires Demo? The answer is how it is put together, which we describe here.

Index.dj

This is the Djolt template that generates the output. The data fed to this template is generated by the JD script, described in the next section.

<html>
<head>
    <link rel="stylesheet" type="text/css" href="css.css" />
    {{ gmaps.js|safe }}
</head>
<body>
<div id="content_wrapper">
    <div id="map_wrapper">
        {{ gmaps.html|safe }}
    </div>
    <div id="text_wrapper">
{% for incident in incidents %}
    <div id="{{ incident.IncidentNumber }}">
        {{ incident.body_sb|safe }}
    </div>
{% endfor %}
</div>
</body>
</html>

Quite simple … as you can see, most of the data is being pulled in from elsewhere. The elsewhere is provided by the script described in the next section.

Index.jd

This is the script that pulls all the pieces together. Note that I’m not 100% happy with the way the data is imported; I would like the geocoding to become part of this data flow too. In the next release perhaps.

First we pull in the “fire” module that we wrote in the previous Map examples. This is doing exactly what you think: importing a Python module. We may have to increase the security or restrict this to working with an API for general purpose use.

import module:"fire";

Next we define two headers – one that is going to appear in the Google Maps popup, the next that is going to appear in the sidebar. They need to be different as they refer to themselves. Note that the sidebar header “breaks” the encapsulation of Google Maps – this seems to be unavoidable. The to:"fitem.head.map" and to:"fitem.head.sb" are manipulating a WORK dictionary to store values.

Note also here that we’ve extended JD to accept Python multiline strings – this was unavoidable if JD was to be useful to me.

set to:"fitem.head.map" value:"""
<h3>
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

set to:"fitem.head.sb" value:"""
<h3>
{% if latitude and longitude %}
<a href="javascript:js_maps.map.panTo(new GLatLng({{ latitude }}, {{ longitude }}))">*</a>
{% endif %}
<a href="#{{ IncidentNumber }}">{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}</a>
</h3>
""";

The next block defines the text of the body used to describe a fire incident. It follows much the same pattern as the previous block.

set to:"fitem.body" value:"""
<p>
Alarm Level: {{ AlarmLevel }}
<br />
Incident Type: {{ IncidentType }}
<br />
City: {{ City }}
<br />
Street: {{ Street }} ({{ CrossStreet }})
<br />
Units: {{ Units }}
</p>
""";

This is a map: it is translating the values in fire.GetGeocodedIncidents into a new format and storing that in incidents. The format that we are storing it in is understood by the Google Maps generating module.

We may rename this translate, as the word map is somewhat overloaded.

map from:"fire.GetGeocodedIncidents" to:"incidents" map:{
    "latitude" : "{{ latitude }}",
    "longitude" : "{{ longitude }}",
    "title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
    "body" : "{{ *fitem.head.map|safe }}{{ *fitem.body|safe }}",
    "body_sb" : "{{ *fitem.head.sb|safe }}{{ *fitem.body|safe }}",
    "IncidentNumber" : "{{ IncidentNumber }}"
};
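For readers who want the idea without the machinery, the map directive can be approximated in a few lines of Python. This is a sketch under loud assumptions: render and map_records are hypothetical names, the regular expression is a stand-in for Djolt that handles only plain {{ name }} fields (no filters such as |safe, no *path lookups), and the sample record is invented.

```python
import re

# matches {{ name }} with optional spaces inside the braces
_FIELD = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template, record):
    """Replace each {{ name }} with str(record[name]); unknown names become ''."""
    return _FIELD.sub(lambda m: str(record.get(m.group(1), "")), template)

def map_records(records, rules):
    """Build one new dict per record by rendering every rule template."""
    return [
        {key: render(template, record) for key, template in rules.items()}
        for record in records
    ]

# invented sample data, standing in for fire.GetGeocodedIncidents
raw_incidents = [
    {"IncidentNumber": "F08001", "AlarmLevel": 1,
     "IncidentType": "Alarm", "RawStreet": "Main St"},
]
rules = {
    "title": "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "IncidentNumber": "{{ IncidentNumber }}",
}

incidents = map_records(raw_incidents, rules)
print(incidents[0]["title"])
```

Each output item contains only the keys named in the rules, which is what lets a generic consumer such as the maps module work without knowing anything about fire incidents.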

Next we set up the “meta” (see WORK meta description if you’re not following along) for the maps. (The render_value:true declaration makes PC interpret the templates in strings.) We then call our Google Maps generating code (which are actually more Pipe Cleaners) and that gets fed to the Djolt template we first showed you. Clear? Maybe not, we’ll have more examples coming…

set to:"map_meta" render_value:true value:{
    "id" : "maps",
    "latitude" : 43.67,
    "longitude" : -79.38,
    "uzoom" : -13,
    "gzoom" : 13,
    "api_key" : "{{ cfg.gmaps.api_key|otherwise:'...mykey...' }}",
    "html" : {
        "width" : "1024px",
        "height" : "800px"
    }
};

load template:"gmaps.js" items:"incidents" meta:"map_meta";
load template:"gmaps.html" items:"incidents" meta:"map_meta";

December 16, 2008

Woah!

administrivia · David Janes · 8:00 am ·

This site has been upgraded to WordPress 2.7.

December 13, 2008

Brief notes on SIMILE Timeline

demo, html / javascript · David Janes · 8:01 am ·

SIMILE Timeline is “the Google Maps for time based events”. It used to be housed at MIT but now it’s graduated to Google Code. I’ve created an example application showing this year’s Oscar awards and a number of movies that are, umm, are not all likely to be nominated.

  • the application source can be seen here; it is based on this demo and some work I had done previously. Note:
    • multiple scrolling bands linked together
    • custom icons
    • custom colors
  • we demonstrate populating the timeline widget using JSON data coded in the application; the most difficult part of putting this demo together was cutting and pasting all this data, a task we hope to make easier with our DQT code
  • the documentation for Timeline is starting to diverge from the source code; showEventText is now replaced by overview in the band creation code
  • there’s a large number of weaknesses (still) with Timeline for dealing with arbitrary data, these may be corrected in the Javascript code but I don’t have time to go through all this
    • note the incorrect placement of custom icons; using Firebug to inspect the HTML, I discovered that unfortunately everything is placed using style tags so it’s difficult to correct using CSS. Ideally I would like to be able to assign classes and ID tags to everything
    • there doesn’t seem to be an obvious way to control the widths of the labels; in fact, if I reduce the band spacing in the top band, the text starts to overlap in a horrible manner
    • popup information boxes get confused when there is too little space to display information
    • when I don’t add a description to events, it uses “undefined” (see the Oscar Nominations Period)
    • date display functions are inferring more resolution (i.e. an actual time as opposed to just the date) than I’m giving it

If anyone has corrections for me, I’ll update the demo.

December 12, 2008

JD – JSON Declaration Language

ideas, jd · David Janes · 5:55 pm ·

I’ve just added a new module called bm_jd to the pybm project. It implements a “little language” for declaring information, like a configuration file, where the details are all specified in JSON.

The language is very simple, consisting of semi-colon terminated statements; each statement having a command and zero or more arguments. Each argument may or may not have JSON data – if it does, it will be set off with a colon.

The BNF looks like this:

    <document> ::= <statement>*
    <statement> ::= <command> ( <word> | <word>:<json> )* ";"
    <command>|<word> ::= [a-zA-Z0-9_]+
    <json> ::= ... any valid JSON data ...

You can use the pybm JD parser in several ways:

  • implement a subclass of JDParser, defining CustomizeProduce; or
  • implement a subclass of DispatchJDParser, defining a call_<command> method for each command you plan to allow

In either case, you call a method FeedString to get the parser rolling.

There’s also a LogJDParser, which just dumps parsing results. Here’s an example of a JD document. Don’t worry about Djolt code in the JSON, that’s just text as far as this example is concerned:

read_template from:"fire_body" render:false;
map from:"fire.GetGeocodedIncidents" to:"incidents" map:{
    "latitude" : "{{ latitude }}",
    "longitude" : "{{ longitude }}",
    "title" : "{{ AlarmLevel}}: {{ IncidentType }} on {{ RawStreet }}",
    "uri" : "{{ HOME_URI }}#{{ IncidentNumber }}",
    "body" : "{{ *fire_body|safe }}",
    "IncidentNumber" : "{{ IncidentNumber }}"
};
read_template from:"gmaps" items:"incidents" meta:{
    "id" : "maps",
    "latitude" : 43.67,
    "longitude" : -79.38,
    "uzoom" : -13,
    "gzoom" : 13,
    "api_key" : "{{ cfg.gmaps.api_key|otherwise:'_' }}",
    "html" : {
        "width" : "1024px",
        "height" : "800px"
    }
};
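To show how little machinery the grammar needs, here is a toy parser for it. It is a sketch only, not the bm_jd implementation: parse_jd is a hypothetical name, the result shape (a list of command and argument tuples) is my own choice, and comments or multiline strings are not handled. It relies on json.JSONDecoder.raw_decode from the standard library to consume one JSON value and report where it ended.

```python
import json
import re

_WORD = re.compile(r"[A-Za-z0-9_]+")
_DECODER = json.JSONDecoder()

def parse_jd(text):
    """Parse JD text into [(command, [(word, value_or_None), ...]), ...]."""
    n = len(text)

    def skip_ws(i):
        while i < n and text[i].isspace():
            i += 1
        return i

    statements = []
    pos = skip_ws(0)
    while pos < n:
        m = _WORD.match(text, pos)
        if not m:
            raise ValueError("expected a command at offset %d" % pos)
        command = m.group(0)
        pos = skip_ws(m.end())
        args = []
        # arguments run until the terminating semicolon
        while pos < n and text[pos] != ";":
            m = _WORD.match(text, pos)
            if not m:
                raise ValueError("expected a word at offset %d" % pos)
            word, value = m.group(0), None
            pos = m.end()
            if pos < n and text[pos] == ":":
                # raw_decode returns the value and the index just past it
                value, pos = _DECODER.raw_decode(text, skip_ws(pos + 1))
            args.append((word, value))
            pos = skip_ws(pos)
        if pos >= n:
            raise ValueError("statement is missing its ';'")
        statements.append((command, args))
        pos = skip_ws(pos + 1)
    return statements

doc = 'read_template from:"fire_body" render:false;\nmap to:"incidents" map:{"latitude": 43.67};'
for command, args in parse_jd(doc):
    print(command, args)
```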

December 11, 2008

Mount NTFS and remote filesystems using MacFUSE

macintosh, tips · David Janes · 11:56 am ·

Earlier this week I bought a LaCie 500Gb USB drive so I could bring VMWare images between work and home. When I went to copy the image, the copy failed with no meaningful error message (Error 0, I believe). Trying the copy on the command line was a little more informative: as it turns out, the LaCie drive ships with a FAT-32 file system which can only handle files up to 4Gb in size. As the image I was trying to copy had an 8Gb file in it, this was a no go.

My initial thought was to use the UNIX commands tar and split to break the files into individual smaller chunks, but this is hardly a satisfactory answer. If I formatted the drive to the Mac filesystem, the Windows machines would not be able to read it at all. If I formatted the drive with the “new” NTFS filesystem, Windows could read and write just fine but the Macintosh wouldn’t be able to write to it.

Fortunately, there’s an excellent install for the Mac called MacFUSE that allows access to all sorts of filesystem types not natively supported by the Macintosh, including NTFS. Here’s how I set up MacFUSE.

MacFUSE Installation

Installation by itself does nothing except set you up for the next stage: installing drivers for particular file systems.

NTFS

You have to search through the documentation for a bit to figure out how to get MacFUSE to work with Windows filesystems. It actually turns out to be rather easy:

You can now write to NTFS drives. It’s a little slow – it’s taking me about 2 hours to copy 8Gb to the LaCie drive, but that’s better than not being able to do it at all. You wouldn’t want to work live off the drive however, and it may be worth investigating commercial NTFS compatibility applications if you need to do this.

To reformat your LaCie drive, use Applications > Disk Utility to erase and install an empty NTFS file system.

SSHFS

SSHFS lets you see remote filesystems through SSH.

  • go to http://code.google.com/p/macfuse/wiki/MACFUSE_FS_SSHFS
  • download the version appropriate to your Mac; you can store this in your home directory or, if you’re a little more organized about your path, a directory like ~/bin
  • make a mount point – this is just a directory on your Mac that is needed by MacFUSE; it can be hidden as Mac OS will show you the mounted drive on your desktop and in /Volumes. For example, on the command line run mkdir -p ~/.Volumes/Remote.
  • run the mount command; you’ll be prompted for your remote system password

You’ll see the drive appearing on your desktop. I’ve actually created a shell alias to do the mounting for me called “mount-xxx”. If you don’t know how to do this, it’s probably too much to go into right now.

The nice thing about SSHFS is that I could see being able to run an entire Mac desktop development shop with all the backend computing running Linux, all being accessed nicely through SSHFS.

A brief survey of Yahoo Pipes as a DQT

demo, djolt, dqt, ideas, semantic web, work · David Janes · 7:19 am ·

Yahoo Pipes is a visual editor of mashups, allowing you to take data from sources on the net, transform them in various interesting ways and output the result as Atom, RSS or JSON. The primary downside of Pipes, of course, is that you’re totally dependent on Yahoo for the infrastructure: it runs at Yahoo, pulling feeds that have to be accessible through the public Internet.

It’s easy to use Pipes: just go to this page and start working with the sample example Pipe. You’ll need a Yahoo login ID, but most of us have that anyway. I’ve created an example that uses Yahoo Pipes to feed a Djolt template which you can see here.

We can analyze Pipes in the terms of the DQT paradigm we’ve outlined in the previous post.

Data Sources and Queries

Sources and Queries are merged (quite logically) in the Pipes interface. You can read in depth documentation here.

  • Fetch CSV
  • Feed Autodiscovery – outputs syndication feeds found on a page (RSS feeds on a CBC page)
  • Fetch Feed
  • Fetch Page – will read a page and parse the contents with a reg
  • Fetch Site Feed – this is the logical combination of Fetch Feed and Feed Autodiscovery
  • Flickr – find images by tag near a location (photos of cats in Toronto)
  • Google Base – look up information in Google Base
  • Item Builder – a way of building new items from existing items
  • Yahoo Local
  • Yahoo Search

Transforms

The operator documentation can be read here.

  • Count
  • Filter
  • Location Extractor – a geocoder that magically looks for locations
  • Loop
  • Regex
  • Rename
  • Reverse
  • Sort
  • Split
  • Sub-element – pulls a particular sub-element of an item and makes that the item. This is very much like WORK path manipulation
  • Tail
  • Truncate
  • Union
  • Unique
  • Web Service

Plus a number of specialized data services, for dealing with elements such as dates.
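Several of these operators are easy to picture as plain functions over a list of items. The sketch below is my own illustration of Filter, Unique, Sort and Truncate chained together; the function names, the item fields and the sample feed are all invented, and nothing here reflects how Yahoo implements Pipes.

```python
def filter_items(items, predicate):
    """Filter: keep only the items for which predicate(item) is true."""
    return [item for item in items if predicate(item)]

def unique_items(items, key):
    """Unique: keep the first item seen for each value of item[key]."""
    seen = set()
    result = []
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            result.append(item)
    return result

def sort_items(items, key, reverse=False):
    """Sort: order the items by item[key]."""
    return sorted(items, key=lambda item: item[key], reverse=reverse)

def truncate_items(items, count):
    """Truncate: keep only the first count items."""
    return items[:count]

# invented sample feed
feed = [
    {"title": "B", "link": "http://example.com/b", "score": 2},
    {"title": "A", "link": "http://example.com/a", "score": 5},
    {"title": "A again", "link": "http://example.com/a", "score": 1},
    {"title": "C", "link": "http://example.com/c", "score": 0},
]

items = filter_items(feed, lambda item: item["score"] > 0)
items = unique_items(items, "link")
items = sort_items(items, "title")
items = truncate_items(items, 1)
print([item["title"] for item in items])
```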

Templates

Pipes does not provide an arbitrary Djolt-like template producing HTML. Instead, they provide a number of pre-made code templates that output well known data types, including RSS, JSON and Atom (and some stranger choices, like PHP).

December 9, 2008

Introducing DQT – Data/Query/Transform/Template

dqt, ideas · David Janes · 4:05 pm ·

Data/Query/Transform/Template – DQT, dropping the final T – is a commonly used pattern for displaying data on a website. The elements of this pattern are:

  • a Data source, such as a blog database, an e-mail store, the Internet as a whole, a MySQL database, and in particular the results of an API call.
  • a Query, which is a way of selecting a particular subset or slice of the data (typically homogeneous)
  • Transform rules, which can make the data look different by renaming fields, enhancing data using tools such as geolocation, filtering records out, merging multiple data sources and so forth.
  • a Template, which is a way of converting to a useful end-user format, such as HTML, JSON or XML
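The four elements can be sketched as four small functions chained together. This is a toy illustration only: the post list, field names and URL scheme are invented, and a real system would put a database or an API call behind data().

```python
import html

# Data: an invented in-memory list standing in for a database or API result
POSTS = [
    {"id": 1, "title": "Pipe Cleaner", "category": "demo"},
    {"id": 2, "title": "JD", "category": "ideas"},
    {"id": 3, "title": "Pipe Cleaner (II)", "category": "demo"},
]

def data():
    """Data: return every available item."""
    return POSTS

def query(items, category):
    """Query: select a homogeneous slice of the data."""
    return [item for item in items if item["category"] == category]

def transform(items):
    """Transform: rename and derive fields for the template."""
    return [
        {"heading": item["title"], "uri": "/post/%d" % item["id"]}
        for item in items
    ]

def template(items):
    """Template: convert the items to an end-user format, here HTML."""
    return "\n".join(
        '<li><a href="%s">%s</a></li>'
        % (html.escape(item["uri"], quote=True), html.escape(item["heading"]))
        for item in items
    )

page = template(transform(query(data(), "demo")))
print(page)
```

Swapping the last function for one that emits JSON or RSS leaves the first three stages untouched, which is the main appeal of separating them.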

In the particular context of what I’m writing about, we can assume that we’re manipulating WORK items – that is, an API returns a “Meta” block of information and a stream of “Items”, each of which is in turn a WORK item. By identifying common patterns of dynamic page construction, my hope is that we can simplify page and mashup creation.

You’ve seen this pattern many times. My plan is that by describing it properly, we can make it easier to do.

Wordpress Blogs

  • The Data source is the MySQL table with blog posts, plus ancillary information pulled from other tables.
  • The Query is some combination of ( page number, post path, category, tag ). Not all combinations are legal obviously, but this is the information that can be encoded in a URL request. The Data source and the Query result in a number of posts being made available for further processing
  • The Template is the PHP code that converts the individual database items into HTML for display

There is no Transform in this example. See it here ;-).

BTW: Don’t take this as a how-to guide for Wordpress. I’m trying to look at this from a high-level conceptual point-of-view.

Google Mail

  • the Data source is all the e-mails in the Google database, probably billions or trillions of messages
  • the Query is some combination of ( userid, page number, search ). The userid is not encoded in the URL, it is known be
  • the Template is the Javascript code that

That’s a really high level view: in fact, Google Mail does this DQT twice: the first time to select the JSON or XML data to be transmitted to the user’s browser; the second time, locally in the user’s browser, to select and display items.

Yahoo News

  • the Data source is Yahoo’s news database
  • the Query is a category, or no category at all (poorly encoded in the URL, I may add)
  • the Transform groups news into like categories (play along with me here)
  • the Template is Yahoo’s HTML generator, whatever that may be

See it here.

An RSS feed

The Data and the Query are substantially similar to the Wordpress Blog example, but:

  • we Transform the fields into a format that can be understood by an RSS generator
  • the Template is a specialized object that converts WORK items into RSS entries, that is, we don’t (or shouldn’t) use a Djolt-like template to generate XML.

This is obviously somewhat of a hypothetical example, but it reflects my recent ideas about how machine-readable data should be generated.
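As a rough illustration of a “specialized object” Template, here is a sketch that builds RSS with an XML library instead of a text template, so escaping and well-formedness are handled for us. It is written for Python 3 and uses the standard xml.etree.ElementTree module; the function name render_rss and the shape of the meta and item dictionaries are assumptions for the example, not the WORK format.

```python
import xml.etree.ElementTree as ET

def render_rss(meta, items):
    """Build an RSS 2.0 document from a meta dict and an iterable of item dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Channel-level fields come from the "Meta" block.
    for key in ("title", "link", "description"):
        ET.SubElement(channel, key).text = meta.get(key, "")
    # Each item becomes one <item> element; missing fields are skipped.
    for item in items:
        node = ET.SubElement(channel, "item")
        for key in ("title", "link", "description"):
            if key in item:
                ET.SubElement(node, key).text = item[key]
    return ET.tostring(rss, encoding="unicode")

xml_text = render_rss(
    {"title": "Example Blog", "link": "http://example.com/", "description": "Demo feed"},
    [{"title": "Ampersands & <angle brackets>", "link": "http://example.com/1"}],
)
print(xml_text)
```

The awkward characters in the item title come out escaped without the caller doing anything, which is the main reason to avoid a Djolt-like text template here.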

December 8, 2008

Coding backwards for simplicity

djolt, dqt, ideas, pybm, python, work · David Janes · 4:58 pm ·

I haven’t been posting as much as I’d like here for the last three weeks, not because of a lack of ideas but because I haven’t been able to consolidate what I’ve been working on into a coherent thought. I’m trying to come up with an overarching conceptual arch that covers WORK, Djolt and the various API interfaces I’ve been coding. Tentatively and horribly, I’m calling this Data/Query/Transform/Template right now, though I’m expecting this to change.

The first demo of this … without further explanation … can be seen here. More details about what this is actually demonstrating (besides formatting this blog) will be forthcoming.

What I want to draw attention to in this post is how I coded this. What I’ve been doing for the last several weeks is coding backwards: I start with what I want the final code to look like and then figure out all the libraries, little languages and so forth that would be needed to code that. After several false starts, my conceptual logjam broke about a week ago and code started radically simplifying.

The ideal code, in my mind, is almost entirely static declarations: no loops, no if statements, no while statements, no goto-type statements (god help us). We simply specify how the parts are connected, and hope that we can abstract the complexity into the libraries that make this all happen. The code that you see below is actually post all my conceptualizing: I just wanted to write some code and since I had almost all the parts together it fell together quite nicely:

import bm_wsgi
import bm_io

import djolt
import api_feed

from bm_log import Log

class Application(bm_wsgi.SimpleWrapper):
    def __init__(self, *av, **ad):
        bm_wsgi.SimpleWrapper.__init__(self, *av, **ad)

    def CustomizeSetup(self):
        self.html_template_src = bm_io.readfile("index.dj")
        self.html_template = djolt.Template(self.html_template_src)

        self.context = djolt.Context()
        self.context["paramd"] = {
            "feed" : "http://feeds.feedburner.com/DavidJanesCode",
            "template" : """\
<ul>
{% for item in data.items %}
	<li><a href="{{ item.link }}">{{ item.title }}</a></li>
{% endfor %}
</ul>
""",
        }
        self.context.Push()
        self.context["paramd"] = self.paramd
        self.context["data"] = api_feed.RSS20(self.context.as_string("paramd.feed"))

    def CustomizeContent(self):
        yield   self.html_template.Render(self.context)

if __name__ == '__main__':
    Application.RunCGI()

There’s almost nothing there! In particular, note:

  • bm_wsgi.SimpleWrapper handles all the WSGI interface work, including determining when to output HTML headers, error trapping, and Unicode to UTF-8 encoding
  • the most complicated part of the application is setting up the Context. In particular, note that self.paramd is automatically populated by the QUERY_STRING passed to the application, and the double setting we do here allows us to have default values.
  • If you want to see the HTML template that drives the application it is here. Note two variations from Django templates: the {% asis %} block which doesn’t interpret its content as Djolt code and the {{ *paramd.template|safe }} variable which interprets the variable’s contents as a template.
  • Methods called Customize-something are my convention for framework functions, i.e. methods that will be called for us rather than methods we call.
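The “double setting” of paramd is really just layering: defaults at the bottom, query-string values on top. The sketch below shows that idea with the Python 3 standard library only. It is not how djolt.Context or bm_wsgi work internally (that is not shown in this post); build_paramd and DEFAULTS are hypothetical names.

```python
from collections import ChainMap
from urllib.parse import parse_qs

# Hypothetical defaults, playing the role of the first "paramd" assignment.
DEFAULTS = {
    "feed": "http://feeds.feedburner.com/DavidJanesCode",
    "template": "<ul></ul>",
}

def build_paramd(query_string, defaults=DEFAULTS):
    """Layer query-string values over defaults; the top layer wins on lookup."""
    parsed = parse_qs(query_string)
    # parse_qs returns a list of values per key; keep the first one.
    overrides = {key: values[0] for key, values in parsed.items()}
    return ChainMap(overrides, defaults)

paramd = build_paramd("feed=http://example.com/rss.xml")
print(paramd["feed"])      # taken from the query string
print(paramd["template"])  # falls back to the default
```

Because the defaults are never overwritten, the same declaration can serve every request.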

How to JSON encode iterators

ideas, python · David Janes · 2:32 pm ·

As part of my recent explorations, I’ve been playing a lot with Python iterators/generators. The key efficiency of iterators is that when working with lengthy list-like objects, you need only create the part that’s being looked at. It’s just-in-time objects.

If you attempt to JSON serialize an object with an iterator/generator object in it, the json module throws a cog: it doesn’t know how to serialize these types of objects. The json module is extensible and the documentation makes a suggestion how to do this:

import json

class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Not iterable: let the base class raise the usual TypeError
        return json.JSONEncoder.default(self, o)

print json.dumps(xrange(4), cls = IterEncoder)

This seems somewhat ugly to me. In particular, lots of objects can be wrapped by the iter function that don’t need to be, plus lots of objects will cause that TypeError to be thrown which seems to be rather a bit of waste. Here’s the solution I came up with:

class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            return  json.JSONEncoder.default(self, o)
        except TypeError, x:
            try:
                return  list(o)
            except:
                # Not convertible either: re-raise the original exception
                raise   x

This tries to encode the object the normal way. Only if that doesn’t work do we try to turn the object into a list. If that’s not convertible (i.e. the list object constructor fails) we go back and throw the original exception provided by JSONEncoder – we’ve really failed.

You use this as follows:

class X:
    def Iter(self):
        yield 1
        yield 2
        yield 3
        yield 4

xi = X().Iter()

print json.dumps(xi, cls = IterEncoder)
print json.dumps(xrange(4), cls = IterEncoder)

Which yields the expected:

[1, 2, 3, 4]
[0, 1, 2, 3]

Don’t be overly tempted to check the type of o: it may be types.GeneratorType or types.XRangeType or perhaps even something else that I haven’t found out yet.
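For anyone reading this on Python 3, here is a sketch of the same encoder in that dialect. The only assumptions are the syntax changes (except ... as, range instead of xrange, print as a function); the approach is unchanged, and the last few lines show that an object that can’t be turned into a list still fails with the original json error.

```python
import json

class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            # The base implementation raises TypeError for unknown types.
            return json.JSONEncoder.default(self, o)
        except TypeError as original:
            try:
                return list(o)
            except TypeError:
                # Not iterable either: report the original json error.
                raise original

def counter():
    yield 1
    yield 2

as_json = json.dumps({"gen": counter(), "rng": range(3)}, cls=IterEncoder)
print(as_json)

# A plain object() is neither serializable nor iterable.
try:
    json.dumps(object(), cls=IterEncoder)
except TypeError as error:
    failure = str(error)
    print(failure)
```

Note that a generator can only be walked once, so the same generator object can’t be serialized a second time and produce the same items.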

December 7, 2008

Once more with the maps

maps · David Janes · 7:48 am ·

Michal Migurski added a very informative comment about tiling levels here, with a pointer to way more detailed information here.

December 6, 2008

JavaFX 1.0 released

html / javascript, java · David Janes · 6:54 am ·

CNET has a very detailed article on JavaFX, Sun’s better-late-than-never-maybe-hopefully answer to Adobe’s Flash / AIR and Microsoft’s Silverlight:

With a back-to-the-future technology called JavaFX to be launched Thursday, Sun Microsystems hopes to attract a new class of developer while building a much-needed new revenue source.

JavaFX 1.0 returns to the sales pitch that Sun used during Java’s launch more than 13 years ago: a foundation for software on a wide variety of computing “clients” such as desktop computers or mobile phones. JavaFX builds on current Java technology but adds two major pieces.

First is a new software foundation designed to run so-called rich Internet applications–network-enabled programs with lush user interfaces. Second is a new programming language called JavaFX Script that’s intended to be easier to use than traditional Java.

The benefit for me (and maybe you) is that most web shops have three different classes of developers: backend programmers, HTML/CSS coders, and The Flash Guy. The problem with The Flash Guy is that they have to be involved to make any changes to Flash components; JavaFX instead provides components that can easily be manipulated by coders and by web designers (with a little training I think).

The JavaFX site looks a hell of a lot better than anything Sun’s produced in the past too, so maybe they’re learning that looks matter too. Here’s some examples to play with.

All your Base are belong to us

db, freebase, semantic web · David Janes · 6:26 am ·

Freebase is a user-editable, user-extensible structured database, a sort of one-stop shop semantic web/Wikipedia application. I started playing with Freebase about a year ago and the application has made significant strides over that period, especially in the usability department. Freebase also provides a very nice API which I’m using in GenX, with the caveat that it’s currently almost useless because of query timeouts.

I just came across the following page on Freebase: http://vancouver.freebase.com/. This page is what Freebase calls a Base, which is a collection of Tables/Views, which are things like “Vancouver Bloggers”, “Mayoral Candidates 2008” and so forth. A Table/View is a list of Topics, which are basically the equivalent of a Wikipedia page. Get all that? It makes sense after a while.

A few observations:

  • Why have I written Table/View above? Because in some places it’s called a Table and other places it’s called a View. Which is it? I’m guessing View but it’s still not 100% clear.
  • I decided to create our own Toronto Base especially for the TorCamp community. Given that you get your own top-level domain name there’s somewhat of an incentive to be a first-mover on this.
  • When you create a Base, it provides a list of suggested Views that can be added. Nice. Unfortunately, it added each View twice. I then had to go delete the duplicate View manually. Not so nice. And then even though I’ve deleted the View it still shows up on a detail page. Sigh.
  • On the plus side, this is all done in a nice Ajaxy way.
  • It’s really not at all obvious how you create a new View. Really not obvious. Here’s the documentation.
  • My initial opinion was that Views seem to be copies, not references: this turns out to be a wrong assumption on my part. Views are in fact (if I got this right) the results of a query on the Freebase db. This means that as more Topics match the View query, they’ll automatically show up. The query is a copy, not a reference, but this is a good thing.
  • The implication is that it’s difficult to create a View that is an arbitrary “bag” of topics. For example, if I want to create a Toronto Bloggers View, I have to actually make sure that all the Topics that will show up are marked with some attribute that can be matched to give them a Toronto-bloggerness quality.
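The difference between a View-as-query and a “bag” of Topics can be sketched in a few lines of Python 3. This is only a toy model of the behaviour described above: the topics list, the make_view function and the field names are invented for the example and have nothing to do with the actual Freebase API.

```python
# Hypothetical in-memory stand-in for Topics; none of this is the Freebase API.
topics = [
    {"name": "Alice", "city": "Toronto", "types": {"blogger"}},
    {"name": "Bob", "city": "Vancouver", "types": {"blogger"}},
]

def make_view(city, topic_type):
    """Return a view: a stored query that is re-run every time it is called."""
    def view():
        return [
            t["name"] for t in topics
            if t["city"] == city and topic_type in t["types"]
        ]
    return view

toronto_bloggers = make_view("Toronto", "blogger")
before = toronto_bloggers()

# A new Topic with the right attributes shows up without touching the view.
topics.append({"name": "Carol", "city": "Toronto", "types": {"blogger"}})
after = toronto_bloggers()

print(before, after)
```

The flip side is visible too: there is no way to add “Dave” to this view by hand; his Topic has to carry attributes that the query matches.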

December 4, 2008

Djolt Indirection

demo, djolt, ideas, python · David Janes · 6:05 am ·

I’ve been working through a sticky problem with Djolt, trying to implement my Toronto Fires example in as few lines as possible. As part of this, I’ve come up with the idea of adding indirection to Djolt templates:

import djolt

d = {
    "a" : "It says: {{ b }}",
    "b" : "Hello, World"
}

t = djolt.Template("""
a: {{ a }}
b: {{ b }}
*a: {{ *a }}
""")

print t.Render(d)

Which yields:

a: It says: {{ b }}
b: Hello, World
*a: It says: Hello, World

This is significantly updated from the original version I posted here an hour ago. The indirection now makes the variable read as a template. This is a much more powerful concept.
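To show the idea without Djolt itself, here is a tiny stand-alone renderer in Python 3 that understands only {{ name }} and {{ *name }}. It is a sketch of the concept, not Djolt’s implementation: the render function, the regular expression and the depth limit are all assumptions of mine.

```python
import re

# Matches {{ name }} and {{ *name }}; the optional star requests indirection.
VARIABLE = re.compile(r"\{\{\s*(\*?)\s*(\w+)\s*\}\}")

def render(source, context, depth=0):
    """Substitute variables; a starred variable is itself rendered as a template."""
    if depth > 10:
        # Guard against a variable that (directly or not) includes itself.
        raise RecursionError("indirection nested too deeply")

    def replace(match):
        star, name = match.groups()
        value = str(context.get(name, ""))
        if star:
            return render(value, context, depth + 1)
        return value

    return VARIABLE.sub(replace, source)

d = {"a": "It says: {{ b }}", "b": "Hello, World"}
print(render("a: {{ a }}\nb: {{ b }}\n*a: {{ *a }}", d))
```

A plain variable is inserted verbatim, so the {{ b }} inside a survives untouched; only the starred form triggers a second rendering pass.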

Python 3000 – we are no longer flying

uncategorized · David Janes · 5:45 am ·

Python 3000 – the next generation, backwards incompatible with all previous versions of Python – is now available. I think I’ll sit this one out for a while.
