Just add in your Python comments:
# MARK: comment
# TODO: comment
# FIXME: comment
# !!!: comment
# ???: comment
From here.
If you want to use the Python Imaging Library on Mac OS X Snow Leopard, these instructions appear to be the best way to get libjpeg installed:
1. Download the source from http://libjpeg.sourceforge.net/
2. Extract, configure, make:
tar zxvf jpegsrc.v6b.tar.gz
cd jpeg-6b
cp /usr/share/libtool/config/config.sub .
cp /usr/share/libtool/config/config.guess .
./configure --enable-shared --enable-static
make
3. You may need to create the following directories:
sudo mkdir -p /usr/local/include
sudo mkdir -p /usr/local/lib
sudo mkdir -p /usr/local/man/man1
4. Now you can install it as usual.
sudo make install
I used to use Fink on Leopard, but it didn’t seem to work too well this time. If you’ve previously made an attempt at installing PIL, make sure to rm -rf build.
Having recently upgraded to Django 1.1, I suddenly started getting error messages that look like:
File "/Library/Python/2.6/site-packages/django/db/models/fields/related.py", line 257, in __get__
rel_obj = QuerySet(self.field.rel.to).get(**params)
File "/Library/Python/2.6/site-packages/django/db/models/query.py", line 300, in get
num = len(clone)
File "/Library/Python/2.6/site-packages/django/db/models/query.py", line 81, in __len__
self._result_cache = list(self.iterator())
File "/Library/Python/2.6/site-packages/django/db/models/query.py", line 251, in iterator
obj = self.model(*row[index_start:aggregate_start])
File "/Library/Python/2.6/site-packages/django/db/models/base.py", line 324, in __init__
signals.post_init.send(sender=self.__class__, instance=self)
File "/Library/Python/2.6/site-packages/django/dispatch/dispatcher.py", line 166, in send
response = receiver(signal=self, sender=sender, **named)
File "/Library/Python/2.6/site-packages/django/db/models/fields/files.py", line 368, in update_dimension_fields
(self.width_field and not getattr(instance, self.width_field))
AttributeError: 'Icon' object has no attribute 'width'
The issue turns out to be that you can’t just define the ImageField in your model; you also have to explicitly define the fields that will store the width and height for the image field. The SQL generation tools for Django don’t do it for you.
For various reasons, I can’t do that at this moment, so I made the following hack, which I strongly recommend you don’t use (for efficiency reasons: with this, the height and width have to be computed every time you access the image). This is added to site-packages/django/db/models/fields/files.py around line 367.
if self.width_field and not hasattr(instance, self.width_field):
    dimension_fields_filled = False
else:
    dimension_fields_filled = not(
        (self.width_field and not getattr(instance, self.width_field))
        or (self.height_field and not getattr(instance, self.height_field))
    )
The proper solutions probably involve:
Since Libin points to “Fizz Buzz” in one line of Ruby, I feel it’s only fair to do it in one line of Python:
print [ not i % 15 and "Fizz Buzz" or not i % 5 and "Buzz" or not i % 3 and "Fizz" or i for i in xrange(1, 101) ]
My preference is really to have a few more brackets in there for clarity, but apparently terseness is sometimes considered a virtue in and of itself. There are other implementations of this in one line of Python:
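For comparison, here is a sketch of the same logic with the brackets put back in, written for Python 3 (so `range` and `print()` replace `xrange` and the print statement). It is not one of the implementations referred to above; the name `fizz_buzz` is just a label chosen for this illustration.

```python
def fizz_buzz(n):
    """Return the Fizz Buzz value for n, using explicit conditionals."""
    return (
        "Fizz Buzz" if (n % 15 == 0) else
        "Buzz" if (n % 5 == 0) else
        "Fizz" if (n % 3 == 0) else
        n
    )

# Same range as the one-liner above: 1 to 100 inclusive.
results = [fizz_buzz(i) for i in range(1, 101)]
print(results[:15])
```

The conditional expressions avoid the `and`/`or` trick, which only works in the one-liner because the strings and the numbers 1 to 100 are all truthy.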
Here’s our problem child HTML: Members of Provincial Parliament. Amongst the atrocities committed against humanity, we see:
<o:p>) and attributes (<st1:City w:st="on">)
<?xml:namespace prefix = "o" ns = "urn:schemas-microsoft-com:office:office" />)
This is so broken that even HTML TIDY chokes on it, producing a severely truncated file. This broken document did, however, give me an opportunity to play with the Python library Beautiful Soup, which lists amongst its advantages:
Alas, straight out of the box Beautiful Soup didn’t do it for me, perhaps because of some of my strange requirements (my data flow works something like this: raw document → XML → DOM parser → JSON). However, Beautiful Soup does provide the necessary calls to manipulate the document to do the trick. Here’s what I did:
First, we import Beautiful Soup and parse it to the object soup. We’re expecting an HTML node at the top, so we look for that.
import BeautifulSoup
soup = BeautifulSoup.BeautifulSoup(raw)
if not hasattr(soup, "html"):
    return
Next, we loop through every node in the document, using Beautiful Soup’s findAll interface. You will see several variants of this call here in the code. What we’re looking for is use of namespaces, which we then add to the HTML element as attributes using fake namespace declarations.
We need to find namespaces already declared:
used = {}
for ns_key, ns_value in soup.html.attrs:
    if not ns_key.startswith("xmlns:"):
        continue
    used[ns_key[6:]] = 1
Then we look for ones that are actually used:
nsd = {}
for item in soup.findAll():
    name = item.name
    if name.find(':') > -1:
        nsd[name[:name.find(':')]] = 1
    for name, value in item.attrs:
        if name.find(':') > -1:
            nsd[name[:name.find(':')]] = 1
Then we add all the missing namespaces to the HTML node.
for ns in nsd.keys():
    if not used.get(ns):
        soup.html.attrs.append((
            "xmlns:%s" % ns,
            "http://www.example.com#%s" % ns,
        ))
Next we look for attributes that aren’t properly XML declarations, e.g. HTML style <input checked />-type items.
for item in soup.findAll():
    for index, ( name, value ) in enumerate(item.attrs):
        if value is None:
            item.attrs[index] = ( name, name )
Then we remove all nodes from the document that we aren’t expecting to see. If you keep the script tags you’re going to have to make sure that each node is properly CDATA encoded; I didn’t care about this so I just remove them.
[item.extract() for item in soup.findAll('script')]
[item.extract() for item in soup.findAll(
text = lambda text:isinstance(text, BeautifulSoup.ProcessingInstruction ))]
[item.extract() for item in soup.findAll(
text = lambda text:isinstance(text, BeautifulSoup.Declaration ))]
In the final step we convert the document to Unicode. This requires another step of post-processing: html2xml changes all entity uses that XML doesn’t recognize into a &#...; style. E.g. we do change but we don’t change &. At this point we now have a document that can be processed by standard DOM parsers (if you convert to UTF-8 bytes, sigh).
cooked = unicode(soup)
cooked = bm_text.html2xml(cooked)
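The entity post-processing step can be sketched roughly as follows. This is not the actual bm_text.html2xml code, only a minimal Python 3 illustration of the idea under the assumption that "entities XML doesn’t recognize" means every named entity other than the five predefined XML ones. The names `html2xml_sketch` and `XML_ENTITIES` are hypothetical.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser understands without a DTD.
XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}

_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def html2xml_sketch(text):
    """Rewrite HTML-only named entities as numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_ENTITIES:
            return match.group(0)      # already valid XML, keep it
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)      # unknown name, leave untouched
        return "&#%d;" % codepoint
    return _ENTITY_RE.sub(replace, text)

converted = html2xml_sketch("Fish&nbsp;&amp;&nbsp;Chips &copy; 2009")
print(converted)
```

Names that are not in the standard HTML entity table are deliberately left alone, so a later parser error points at the real culprit instead of a silently mangled character.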
Here’s a neat API I completed this morning, called api_feeds. It takes a URL (or a list of them) and transforms them into:
If you’re following along at home, this is essentially the information needed for a single outline in an OPML subscription list.
Here’s a simple python example:
api = api_feeds.OneFeed()
api.request = {
"uri" : "http://code.davidjanes.com/blog/2009/01/23/transparently-working-with-oauath/",
}
pprint.pprint(api.response, width = 1)
And here’s what the output looks like:
{'link': u'http://code.davidjanes.com/blog',
'links': [{'href': u'http://feeds.feedburner.com/DavidJanesCode',
'rel': 'alternate',
'type': u'application/rss+xml'}],
'title': u"David Janes' Code Weblog"}
There’s actually quite a bit going on here behind the scenes, most of it using code I didn’t initially write but have quite heavily hacked: the Universal Feed Parser and the Feed Finder.
What becomes really interesting is what happens when we combine this with other modules. Here’s an example of how we can build an OPML subscription list from all the posts I’ve tagged “python” and “django” in del.icio.us. The code looks up each link I’ve bookmarked, does the feed discovery above, filters out items that don’t have feeds, and outputs the result as OPML. Note the neat pipeline-type aspect to the code:
api_delicious = api_delicious.PostsList(tag = "python django")
api_many = api_feeds.ManyFeeds(require_feed = True)
api_opml = api_opml.OPMLWriter()
api_many.items = api_delicious.items
api_opml.items = api_many.items
print api_opml.Produce()
Producing the following OPML:
<opml encoding="utf-8" version="2.0">
<head>
<title>[Untitled]</title>
</head>
<body>
<outline htmlUrl="http://push.cx"
rssUrl="http://push.cx/feed"
text="Push cx"
type="rss"/>
<outline htmlUrl="http://crankycoder.com"
rssUrl="http://crankycoder.com/feed/"
text="crankycoder.com"
type="rss"/>
<outline htmlUrl="http://blog.dowski.com"
rssUrl="http://blog.dowski.com/feed/"
text="the occasional occurrence"
type="rss"/>
<outline htmlUrl="http://www.b-list.org/feeds/entries/"
rssUrl="http://feeds2.feedburner.com/b-list-entries"
text="The B-List: Latest entries"
type="rss"/>
<outline htmlUrl="http://blog.thescoop.org"
rssUrl="http://blog.thescoop.org/feed/"
text="The Scoop"
type="rss"/>
<outline htmlUrl="http://effbot.org"
rssUrl="http://effbot.org/zone/rss.xml"
text="effbot.org"
type="rss"/>
<outline htmlUrl="http://blog.disqus.net"
rssUrl="http://feeds.feedburner.com/BigHeadLabs"
text="Disqus"
type="rss"/>
<outline htmlUrl="http://blog.ianbicking.org"
rssUrl="http://blog.ianbicking.org/feed/atom/"
text="Ian Bicking: a blog"
type="rss"/>
<outline htmlUrl="http://antoniocangiano.com"
rssUrl="http://feeds.feedburner.com/ZenAndTheArtOfRubyProgramming"
text="Zen and the Art of Programming"
type="rss"/>
<outline htmlUrl="http://www.carthage.edu/webdev"
rssUrl="http://www.carthage.edu/webdev/?feed=rss2"
text="carthage webdev"
type="rss"/>
<outline htmlUrl="http://www.eweek.com"
rssUrl="http://www.eweek.com/rss-feeds-13.xml"
text="Application Development - RSS Feeds"
type="rss"/>
<outline htmlUrl="http://jeffcroft.com/"
rssUrl="http://feeds.feedburner.com/jeffcroft/blog"
text="JeffCroft.com: Latest blog entries"
type="rss"/>
</body>
</opml>
This will be just as terse (terser, probably) when written as a Pipe Cleaner script; I’m just struggling over how to introduce the authentication code gracefully into the scripts.
This is part one of two posts I’m going to write about OAuth; the second will be somewhat more critical in tone. Before I criticize – and I know it’s technologically hard to put together things like OAuth – I want to actually accomplish something with it, so that I at least appear to have somewhat of a clue about it. This is a report of what I’ve done.
bm_uri is a library and tool I’ve written for working with URIs, and in particular http:// and https:// URLs. Here are some of the advantages of using bm_uri over all the normal Python urllib and urllib2 methods:
bm_uri will return the cached version, likewise if it has been downloaded in the near past, the cached version will be returned rather than hitting the net again
bm_uri handles all the protocol stuff for you (such as User-Agent, Last-Modified and so forth) so you don’t have to
Here is an example of accessing an OAuth resource using bm_uri, returning my current location from Fire Eagle as a Python object. From a programming point of view, I believe I have reduced this to close to the minimum number of steps possible. Here’s the setup phase:
import bm_uri
import bm_oauth
import bm_cfg
import pprint

bm_cfg.cfg.initialize()
bm_oauth.OAuth(service_name = "fireeagle")
Here’s using it in code – note how there’s no reference to OAuth here whatsoever.
loader = bm_uri.JSONLoader('https://fireeagle.yahooapis.com/api/0.1/user.json?format=json')
loader.Load()
pprint.pprint(loader.GetCooked())
And here’s the output of the program:
{u'stat': u'ok',
u'user': {u'location_hierarchy': [{u'best_guess': True,
u'geometry': {u'coordinates': [-79.418426513699998,
43.731891632100002],
u'type': u'Point'},
u'id': 572261,
u'label': None,
u'level': 1,
u'level_name': u'postal',
u'located_at': u'2008-03-19T04:09:30-07:00',
...
u'name': u'Canada',
u'normal_name': None,
u'place_id': u'EESRy8qbApgaeIkbsA',
u'woeid': 23424775}],
u'readable': True,
u'writable': False}}
The devil is in the details, obviously, and with OAuth the little satan is the initial setup. Here’s how I did this for Fire Eagle – there’ll be something analogous for whatever service you are using:
Note how Yahoo has conveniently made that last URL similar looking to the others, but not quite the same. Thanks!
However you implement OAuth, you’re probably going to need to be able to persist information to disk or database. As documented here several weeks ago, we already have that covered with our bm_cfg module. In ~/.cfg/fireeagle.json, create the following JSON format file:
{
    "fireeagle": {
        "api_uri" : "https://fireeagle.yahooapis.com/",
        "oauth_access_token_url": "https://fireeagle.yahooapis.com/oauth/access_token",
        "oauth_authorization_url": "http://fireeagle.yahoo.net/oauth/authorize",
        "oauth_consumer_key": "ABCDEFGHIJKL",
        "oauth_consumer_secret": "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345",
        "oauth_token_url": "https://fireeagle.yahooapis.com/oauth/request_token"
    }
}
The only new item here is the api_uri: that’s the prefix of URLs that bm_uri will use OAuth with.
Next you have to do all sorts of OAuth stuff to actually work with OAuth. If the why interests you, please go read the spec! I’m more of a how person myself, and this is what we need to do:
python bm_uri.py --service fireeagle --authorize
python bm_uri.py --service fireeagle --exchange
And that’s it – you should now be able to just work with the Fire Eagle API in bm_uri without even having to know OAuth is there!
Happy New Year, everyone. I’ve been busy at paying work recently, plus cleaning up and testing existing code I’ve been discussing here over the last few months. At work I’ve been developing in WebObjects, which though a lovely platform is not the way of the future so I’m not documenting many of my experiences here.
The applications I’ve been working on recently, Pipe Cleaner and GenX, need – like most applications – configuration. This will store information which can be safely exposed to the public, such as my Google Maps API key, and information that I need to keep private within the application, such as my Freebase username and password (cf. however the password anti-pattern). Furthermore, though the code I’m writing is in Python it is possible that the code that provides the UI will be written in another language, such as PHP inside of WordPress.
Given these considerations, here’s my design choices:
That all said, here’s what I’ve written. First, the setters and getters:
class Cfg:
    _cfg_private = {}
    _cfg_public = {}

    @apply
    def public():
        def fget(self):
            return self._cfg_public
        return property(**locals())

    @apply
    def private():
        def fget(self):
            return self._cfg_private
        return property(**locals())
As an aside, I’m not 100% sure about Python decorators and wonder if my favorite language is being turned into a C++ like mess.
Next, the ‘add’ function that adds information to the configuration ensuring private and public are handled correctly. Note that there can be multiple dictionaries inside of ‘d’, but ‘d’ is either all Public or not.
    def add(self, d):
        if type(d) != types.DictType:
            raise TypeError("only dictionaries can be added")
        if d.get('@Public'):
            #
            # Public definitions never overwrite private definitions
            #
            for key, value in d.iteritems():
                if type(value) != types.DictType:
                    continue
                if not self._cfg_private.has_key(key):
                    self._cfg_private[key] = value
                self._cfg_public[key] = value
        else:
            self._cfg_private.update(d)
And finally the loader, which gets everything in a directory or one level down. Note the ‘exception’ parameter which makes me a bad person, but I don’t like code failing unless I tell it to.
    def load(self, path, exception = False, depth = 0):
        try:
            if os.path.isdir(path) and depth < 2:
                for file in os.listdir(path):
                    # pass the flags down so the recursion stops one level below the start
                    self.load(os.path.join(path, file), exception = exception, depth = depth + 1)
            elif os.path.isfile(path):
                if path.endswith(".json"):
                    self.add(json.loads(bm_io.readfile(path)))
        except:
            if exception:
                raise
            Log("ignoring exception", exception = True, path = path)
And one more thing: make the global configuration:
cfg = Cfg()
Here’s how you use it:
import sys
import pprint
import bm_cfg

# setup ... on a per-file or directory basis
for file in sys.argv[1:]:
    bm_cfg.cfg.load(file)

# use it
pprint.pprint({
    "private" : bm_cfg.cfg.private,
    "public" : bm_cfg.cfg.public,
}, width = 1)
Here’s what my configuration directory looks like:
$ pwd
/Users/davidjanes/Sites/pc/cfg
$ ls
amazon.json         freebase.json  praized.json
amazon.public.json  gmaps.json     yahoo.json
Here’s the (private) amazon.json:
{
"amazon" : {
"Locale" : "us",
"AccessKeyID" : "0......",
"AssociateTag" : "ona-20",
"Private" : "Don't See"
}
}
And here’s the (public) amazon.public.json:
{
"@Public" : 1,
"amazon" : {
"Locale" : "us",
"AccessKeyID" : "0......",
"AssociateTag" : "ona-20"
}
}
Note that if the private version of the Amazon file wasn’t available, the contents of the public version would also appear in the private configuration. I.e. the private configuration is basically “everything” (noting the possible exceptions in the code above).
In the previous entry, we talked about the difficulty in finding out the delta from UTC for a timezone returned from the pytz module. In particular, consider the offset for St. John’s, Newfoundland, which should be -3:30.
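To make the public/private merge rule concrete, here is a small Python 3 sketch of just the add logic, fed with trimmed-down versions of the two Amazon dictionaries. It is an illustration rather than the bm_cfg module itself: `CfgSketch` is a hypothetical name, and it keeps its state on the instance instead of on the class.

```python
class CfgSketch:
    """Keeps a private view (everything) and a public view (only @Public sections)."""

    def __init__(self):
        self.private = {}
        self.public = {}

    def add(self, d):
        if not isinstance(d, dict):
            raise TypeError("only dictionaries can be added")
        if d.get("@Public"):
            for key, value in d.items():
                if not isinstance(value, dict):
                    continue  # skips the "@Public" marker itself
                # Public definitions never overwrite private ones.
                self.private.setdefault(key, value)
                self.public[key] = value
        else:
            self.private.update(d)

cfg = CfgSketch()
cfg.add({"@Public": 1, "amazon": {"Locale": "us", "AssociateTag": "ona-20"}})
cfg.add({"amazon": {"Locale": "us", "AssociateTag": "ona-20", "Private": "Don't See"}})

print(cfg.public["amazon"])
print(cfg.private["amazon"])
```

The same final state results whichever file is added first: a private section always replaces whatever is in the private view, while a public section only fills a gap there.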
dt_now = datetime.datetime.now()
tz = pytz.timezone('America/St_Johns')
offset = tz.utcoffset(dt_now)
Log(
"using datetime.utcoffset",
offset = format(offset),
)
With the unexpected result:
message: using datetime.utcoffset
offset: -4:29 (-12660)
I did a fair bit of Google searching for an answer without finding a satisfactory result, so I did further research on my own. To find the correct offset value, I found that this works:
dt_sj = tz.localize(dt_now)
offset = dt_sj - pytz.UTC.localize(dt_now)
Log(
"using delta to UTC",
offset = format(offset),
)
Which yields the correct:
message: using delta to UTC
offset: 03:30 (12600)
Note that if you’re going to use the above method for finding deltas, you’re going to have to take Daylight Saving Time into consideration also. I have not done this here, as I’m a little pressed for time and just want to illustrate the problem.
The issue seems to be with the way that pytz uses the Olson database entry (from here) for St. John’s – and all other locations. It appears that pytz is using the first rule it sees, from 1884, rather than the rule for the date that was passed in. I think this is a bug.
#
# St John's has an apostrophe, but Posix file names can't have apostrophes.
# Zone  NAME              GMTOFF    RULES    FORMAT  [UNTIL]
Zone    America/St_Johns  -3:30:52  -        LMT     1884
                          -3:30:52  StJohns  N%sT    1918
                          -3:30:52  Canada   N%sT    1919
                          -3:30:52  StJohns  N%sT    1935 Mar 30
                          -3:30     StJohns  N%sT    1942 May 11
                          -3:30     Canada   N%sT    1946
                          -3:30     StJohns  N%sT
The setup code for the examples above is:
from bm_log import Log
import dateutil.parser
import pytz
import datetime
def format(td):
    seconds = td.seconds + td.days * ( 24 * 3600 )
    return "%02d:%02d (%s)" % ( seconds // 3600, seconds % 3600 // 60, seconds, )
Update 2010-03-09: This has been fixed in the code base and (presumably) will be in the next upcoming release.
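A side note on the format helper above: because it uses floor division, a negative offset such as -12660 seconds prints as -4:29, although it is really 3 hours 31 minutes behind UTC (consistent with the -3:30:52 LMT row rounded to a whole minute). The following Python 3 sketch shows one sign-aware way to format the same value; `format_signed` is a hypothetical name, not part of the original setup code.

```python
import datetime

def format_signed(td):
    """Format a timedelta as [-]HH:MM (seconds), handling negative values."""
    seconds = td.days * 24 * 3600 + td.seconds
    sign = "-" if seconds < 0 else ""
    # Work on the absolute value so hours and minutes are not floored downwards.
    hours, remainder = divmod(abs(seconds), 3600)
    return "%s%02d:%02d (%s)" % (sign, hours, remainder // 60, seconds)

print(format_signed(datetime.timedelta(seconds=-12660)))
print(format_signed(datetime.timedelta(seconds=12600)))
```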
Here’s a few examples of working with dates, times and timezones in Python. We are using the following packages:
There’s a lot of complexity to working with datetimes in any language; I’m not going to get into that but would prefer instead to show a few practical examples. Keep the following in mind:
Our standard imports. Log is from the pybm library and its purpose is rather obvious.
from bm_log import Log
import dateutil.parser
import pytz
import datetime
Here’s an example of parsing an e-mail or RSS type date using dateutil.
dts = "Thu, 13 Nov 2008 05:41:35 +0000"
dt = dateutil.parser.parse(dts)
Log(
"Parsing an RFC type date",
src = dts,
dt = dt,
iso = dt.isoformat(),
)
message: Parsing an RFC type date
dt: 2008-11-13 05:41:35+00:00
iso: 2008-11-13T05:41:35+00:00
src: Thu, 13 Nov 2008 05:41:35 +0000
Here’s an example of parsing an ISO Datetime
dts = '2008-11-13T05:41:35-0400'
dt = dateutil.parser.parse(dts)
Log(
"Parsing an ISO Date with Timezone",
src = dts,
dt = dt,
iso = dt.isoformat(),
)
message: Parsing an ISO Date with Timezone
dt: 2008-11-13 05:41:35-04:00
iso: 2008-11-13T05:41:35-04:00
src: 2008-11-13T05:41:35-0400
Here’s an example of parsing a naive datetime, i.e. one without a timezone.
dts = '2008-11-13T05:41:35'
dt = dateutil.parser.parse(dts)
Log(
"Parsing an ISO Date without a Timezone",
src = dts,
dt = dt,
iso = dt.isoformat(),
)
message: Parsing an ISO Date without a Timezone
dt: 2008-11-13 05:41:35
iso: 2008-11-13T05:41:35
src: 2008-11-13T05:41:35
Here are two similar examples, showing how to force the timezone if it’s not present. This will happen in the first example, but not in the second.
tz = pytz.timezone('America/Toronto')
dts = '2008-11-13T05:41:35'
dt = dateutil.parser.parse(dts)
if dt.tzinfo is None:
    # pytz documents localize() as the way to attach a zone to a naive datetime
    dt = tz.localize(dt)
Log(
"Parsing an ISO Date without a Timezone BUT specifying default TZ",
src = dts,
dt = dt,
iso = dt.isoformat(),
tz = tz,
)
tz = pytz.timezone('America/Toronto')
dts = '2008-11-13T05:41:35-0400'
dt = dateutil.parser.parse(dts)
if dt.tzinfo is None:
    # pytz documents localize() as the way to attach a zone to a naive datetime
    dt = tz.localize(dt)
Log(
"Parsing an ISO Date with a Timezone AND specifying default TZ",
src = dts,
dt = dt,
iso = dt.isoformat(),
tz = tz,
)
message: Parsing an ISO Date without a Timezone BUT specifying default TZ
dt: 2008-11-13 05:41:35-05:00
iso: 2008-11-13T05:41:35-05:00
src: 2008-11-13T05:41:35
tz: America/Toronto
message: Parsing an ISO Date with a Timezone AND specifying default TZ
dt: 2008-11-13 05:41:35-04:00
iso: 2008-11-13T05:41:35-04:00
src: 2008-11-13T05:41:35-0400
tz: America/Toronto
Update: here’s an example of moving datetimes to UTC and then to a different Timezone. Remember: you want your backend code to work with UTC datetimes for simplicity and correctness:
dts = '2008-11-13T05:41:35-0400'
dt_orig = dateutil.parser.parse(dts)
dt_utc = dt_orig.astimezone(pytz.UTC)
Log(
"Changing a datetime to UTC",
src = dts,
dt_orig = dt_orig,
dt_utc = dt_utc,
)
tz_vancouver = pytz.timezone('America/Vancouver')
dt_vancouver = dt_utc.astimezone(tz_vancouver)
Log(
"Changing UTC datetime to a different timezone",
dt_vancouver = dt_vancouver,
dt_utc = dt_utc,
)
message: Changing a datetime to UTC
dt_orig: 2008-11-13 05:41:35-04:00
dt_utc: 2008-11-13 09:41:35+00:00
src: 2008-11-13T05:41:35-0400
message: Changing UTC datetime to a different timezone
dt_utc: 2008-11-13 09:41:35+00:00
dt_vancouver: 2008-11-13 01:41:35-08:00
Here is an example of listing all “common” timezones using pytz. Note that “America” refers to the two continents, not the Irish word for the United States. Printing the actual timezone offset turned out to be a surprisingly complex task, which I will outline in a different blog post. For now, suffice it to say that with pytz you should try not to depend on utcoffset.
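The same "normalise to UTC, then convert outwards" flow can be sketched with only the Python 3 standard library. This is a simplified illustration, not a replacement for pytz: `tz_fixed` is a hypothetical fixed -08:00 offset standing in for Vancouver in November, and it knows nothing about Daylight Saving Time rules.

```python
from datetime import datetime, timedelta, timezone

dts = "2008-11-13T05:41:35-0400"
# %z parses the trailing -0400 into an aware datetime.
dt_orig = datetime.strptime(dts, "%Y-%m-%dT%H:%M:%S%z")

# Normalise to UTC first, then convert outwards for display.
dt_utc = dt_orig.astimezone(timezone.utc)

# Assumption: a fixed offset standing in for a real zone (no DST handling).
tz_fixed = timezone(timedelta(hours=-8))
dt_fixed = dt_utc.astimezone(tz_fixed)

print(dt_utc.isoformat())
print(dt_fixed.isoformat())
```

All three values describe the same instant; only the wall-clock representation changes.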
dt_now = datetime.datetime.now()

def tzname2offset(tzname):
    dt_in_utc = pytz.UTC.localize(dt_now)
    dt_in_tz = pytz.timezone(tzname).localize(dt_now)
    offset = dt_in_utc - dt_in_tz
    seconds = offset.seconds + offset.days * ( 24 * 3600 )
    return "%02d:%02d" % ( seconds // 3600, seconds % 3600 // 60, )

Log(
    "Olsen (pytz) common timezones and their UTC offsets",
    timezones = map(
        lambda tzname: ( tzname, tzname2offset(tzname), ),
        pytz.common_timezones,
    )
)
message: Olsen (pytz) common timezones and their UTC offsets
timezones:
[('Africa/Abidjan', '00:00'),
('Africa/Accra', '00:00'),
('Africa/Addis_Ababa', '03:00'),
('Africa/Algiers', '01:00'),
('Africa/Asmara', '03:00'),
...
('Pacific/Wake', '12:00'),
('Pacific/Wallis', '12:00'),
('US/Alaska', '-9:00'),
('US/Arizona', '-7:00'),
('US/Central', '-6:00'),
('US/Eastern', '-5:00'),
('US/Hawaii', '-10:00'),
('US/Mountain', '-7:00'),
('US/Pacific', '-8:00'),
('UTC', '00:00')]
I haven’t been posting as much as I’d like here for the last three weeks, not because of a lack of ideas but because I haven’t been able to consolidate what I’ve been working on into a coherent thought. I’m trying to come up with an overarching conceptual arch that covers WORK, Djolt and the various API interfaces I’ve been coding. Tentatively and horribly, I’m calling this Data/Query/Transform/Template right now, though I’m expecting this to change.
The first demo of this … without further explanation … can be seen here. More details about what this is actually demonstrating (besides formatting this blog) will be forthcoming.
What I want to draw attention to in this post is how I coded this. What I’ve been doing for the last several weeks is coding backwards: I start with what I want the final code to look like and then figure out all the libraries, little languages and so forth that would be needed to code that. After several false starts, my conceptual logjam broke about a week ago and code started radically simplifying.
The ideal code, in my mind, is almost entirely static declarations: no loops, no if statements, no while statements, no goto-type statements (god help us). We simply specify how the parts are connected, and hope that we can abstract the complexity into the libraries that make this all happen. The code that you see below is actually post all my conceptualizing: I just wanted to write some code and since I had almost all the parts together it fell together quite nicely:
import bm_wsgi
import bm_io
import djolt
import api_feed
from bm_log import Log

class Application(bm_wsgi.SimpleWrapper):
    def __init__(self, *av, **ad):
        bm_wsgi.SimpleWrapper.__init__(self, *av, **ad)

    def CustomizeSetup(self):
        self.html_template_src = bm_io.readfile("index.dj")
        self.html_template = djolt.Template(self.html_template_src)
        self.context = djolt.Context()
        self.context["paramd"] = {
            "feed" : "http://feeds.feedburner.com/DavidJanesCode",
            "template" : """\
<ul>
{% for item in data.items %}
<li><a href="{{ item.link }}">{{ item.title }}</a></li>
{% endfor %}
""",
        }
        self.context.Push()
        self.context["paramd"] = self.paramd
        self.context["data"] = api_feed.RSS20(self.context.as_string("paramd.feed"))

    def CustomizeContent(self):
        yield self.html_template.Render(self.context)

if __name__ == '__main__':
    Application.RunCGI()
There’s almost nothing there! In particular, note:
bm_wsgi.SimpleWrapper handles all the WSGI interface work, including determining when to output HTML headers, error trapping, and Unicode to UTF-8 encoding
Context. In particular, note that self.paramd is automatically populated by the QUERY_STRING passed to the application, and the double setting we do here allows us to have default values.
{% asis %} block which doesn’t interpret its content as Djolt code and the {{ *paramd.template|safe }} variable which interprets the variable’s contents as a template.
Customize-something are my convention for framework functions, i.e. methods that will be called for us rather than methods we call.
As part of my recent explorations, I’ve been playing a lot with Python iterators/generators. The key efficiency of iterators is that when working with lengthy list-like objects, you need only create the part that’s being looked at. It’s just-in-time objects.
If you attempt to JSON serialize an object with an iterator/generator object in it, the json module throws a cog: it doesn’t know how to serialize these types of objects. The json module is extensible and the documentation makes a suggestion how to do this:
class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        return json.JSONEncoder.default(self, o)

print json.dumps(xrange(4), cls = IterEncoder)
This seems somewhat ugly to me. In particular, lots of objects can be wrapped by the iter function that don’t need to be, plus lots of objects will cause that TypeError to be thrown which seems to be rather a bit of waste. Here’s the solution I came up with:
class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            return json.JSONEncoder.default(self, o)
        except TypeError, x:
            try:
                return list(o)
            except:
                # not convertible either: re-raise the original error
                raise x
This tries to encode the object the normal way. Only if that doesn’t work do we try to turn the object into a list. If that’s not convertible (i.e. the list object constructor fails) we go back and throw the original exception provided by JSONEncoder – we’ve really failed.
You use this as follows:
class X:
    def Iter(self):
        yield 1
        yield 2
        yield 3
        yield 4

xi = X().Iter()
print json.dumps(xi, cls = IterEncoder)
print json.dumps(xrange(4), cls = IterEncoder)
Which yields the expected:
[1, 2, 3, 4]
[0, 1, 2, 3]
Don’t be overly tempted to check the type of o: it may be types.GeneratorType or types.XRangeType or perhaps even something else that I haven’t found out yet.
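For readers on Python 3, the encode-first, listify-second approach described above can be sketched like this. It is a port made under the assumption that only the syntax needs to change (`except ... as`, `range` for `xrange`, `print()`); the generator function `count_to_four` is a hypothetical stand-in for the X().Iter() example.

```python
import json

class IterEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            # Let the stock encoder have the first go.
            return json.JSONEncoder.default(self, o)
        except TypeError as original:
            try:
                return list(o)
            except TypeError:
                # Not iterable either: surface the encoder's own error.
                raise original

def count_to_four():
    yield 1
    yield 2
    yield 3
    yield 4

from_generator = json.dumps(count_to_four(), cls=IterEncoder)
from_range = json.dumps(range(4), cls=IterEncoder)
print(from_generator)
print(from_range)
```

An object that is neither serializable nor iterable still fails with the TypeError raised by JSONEncoder, which is usually the more informative of the two messages.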
I’ve been working through a sticky problem with Djolt, trying to implement my Toronto Fires example in as few lines as possible. As part of this, I’ve come up with the idea of adding indirection to Djolt templates:
import djolt
d = {
"a" : "It says: {{ b }}",
"b" : "Hello, World"
}
t = djolt.Template("""
a: {{ a }}
b: {{ b }}
*a: {{ *a }}
""")
print t.Render(d)
Which yields:
a: It says: {{ b }}
b: Hello, World
*a: It says: Hello, World
This is significantly updated from the original version I posted here an hour ago. The indirection now makes the variable read as a template. This is a much more powerful concept.
Djolt is a reimplementation of Django’s template language in Python. Why do this?
However, if you’re really looking for the whole Django template experience and don’t want to use Djolt, just start here.
Djolt is packaged as part of the pybm library.
import djolt
t = djolt.Template("""
<ul>
{% for name in names %}
<li>{{ name }}</li>
{% endfor %}
</ul>
""")
print t.Render({
"names" : [ "Johnny", "Jack", "Ray", "Mary & Sam", ]
})
Which gives the results:
<ul>
<li>Johnny</li>
<li>Jack</li>
<li>Ray</li>
<li>Mary &amp; Sam</li>
</ul>
Note the “autoescaping” of the & character.
It does not implement blocks.
Unimplemented filters are due to laziness and will be done “on demand”. We also introduce a few new filters:
Beyond that you should be able to use most Django template examples (that don’t use block/implements) as-is.
Yes. You can add your own tags and filters by following the examples in code (djolt_nodes.py and djolt_filters.py respectively).
The normal way to load Python code is through the import statement:
import pprint
pprint.pprint('Hello, world.')
But what do you do if you want to dynamically load a module? A classic example of where you’d like to do this is adding ‘extensions’ to your application. Your application has no way of knowing the exact name of the module that it’s going to use; it only knows the filename(s). The way to do this is the imp module:
import md5
import os.path
import imp
import sys
import traceback

def load_module(code_path):
    try:
        try:
            code_dir = os.path.dirname(code_path)
            code_file = os.path.basename(code_path)
            fin = open(code_path, 'rb')
            return imp.load_source(md5.new(code_path).hexdigest(), code_path, fin)
        finally:
            try: fin.close()
            except: pass
    except ImportError, x:
        traceback.print_exc(file = sys.stderr)
        raise
    except:
        traceback.print_exc(file = sys.stderr)
        raise
A few notes:
load_module with the path to a .py file that you want to load
md5.new generates a unique module identifier. If you don’t do this it’s difficult to import two modules in different directories with the same name!
excepts are to give you a flavor of the issues you may see; ImportError is expected, the others are not
The return value is a module, which is a Python object that you can address in all the normal ways that you’d use a module. For example, if you have the following file extension.py:
def hello(x):
    print "Hello, %s" % x
You can use it as follows to get Hello, World.
m = load_module('extension.py')
m.hello("World")
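The imp and md5 modules used above belong to the Python 2 era. On Python 3 the same idea, loading a file under a name derived from its full path, can be sketched with importlib. This is an illustrative port rather than the original code: the `ext_` prefix and the use of SHA-256 in place of MD5 are assumptions made for the example, and the demonstration extension returns its greeting instead of printing it.

```python
import hashlib
import importlib.util
import os
import tempfile

def load_module(code_path):
    """Load a .py file as a module under a name derived from its full path."""
    code_path = os.path.abspath(code_path)
    # A path-derived name keeps same-named files in different directories apart.
    module_name = "ext_" + hashlib.sha256(code_path.encode("utf-8")).hexdigest()
    spec = importlib.util.spec_from_file_location(module_name, code_path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load %r" % code_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Demonstration with a throwaway extension file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "extension.py")
    with open(path, "w") as fout:
        fout.write("def hello(x):\n    return 'Hello, %s' % x\n")
    m = load_module(path)
    greeting = m.hello("World")

print(greeting)
```

The module is not registered in sys.modules here, so loading the same path twice yields two independent module objects.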
Here’s a Python code pattern that I find myself falling into every once in a while. If you’re a highly disciplined milspec-type non-pragmatic programmer, I suggest you stop reading here lest you burn your eyes.
The pattern is useful in two situations:
class Component:
    def __init__(self, a, b = None, *av, **ad):
        ...

class ComponentTemplate(Component):
    def __init__(self, *av, **ad):
        Component.__init__(self, *av, **ad)
a and b are two arguments that are used by the superclass. With this pattern you can add c to Component in the future without worrying about rewriting ComponentTemplate. Similarly, if an unexpected argument is passed down to Component, it will be silently ignored.
In case you’re wondering what *av and **ad are, they’re Python’s way of referring to arguments that have been passed in, by position and by name, but have not been explicitly listed in the method’s signature. The first is a tuple and the second a dictionary. If you’re a Python user and you’re not familiar with this, you can and should read more about this here.
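To make the pass-through concrete, here's a small sketch. The attribute names extra_positional and extra_named are hypothetical, added only so you can see where the leftover arguments end up; the original Component body isn't shown, so don't read this as what it actually does.

```python
class Component(object):
    def __init__(self, a, b=None, *av, **ad):
        self.a = a
        self.b = b
        # Anything not named in the signature lands in these two.
        self.extra_positional = av   # a tuple
        self.extra_named = ad        # a dict


class ComponentTemplate(Component):
    def __init__(self, *av, **ad):
        # Forward everything untouched; the subclass never names a or b.
        Component.__init__(self, *av, **ad)


t = ComponentTemplate(1, 2, 3, color="red")
```

Here t.a is 1, t.b is 2, t.extra_positional is (3,) and t.extra_named is {'color': 'red'}. ComponentTemplate never had to know about any of them.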
Here are a few things I was reading about over the weekend.
SQLAlchemy is a full-featured Design Pattern-heavy pythonic database ORM. I am totally going to use this for my next Python SQL database project and may even do some playing with old datasets (using the reflection features, yum) soon. If you are considering doing SQL work on your next Python project, don’t even bother with the usual PEP 249 stuff, start with this.
Note that if you’re working with Django it handles the DB in its own way so SQLAlchemy may be of limited utility.
CouchDB “is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”. I couldn’t have written that more succinctly myself, so I didn’t. I qualified the paragraph above on SQLAlchemy by saying I’m going to use it for my next SQL project because I’m really champing at the bit to try CouchDB out. The CouchDB design philosophy (a REST API returning lists of JSON objects) reflects my current design paradigm very closely, and the only question I have is whether in practice it scales to millions of rows.
One caveat is that it’s written in the-cool-nerds-are-doing-it language Erlang, but because you don’t have to interact with that directly it should be OK for us mortals.
CouchDB is about to officially become a “top level” Apache project, though none of the documentation on the Apache.org site reflects this yet.
Virtuoso is a “high-performance object-relational SQL database”. It apparently can perform well. I came across it through the Planet RDF aggregator, so this may be something you want to look into if you’re working on an RDF/SPARQL project.
That’s a mouthful, isn’t it? Amazon is offering to host public datasets on EC2 for free. What’s the catch? It will host the data, but you have to pay for the computing resources to use that data in the normal EC2 manner. Still, if you’re using a large public dataset and you’re already EC2-friendly, you might want to consider this program. An even more interesting thought occurs (though I’m not sure if it will fly): if you’re using large amounts of your own data on EC2, you may want to offer it up as a free resource.
There’s more on this by Lidija Davis on Read/Write Web.
When developing Python code there’s a tendency to add a __main__ section to test the code:
def add(a, b):
    return a + b

if __name__ == '__main__':
    print add(3, 4)
Don’t. Python has a great little package called unittest that lets you quickly frame functions in test cases.
If the example above is called add.py, I’ll generally make a subdirectory called tests and add a test program called test_add.py. This can be as simple as:
import unittest

# Assumes add.py is importable, e.g. its directory is on PYTHONPATH.
from add import add

class TestAdd(unittest.TestCase):
    def setUp(self):
        pass

    def test_1(self):
        self.assertEqual(add(3, 4), 7)
        self.assertEqual(add(4, 4), 8)
        self.assertEqual(add(4, -4), 0)

if __name__ == '__main__':
    unittest.main()
But I prefer to use the following pattern:
class TestAdd(unittest.TestCase):
    def test_add(self):
        checkds = [
            {
                "a" : 4,
                "b" : 3,
                "@result": 7
            },
            {
                "a" : 4,
                "b" : 4,
                "@result": 8
            },
            {
                "a" : 4,
                "b" : -4,
                "@result": 0
            },
        ]
        for checkd in checkds:
            expected_result = checkd.pop("@result")
            actual_result = add(**checkd)
            if expected_result == -1:
                print checkd, actual_result
                continue
            try:
                self.assertEqual(expected_result, actual_result)
            except:
                print checkd, actual_result
                raise
In particular:
- everything after checkds (the for loop) is boilerplate
- it pops @result from the dictionary
- it calls add with the remaining dictionary
- it checks that actual_result was the same as the expected_result
- if expected_result is -1, it doesn’t run the test, it just prints the actual_result. This is great for setting up your tests in the first place. Obviously you might want to change this marker for testing functions that can return -1, but you get the idea
The advantage of using unittest like this is that you’re now not depending on visual inspection or remembering which files you put a __main__ in to test your code. As a secondary benefit, unittest helps you think about edge cases and how other people might call your code.
Just go to your test directory and run them all and you’ll be sure your libraries are behaving as designed.
Here’s an example of implementing an API with many different endpoints. It’s the Google AJAX Search API which lets you access all of Google’s search engines programmatically! A few notes:
- _item_path describes how to pull WORK result objects out of the AJAX result
- _http_referer: the URL of the site that’s using the results
- api_key: you can use the same API key that you’ve created for Google Maps.
Here’s the Google API class: quite simple. I’ll probably extend each individual search function to provide all the known parameters by name, rather than passing in a **ad catch-all.
class Google(bm_api.API):
    _base_query = {
        "v" : "1.0",
    }
    _item_path = "responseData.results"
    _meta_path = "responseData.cursor"
    _convert2work = bm_work.JSON2WORK()

    def __init__(self, _http_referer, **ad):
        bm_api.API.__init__(self, _http_referer = _http_referer, **ad)

    def WebSearch(self, q, **ad):
        self._uri_base = "http://ajax.googleapis.com/ajax/services/search/web"
        self.SearchOn(q = q, **ad)

    def LocalSearch(self, q, **ad):
        self._uri_base = "http://ajax.googleapis.com/ajax/services/search/local"
        self.SearchOn(q = q, **ad)

    def VideoSearch(self, q, **ad):
        self._uri_base = "http://ajax.googleapis.com/ajax/services/search/video"
        self.SearchOn(q = q, **ad)

    def BlogSearch(self, q, **ad):
        self._uri_base = "http://ajax.googleapis.com/ajax/services/search/blogs"
        self.SearchOn(q = q, **ad)

    def NewsSearch(self, q, **ad):
        self._uri_base = "http://ajax.googleapis.com/ajax/services/search/news"
        self.SearchOn(q = q, **ad)

    def BookSearch(self, q, **ad):
        self._uri_base = "http://ajax.googleapis.com/ajax/services/search/books"
        self.SearchOn(q = q, **ad)

    def ImageSearch(self, q, **ad):
        self._uri_base = "http://ajax.googleapis.com/ajax/services/search/images"
        self.SearchOn(q = q, **ad)

    def PatentSearch(self, q, **ad):
        self._uri_base = "http://ajax.googleapis.com/ajax/services/search/patentNew"
        self.SearchOn(q = q, **ad)
Here’s how you use it:
import os
import pprint

api_key = os.environ["GMAPS_APIKEY"]
referer = "http://code.davidjanes.com"
query = "Paris Hilton"

api = Google(key = api_key, _http_referer = referer)
api.VideoSearch(query)
for item in api.IterItems():
    pprint.pprint(item)
Here’s an example of a result, searching for “Paris Hilton” in Videos. I tried searching in Patents without luck.
{'@Index': 0,
'@Page': 1,
u'GsearchResultClass': u'GvideoSearch',
u'content': u"Paris Hilton's new video clip for 'Nothing In This World'",
u'duration': u'204',
u'playUrl': u'http://www.youtube.com/v/...',
u'published': u'Thu, 12 Oct 2006 09:33:23 PDT',
u'publisher': u'www.youtube.com',
u'rating': u'4.52872',
u'tbHeight': u'240',
u'tbUrl': u'http://0.gvt0.com/vi/Ki2M3-2W-cQ/0.jpg',
u'tbWidth': u'320',
u'title': u'Paris Hilton - Nothing In This World',
u'titleNoFormatting': u'Paris Hilton - Nothing In This World',
u'url': u'http://www.google.com/url?q=...',
u'videoType': u'YouTube'}
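The _item_path and _meta_path settings in the class above are dotted paths into the decoded JSON response. The bm_api code that follows them isn't shown here, so treat this as a rough sketch of the idea rather than the real implementation: walk_path is a hypothetical helper, and the response dictionary is a hand-built stand-in shaped like the result above.

```python
def walk_path(data, dotted_path):
    """Follow a dotted path such as "responseData.results" through nested dicts.

    Returns None when any step along the way is missing.
    """
    node = data
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# A trimmed-down, hand-built response (an assumption about the overall shape).
response = {
    "responseData": {
        "results": [
            {"title": "Paris Hilton - Nothing In This World", "videoType": "YouTube"},
        ],
        "cursor": {},
    },
}

# Fall back to an empty list so a missing path simply yields no items.
for item in walk_path(response, "responseData.results") or []:
    print(item["title"])
```

Returning None for a missing step, instead of raising, means a response with no results just produces an empty loop.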
Implementing a merchant search using the Praized API took about 10 minutes (mainly finding the right documentation), using my WORK framework:
class PraizedMerchants(bm_api.API):
    """See: http://code.google.com/p/praized/wiki/A_Second_Tutorial_Search"""
    _uri_base = "http://api.praized.com/apitribe/merchants.xml"
    _meta_path = "community"
    _item_path = "merchants.merchant"
    _page_max_path = 'pagination.page_count'
    _page_max = -1

    def __init__(self, api_key, slug = "apitribe", **ad):
        bm_api.API.__init__(self, api_key = api_key, **ad)
        self._uri_base = "http://api.praized.com/%s/merchants.xml" % slug

    def CustomizePageURI(self, page_index):
        if page_index > 1:
            return "page=%s" % page_index
Partially hardcoding ‘apitribe’ as a ‘community slug’ is probably a bad idea. Anyhoo, here’s how you call it…
import json
import os

api_key = os.environ["PRAIZED_APIKEY"]
api = PraizedMerchants(api_key = api_key, slug = "david-janess-code")
api.SearchOn(
    q = "Bistro",
    l = "Toronto",
)
for item in api.IterItems():
    print json.dumps(item, indent = 1)
… and a set of results, somewhat edited below. I’ll have to figure out what that “permalink” is all about (I’ve edited it to shorten it) … it could be something neat, but I haven’t quite grasped all the ins and outs of what Praized wants to accomplish as a business.
{
 "@Index": 0,
 "@Page": 1,
 "short_url": "http://przd.com/zAU-7",
 "pid": "af5bebd604f3d1517a8113e0a2e8cc58",
 "updated_at": "2008-10-04T20:49:34Z",
 "phone": "(416) 585-7896",
 "permalink": ".../praized/places/ca/ontario/toronto/coffee-supreme-bistro?l=Toronto&q=Bistro",
 "name": "Coffee Supreme Bistro",
 "created_at": "2008-10-04T20:49:34Z",
 "location": {
  "city": {
   "name": "Toronto"
  },
  "country": {
   "code": "CA",
   "name_fr": "Canada",
   "name": "Canada"
  },
  "longitude": "-79.384071",
  "regions": {
   "province": "Ontario"
  },
  "postal_code": "M5J 1T1",
  "latitude": "43.646347",
  "street_address": "40 University Avenue"
 }
}
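Since each item comes back as a plain nested dictionary, pulling out the useful bits is straightforward. Here's a small sketch; summarize_merchant is a hypothetical helper of mine, and the sample item is just a cut-down copy of the result above.

```python
def summarize_merchant(item):
    """Return (name, city, latitude, longitude) for one merchant result."""
    location = item.get("location", {})
    city = location.get("city", {}).get("name")
    # Coordinates arrive as strings in the result above, so convert them.
    latitude = float(location["latitude"]) if "latitude" in location else None
    longitude = float(location["longitude"]) if "longitude" in location else None
    return item.get("name"), city, latitude, longitude


# Cut-down copy of the merchant result shown above.
item = {
    "name": "Coffee Supreme Bistro",
    "location": {
        "city": {"name": "Toronto"},
        "longitude": "-79.384071",
        "latitude": "43.646347",
    },
}
print(summarize_merchant(item))
```

Using .get() with defaults keeps the helper from blowing up on a merchant that has no location block at all.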