<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Janes&#039; Code Weblog &#187; semantic web</title>
	<atom:link href="http://code.davidjanes.com/blog/category/semantic-web/feed/" rel="self" type="application/rss+xml" />
	<link>http://code.davidjanes.com/blog</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sun, 11 Apr 2010 12:32:10 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>hAtom hits the big time</title>
		<link>http://code.davidjanes.com/blog/2009/10/22/hatom-hits-the-big-time/</link>
		<comments>http://code.davidjanes.com/blog/2009/10/22/hatom-hits-the-big-time/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 19:14:39 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[microformats]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=585</guid>
		<description><![CDATA[From the Read/Write Web:
Earlier this year, the Associated Press, together with the Media Standards Trust, introduced hNews, a new microformat for describing news content. HNews allows publishers to easily attach machine-readable news semantics to content on the web. Today, the AP announced the completion of the first draft of hNews. In addition, TownNews, announced that [...]]]></description>
			<content:encoded><![CDATA[<p>From the <a href="http://www.readwriteweb.com/archives/ap_hnews_first_draft_adopted_by_townnews.php">Read/Write Web</a>:</p>
<blockquote><p>Earlier this year, the <a href="http://www.ap.org/">Associated Press</a>, together with the <a href="http://www.mediastandardstrust.org/home.aspx">Media Standards Trust</a>, introduced <a href="http://microformats.org/wiki/hNews">hNews</a>, a new microformat for describing news content. HNews allows publishers to easily attach machine-readable news semantics to content on the web. Today, the AP announced the completion of the first draft of hNews. In addition, <a href="http://townnews.com/">TownNews</a> <a href="http://townnews.com/articles/2009/10/20/press_release/doc4adc780ca9a49642888412.txt">announced</a> that it will support hNews in its <a href="http://townnews.com/solutions/blox_cms/">BLOX content management system</a>, which is being used by over 1,500 newspapers in the US.</p>
<p>HNews, which is an extension of the hAtom format, only requires content users to specify information about the source organization. In addition, publishers can specify <a href="http://microformats.org/wiki/geo">geo-information</a>, a dateline element, <a href="http://microformats.org/wiki/licensing-brainstorming#item_as_container">license information</a> and <a href="http://microformats.org/wiki/principles-brainstorming#rel-principles_specification">information</a> about the code of ethics that governed the behavior of the author of a given site. At its most basic level, hNews, just like other microformats like hCard or hCalendar, allows search engine spiders to identify and read semantic information that would otherwise be buried within a text and would be hard to identify for search engines.</p></blockquote>
<p>The RRW article then goes on to posit some ideas about this being related to AP&#8217;s efforts to track use of its content across the web. This strikes me as rather far-fetched, as stripping out the microformat markup is beyond trivial. What makes this exciting for me is that it makes it more likely that search engines will start recognizing <a href="http://microformats.org/wiki/hatom">hAtom</a> markup and thus properly indexing blogs and other microcontent.</p>
<p>In other exciting hAtom-related news, WordPress 2.7 adds the <code><a href="http://codex.wordpress.org/Migrating_Plugins_and_Themes_to_2.7#Post_Classes">post_class</a></code> function, which lets (new) templates automatically include the <code>hentry</code> class on blog posts! See also the <a href="http://www.smashingmagazine.com/2009/10/20/10-useful-wordpress-hacks-for-advanced-themes/">Smashing Magazine article on this</a>.</p>
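<p>To make &#8220;recognizing hAtom&#8221; a little more concrete, here is a minimal sketch, using only Python&#8217;s standard <code>html.parser</code>, of how a consumer might pull entry titles out of elements whose class list contains <code>hentry</code> (the class that <code>post_class</code> adds). The <code>HEntryFinder</code> class and <code>find_entry_titles</code> function are hypothetical names, not any search engine&#8217;s or WordPress&#8217;s API. The sketch only understands the <code>hentry</code>/<code>entry-title</code> pair and assumes markup with explicit end tags.</p>

```python
from html.parser import HTMLParser

# Elements that never have end tags; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HEntryFinder(HTMLParser):
    """Collect the text of each entry-title found inside a class="hentry" element.

    Hypothetical helper: it ignores the rest of the hAtom vocabulary and
    assumes every non-void element has an explicit end tag.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._depth = 0            # current element nesting depth
        self._entry_depth = None   # depth of the enclosing hentry, if any
        self._title_depth = None   # depth of the entry-title being read, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if "hentry" in classes and self._entry_depth is None:
            self._entry_depth = self._depth
        if ("entry-title" in classes and self._entry_depth is not None
                and self._title_depth is None):
            self._title_depth = self._depth
            self._buf = []

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> hold no text and do not nest.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._title_depth == self._depth:
            self.titles.append("".join(self._buf).strip())
            self._title_depth = None
        if self._entry_depth == self._depth:
            self._entry_depth = None
        self._depth -= 1

    def handle_data(self, data):
        if self._title_depth is not None:
            self._buf.append(data)


def find_entry_titles(html):
    """Return the entry-title text of every hAtom entry in an HTML string."""
    finder = HEntryFinder()
    finder.feed(html)
    finder.close()
    return finder.titles
```

<p>An <code>entry-title</code> outside any <code>hentry</code> is ignored, which is roughly the distinction an hAtom-aware indexer needs to make between a post and the rest of the page.</p>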
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2009/10/22/hatom-hits-the-big-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Travel Websites &amp; Web 3.0</title>
		<link>http://code.davidjanes.com/blog/2009/08/10/travel-websites-web-3-0/</link>
		<comments>http://code.davidjanes.com/blog/2009/08/10/travel-websites-web-3-0/#comments</comments>
		<pubDate>Mon, 10 Aug 2009 19:53:24 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[Discover Anywhere Mobile]]></category>
		<category><![CDATA[ideas]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=574</guid>
		<description><![CDATA[On my Discover Anywhere Mobile blog, I&#8217;ve posted a list of recommendations about how travel websites can use information to extend their reach.
]]></description>
			<content:encoded><![CDATA[<p>On my Discover Anywhere Mobile blog, <a href="http://www.discoveranywheremobile.com/blog/travel-websites-and-web-3-0/">I&#8217;ve posted a list of recommendations</a> about how travel websites can use information to extend their reach.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2009/08/10/travel-websites-web-3-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AUAPI: encoding hCards in JSON</title>
		<link>http://code.davidjanes.com/blog/2009/03/02/auapi-encoding-hcards-in-json/</link>
		<comments>http://code.davidjanes.com/blog/2009/03/02/auapi-encoding-hcards-in-json/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 14:15:38 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[auapi]]></category>
		<category><![CDATA[aumfp]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=509</guid>
		<description><![CDATA[The best model for describing people is the vCard standard, RFC 2425 and RFC 2426. The microformats community has adapted the vCard standard for serialization into HTML using hCard. In the Almost Universal API (AUAPI), people and organizations should almost always be described using a JSON-encoded hCard.
It is difficult to describe, without going into great [...]]]></description>
			<content:encoded><![CDATA[<p>The best model for describing people is the <a href="http://en.wikipedia.org/wiki/VCard">vCard</a> standard, <a href="http://tools.ietf.org/html/rfc2425">RFC 2425</a> and <a href="http://tools.ietf.org/html/rfc2426">RFC 2426</a>. The microformats community has adapted the vCard standard for serialization into HTML using <a href="http://microformats.org/wiki/hcard">hCard</a>. In the <a href="http://code.davidjanes.com/blog/2009/02/27/introducing-the-almost-universal-api/">Almost Universal API</a> (AUAPI), people and organizations should almost always be described using a JSON-encoded hCard.</p>
<p>It is difficult to describe, without going into great minutiae, what the difficulties are in transforming the hCard and vCard standards into a pleasant-looking and, more importantly, easy-to-use hierarchy: there are a number of edge cases that one has to deal with! There&#8217;s certainly an argument for just encoding hCards/vCards as a straight vCard serialization &#8211; at least in terms of simplicity of encoding. The issue is that the end consumer (who I believe should be the strongest focus) then has to do the dirty work of grouping everything together themselves.</p>
<h4>Algorithm</h4>
<p>This algorithm is destructive to the data structure it works upon, so generally you&#8217;ll want to make a copy first.</p>
<ul>
<li>note that although we refer to hCard attributes in upper case, mixed case, camel case and so forth, all attributes are physically encoded in lower case with &#8220;-&#8221; separators</li>
<li>let the &#8220;groupers&#8221; be ADR, GEO, N, ORG, TEL. Groupers group together attributes that are related (such as FirstName and LastName)</li>
<li>let the &#8220;narrowers&#8221; be Home, Work, Parcel, Postal (and <em>no-narrower</em>). Narrowers assign a specific meaning to a value, e.g. this is a <em>Work</em> phone number.</li>
<li>assume each value is described by a number of attributes, e.g. &#8220;416-515-5555&#8221; can be described by ( TEL, Work, Mobile )</li>
</ul>
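<p>The first rule (attribute names are physically encoded in lower case with &#8220;-&#8221; separators) can be sketched as a small normalizing function. The name <code>hcard_key</code> is hypothetical, and the sketch assumes words are delimited by lower-to-upper case changes, underscores, hyphens or spaces.</p>

```python
import re


def hcard_key(name):
    """Normalize an hCard/vCard attribute name to its lower-case, "-"-separated key.

    Hypothetical helper: words are assumed to be separated by a lower-to-upper
    case change (camel case), underscores, hyphens or whitespace.
    """
    # Insert a space at each lower-to-upper boundary: "FamilyName" -> "Family Name"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    words = re.split(r"[\s_-]+", spaced.strip())
    return "-".join(word.lower() for word in words if word)
```

<p>For instance, <code>hcard_key("OrganizationName")</code> gives <code>"organization-name"</code> and <code>hcard_key("TEL")</code> gives <code>"tel"</code>.</p>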
<p>Then:</p>
<ul>
<li>for each Narrower, then for each Grouper
<ul>
<li>create a dictionary &#8216;subd&#8217;</li>
<li>for each value that is described by the ( Narrower, Grouper )
<ul>
<li>for each remaining attribute (besides Narrower and Grouper), add to subd</li>
<li>if the value was fully described by ( Narrower, Grouper ), add to subd under the key &#8216;@&#8217;</li>
</ul>
</li>
<li>for key, value in subd
<ul>
<li>add to the final result</li>
<li>if narrower is not &#8216;no-narrower&#8217;, add &#8216;@narrower = narrower&#8217;</li>
</ul>
</li>
<li>add subd to the result under the key Grouper</li>
</ul>
</li>
<li>add all remaining values from the original hCard to the result, noting that
<ul>
<li>if the value is described by a Narrower, encode it as a dictionary with &#8216;@narrower = narrower&#8217;</li>
</ul>
</li>
</ul>
<p>Clear? Well, the examples below will help. With the &#8220;416-515-5555&#8221; value above, we would get:</p>
<pre>{
 "hcard:hcard" : {
  "tel" : {
   "@work" : "work",
   "mobile" : "416-515-5555"
  }
 }
}</pre>
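<p>For readers who prefer code to nested bullets, here is one possible Python reading of the steps above. It is <em>not</em> the <code>decompose</code> function from AUMFP&#8217;s <code>vcard.py</code>, and <code>group_hcard_values</code> is a hypothetical name. Several details are assumptions because the steps leave them open: the input is a list of <code>(value, attributes)</code> pairs with lower-case attribute names; a key that receives several dictionaries (say, a home and a work phone) collects them in a list; and a leftover value that carries a Narrower keeps the bare value under <code>'@'</code>.</p>

```python
GROUPERS = ("adr", "geo", "n", "org", "tel")
# None stands for the "no-narrower" case and is processed last.
NARROWERS = ("home", "work", "parcel", "postal", None)
REAL_NARROWERS = frozenset(n for n in NARROWERS if n is not None)


def _described_by(attrs, narrower, grouper):
    """True if a value's attributes match this (Narrower, Grouper) pair."""
    if grouper not in attrs:
        return False
    if narrower is None:
        return not (attrs & REAL_NARROWERS)
    return narrower in attrs


def _add(result, key, item):
    # Assumption: a key that receives several items holds them in a list.
    if key not in result:
        result[key] = item
    elif isinstance(result[key], list):
        result[key].append(item)
    else:
        result[key] = [result[key], item]


def group_hcard_values(values):
    """Group (value, attributes) pairs into a JSON-style hCard dictionary."""
    # Work on a copy, since the algorithm consumes the values it groups.
    remaining = [(value, frozenset(attrs)) for value, attrs in values]
    result = {}
    for narrower in NARROWERS:
        for grouper in GROUPERS:
            subd, unmatched = {}, []
            for value, attrs in remaining:
                if not _described_by(attrs, narrower, grouper):
                    unmatched.append((value, attrs))
                    continue
                rest = attrs - {grouper, narrower}
                for attr in rest:
                    subd[attr] = value
                if not rest:  # fully described by (Narrower, Grouper)
                    subd["@"] = value
            remaining = unmatched
            if subd:
                if narrower is not None:
                    subd["@" + narrower] = narrower
                _add(result, grouper, subd)
    # Everything not consumed by a Grouper is added directly.
    for value, attrs in remaining:
        narrowers = attrs & REAL_NARROWERS
        for name in sorted(attrs - narrowers):
            if narrowers:
                item = {"@": value}
                for n in sorted(narrowers):
                    item["@" + n] = n
            else:
                item = value
            _add(result, name, item)
    return {"hcard:hcard": result}
```

<p>Called as <code>group_hcard_values([("416-515-5555", ("tel", "work", "mobile"))])</code>, it returns the same structure as the JSON shown above.</p>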
<h4>Code</h4>
<p>The source code for this algorithm is in the <a href="http://code.google.com/p/aump/">AUMFP</a> tree, in the file <code>vcard.py</code>, function <code>decompose</code> (<a href="http://code.google.com/p/aump/source/browse/trunk/vcard.py">see around line 1083</a>).</p>
<h4>Namespace</h4>
<p>All JSON-encoded hCards are in the namespace <code>hcard:</code>. In the AUAPI serialization, this namespace should appear only on the enclosing element; all children are assumed to be in the same namespace. I am currently using the URI <code>http://purl.org/uF/hCard/1.0/</code> for this namespace (<a href="http://code.davidjanes.com/blog/2009/03/01/auapi-json-to-xml-serialization/">when XML serializing</a>); this may change in the future.</p>
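<p>As a rough illustration of &#8220;namespace only on the enclosing element&#8221;, here is a Python sketch that turns a JSON-encoded hCard into XML with <code>xml.etree.ElementTree</code>, declaring <code>http://purl.org/uF/hCard/1.0/</code> once as the default namespace on the root so every child inherits it. The function name <code>hcard_to_xml</code>, the mapping of <code>'@'</code>-prefixed keys to attributes, and the mapping of the bare <code>'@'</code> key to element text are all assumptions for illustration; the linked serialization post is the authority on the actual rules.</p>

```python
import xml.etree.ElementTree as ET

HCARD_NS = "http://purl.org/uF/hCard/1.0/"  # URI from the post; it may change


def _fill(elem, value):
    """Recursively copy a JSON value into an XML element."""
    if not isinstance(value, dict):
        elem.text = str(value)
        return
    for key, sub in value.items():
        if key == "@":
            elem.text = str(sub)          # assumption: bare value becomes text
        elif key.startswith("@"):
            elem.set(key[1:], str(sub))   # assumption: '@work' becomes work="work"
        else:
            # A list becomes repeated elements with the same name.
            for item in sub if isinstance(sub, list) else [sub]:
                _fill(ET.SubElement(elem, key), item)


def hcard_to_xml(doc):
    """Serialize {'hcard:hcard': {...}} as XML, namespaced on the root only."""
    # Children carry no prefix; they inherit the root's default namespace.
    root = ET.Element("hcard", {"xmlns": HCARD_NS})
    _fill(root, doc["hcard:hcard"])
    return ET.tostring(root, encoding="unicode")
```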
<h4>Example 1 &#8211; home phone number from whitepages.com</h4>
<pre>{
 'hcard:hcard': {'adr': {'country-name': u'United States',
                         'locality': u'Huntsville',
                         'postal-code': '35801-2908',
                         'region': 'Alabama',
                         'street-address': u'1114 Humes Avenue NE'},
                 'fn': u'Jack Smith',
                 'geo': {'latitude': 34.743763000000001,
                         'longitude': -86.572568000000004},
                 'n': {'family-name': u'Smith', 'given-name': u'Jack'},
                 'tel': {'voice': u'256-539-8788'}},
}</pre>
<h4>Example 2 &#8211; work phone number from whitepages.com</h4>
<pre>{ 'hcard:hcard': {'adr': {'country-name': u'United States',
                         'locality': u'Gurley',
                         'postal-code': '35748-8715',
                         'region': 'Alabama',
                         'street-address': u'148 Little Cove Road'},
                 'fn': u'Jack Smith',
                 'geo': {'latitude': 34.698258000000003,
                         'longitude': -86.383027999999996},
                 'n': {'family-name': u'Smith', 'given-name': u'Jack'},
                 'org': {'organization-name': u'Alldyne Powder Technoliges'},
                 'tel': {'@work': 'work', 'voice': u'256-776-1238'}},
}</pre>
<h4>Example 3 &#8211; hCard directly to JSON</h4>
<pre>{ 'hcard:hcard': {
                 'adr': {u'country-name': u'United States of America',
                         u'locality': u'San Francisco',
                         u'region': u'CA'},
                 u'fn': u'Tantek \xc7elik',
                 u'logo': u'icon-2007-128px.png',
                 'n': {'family-name': u'\xc7elik',
                       'given-name': u'Tantek'},
                 u'photo': u'http://tantek.com/icon-2007-128px.png',
                 u'url': u'http://feeds.technorati.com/contact/tantek.com/#hcard'},
}</pre>
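<p>To show what this buys the end consumer, here is a small, hypothetical Python helper that lists the phone numbers in one of these dictionaries, together with their Narrower, without any regrouping. It assumes the <code>tel</code> entry is either a single dictionary, as in the examples, or a list of such dictionaries when there are several (an assumption; the examples don&#8217;t show that case).</p>

```python
def phone_numbers(doc):
    """Return (narrower, type, number) tuples from a JSON-encoded hCard.

    narrower is e.g. 'work', or None; type is e.g. 'voice', or None when the
    number was stored under the bare '@' key.
    """
    tel = doc.get("hcard:hcard", {}).get("tel")
    if tel is None:
        return []
    entries = tel if isinstance(tel, list) else [tel]
    found = []
    for entry in entries:
        # The narrower, if any, is stored as '@narrower': 'narrower'.
        narrower = next((v for k, v in entry.items()
                         if k.startswith("@") and k != "@"), None)
        for key, number in entry.items():
            if key == "@":
                found.append((narrower, None, number))
            elif not key.startswith("@"):
                found.append((narrower, key, number))
    return found
```

<p>Applied to Example 2, it returns <code>[('work', 'voice', '256-776-1238')]</code>.</p>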
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2009/03/02/auapi-encoding-hcards-in-json/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What is the framework for public APIs?</title>
		<link>http://code.davidjanes.com/blog/2009/02/22/what-is-the-framework-for-public-apis/</link>
		<comments>http://code.davidjanes.com/blog/2009/02/22/what-is-the-framework-for-public-apis/#comments</comments>
		<pubDate>Sun, 22 Feb 2009 20:55:23 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=486</guid>
		<description><![CDATA[This post was originally sent to the ChangeCamp mailing list in response to a question about &#8220;what framework should we use for public APIs?&#8221;.
The core &#8220;frameworks&#8221; are POSH, REST and JSON. POSH is &#8220;Plain Old Semantic HTML&#8221;, meaning websites should be developed using modern web standards, pages should validate and use HTML elements correctly, and [...]]]></description>
			<content:encoded><![CDATA[<p><em>This post was originally sent to the ChangeCamp mailing list in response to a question about &#8220;<a href="http://groups.google.com/group/changecamp/browse_thread/thread/e94d677c6be70e0d">what framework should we use for public APIs?</a>&#8221;.</em></p>
<p>The core &#8220;frameworks&#8221; are POSH, REST and JSON. <a href="http://en.wikipedia.org/wiki/Plain_Old_Semantic_HTML">POSH</a> is &#8220;Plain Old Semantic HTML&#8221;, meaning websites should be developed using modern web standards, pages should validate and use HTML elements correctly, and presentation should be coded using CSS. <a href="http://en.wikipedia.org/wiki/REST">REST</a> can have deeper implications, but among the simplest is that pages can be retrieved using simple GET requests against well-known URLs. <a href="http://www.json.org/">JSON</a> has emerged as the de facto standard for returning API results; among the reasons are the simplicity of creating mashups and its embeddability.</p>
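<p>As a minimal sketch of the REST-plus-JSON combination (a simple GET against a well-known URL that returns JSON), here is a standard-library WSGI application in Python. The <code>/people/&lt;id&gt;</code> URL layout and the in-memory <code>PEOPLE</code> data are hypothetical.</p>

```python
import json

# Hypothetical in-memory data; a real API would query a datastore.
PEOPLE = {
    "1": {"fn": "Jack Smith", "tel": "256-539-8788"},
}


def app(environ, start_response):
    """WSGI app: GET /people/<id> returns that person as JSON."""
    parts = [p for p in environ.get("PATH_INFO", "/").split("/") if p]
    headers = [("Content-Type", "application/json")]
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        status, body = "405 Method Not Allowed", {"error": "only GET is supported"}
        headers.append(("Allow", "GET"))
    elif len(parts) == 2 and parts[0] == "people" and parts[1] in PEOPLE:
        status, body = "200 OK", PEOPLE[parts[1]]
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    headers.append(("Content-Length", str(len(payload))))
    start_response(status, headers)
    return [payload]
```

<p>To try it locally, <code>from wsgiref.simple_server import make_server</code> and run <code>make_server("", 8000, app).serve_forever()</code>, then GET <code>http://localhost:8000/people/1</code>.</p>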
<p><a href="http://en.wikipedia.org/wiki/Atom_(standard)">Atom</a> and/or <a href="http://en.wikipedia.org/wiki/RSS">RSS</a> provide the framework for update notifications. There are emerging technologies for real-time delivery, but it&#8217;s too early to worry about that.</p>
<p><a href="http://microformats.org/">Microformats</a> provide a framework for embedding well-understood objects in HTML, are based on popular and well-understood standards, are easy(-ish) to implement, and a &#8220;consumer&#8221; ecosystem exists. In particular, people can be represented by <a href="http://microformats.org/wiki/hCard">hCard</a>, events by <a href="http://microformats.org/wiki/hCalendar">hCalendar</a>, tagged data by<a href="http://microformats.org/wiki/rel-tag"> rel-tag</a> and microcontent (articles within a page) by <a href="http://microformats.org/wiki/hAtom">hAtom</a>. Note that no parallel infrastructure need exist to do microformats: they are served within HTML pages.</p>
<p>Identity should use <a href="http://oauth.net/">OAuth</a> and <a href="http://openid.net/">OpenID</a>; pragmatism says <a href="http://developers.facebook.com/connect.php">Facebook Connect</a> and <a href="http://www.google.com/friendconnect/">Google Friend Connect</a> should be in the mix too, though I have a number of reservations about those.</p>
<p>I am very non-bullish about <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>, particularly as a model for delivering data of well-defined formats. IMHO it has almost entirely missed the mashup wave of the last few years, and its successes seem scattered at best. <a href="http://www.w3.org/TR/xhtml-rdfa-primer/">RDFa</a> is competing in microformats&#8217; &#8220;space&#8221; and may yet see success if it starts providing concrete solutions rather than &#8220;here&#8217;s a format that can do anything&#8221;, especially given microformats&#8217; process issues.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2009/02/22/what-is-the-framework-for-public-apis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interesting links from the last month</title>
		<link>http://code.davidjanes.com/blog/2008/12/29/links-from-the-last-month/</link>
		<comments>http://code.davidjanes.com/blog/2008/12/29/links-from-the-last-month/#comments</comments>
		<pubDate>Mon, 29 Dec 2008 14:29:25 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[db]]></category>
		<category><![CDATA[ideas]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=401</guid>
		<description><![CDATA[
Aspen &#8211; a web server for highly extensible Python-based publication, application, and hybrid websites. As a potential alternative to Python&#8217;s builtin HTTPServer. MIT license.
V8 &#8211; V8 is Google&#8217;s open source JavaScript engine; written in C++; can run standalone, or can be embedded into any C++ application. I am very excited by this, as allowing users [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li><a href="http://www.zetadev.com/software/aspen/0.8/doc/html/aspen.html">Aspen</a> &#8211; <em>a web server for highly extensible Python-based publication, application, and hybrid websites</em>. As a potential alternative to Python&#8217;s builtin <a href="http://www.python.org/doc/2.6/library/basehttpserver.html">HTTPServer</a>. MIT license.</li>
<li><a href="http://code.google.com/p/v8/">V8</a> &#8211; <em>V8 is Google&#8217;s open source JavaScript engine; written in C++; can run standalone, or can be embedded into any C++ application</em>. I am very excited by this, as allowing users to send code to the server to execute Javascript is an amazingly powerful idea. If anyone knows of a Python wrapper, let me know please. New BSD license.</li>
<li><a href="http://jjinux.blogspot.com/2008/12/editors-i-dig-komodo-edit.html">KomodoEdit</a> (a testimonial) &#8211; I am going to try this out, though vi/vim will always be my first love (JJ also has <a href="http://jjinux.blogspot.com/2008/12/vim-ctags.html">an article on using ctags</a>).</li>
<li><a href="http://virtuoso.openlinksw.com/">Virtuoso</a> -<em> an innovative Universal Server platform that delivers an enterprise level Data Integration and Management solution for SQL, RDF, XML, Web Services, and Business Processes</em>. There&#8217;s <em>way</em> to much bla bla bla in that sentence, but apparently this is really sweet at handling SPARQL/RDF triples. <a href="http://www.openlinksw.com/blog/kidehen@openlinksw.com/blog/">Kingsley Idehen</a> writes extensively about this on his blog (<a href="http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1489">e.g.</a>).</li>
<li><a href="https://launchpad.net/drizzle">Drizzle</a> &#8211; <em>a database optimized for Cloud and Net applications</em>. Way too early to commit to this yet. See <a href="http://jeremy.zawodny.com/blog/archives/010774.html">The New MySQL Landscape</a> for more interesting going ons.</li>
<li><a href="http://pypi.python.org/pypi/AuthKit/">AuthKit</a> &#8211; <em>authentication and authorization toolkit for WSGI applications and frameworks</em>.</li>
<li><a href="http://geodjango.org/">Geodjango</a> &#8211; <em>a world-class geographic web framework</em>. Lots of great ideas and pointers to libraries in here, even if you&#8217;re not planning to use this itself.</li>
<li>Disco &#8211; <em>an open-source implementation of the <a href="http://en.wikipedia.org/wiki/MapReduce">Map-Reduce</a> framework for distributed computing. The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks often only in tens of lines of code</em>. <a href="http://ebiquity.umbc.edu/blogger/2008/12/21/disco-a-map-reduce-framework-in-python-and-erlang/">Here&#8217;s a blog post about it</a>, with comparisons to <a href="http://en.wikipedia.org/wiki/Hadoop">Hadoop</a>.</li>
<li><a href="http://www.b-list.org/weblog/2008/dec/14/packaging/">On (Python) packaging</a>. Debating distutil, easy_install and pip.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/12/29/links-from-the-last-month/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A brief survey of Yahoo Pipes as a DQT</title>
		<link>http://code.davidjanes.com/blog/2008/12/11/brief-survey-of-yahoo-pipes/</link>
		<comments>http://code.davidjanes.com/blog/2008/12/11/brief-survey-of-yahoo-pipes/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 12:19:55 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[demo]]></category>
		<category><![CDATA[djolt]]></category>
		<category><![CDATA[dqt]]></category>
		<category><![CDATA[ideas]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=325</guid>
		<description><![CDATA[Yahoo Pipes is a visual editor of mashups, allowing you to take data from sources on the net, transform them in various interesting ways and output the result as Atom, RSS or JSON. The primary downside Pipes of course is that you&#8217;re totally dependent on Yahoo for the infrastructure: it runs at Yahoo pulling feeds [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://code.davidjanes.com/examples/2008-12-13/pipes.png" alt="Yahoo Pipes" style="margin: 0 0 10px 10px; float: right;" /><a href="http://pipes.yahoo.com/">Yahoo Pipes</a> is a visual editor of mashups, allowing you to take data from sources on the net, transform them in various interesting ways and output the result as Atom, RSS or JSON. The primary downside of Pipes, of course, is that you&#8217;re totally dependent on Yahoo for the infrastructure: it runs at Yahoo, pulling feeds that have to be accessible through the public Internet.</p>
<p>It&#8217;s easy to use Pipes: just <a href="http://pipes.yahoo.com/pipes/docs?doc=overview">go to this page</a> and start working with the sample Pipe. You&#8217;ll need a Yahoo login ID, but most of us have that anyway. I&#8217;ve created an example that uses Yahoo Pipes to feed a Djolt template, <a href="http://code.davidjanes.com/examples/2008-12-11/dqt2/">which you can see here</a>.</p>
<p>We can analyze Pipes in terms of <a href="http://code.davidjanes.com/blog/2008/12/09/introducing-dqt-dataquerytransformtemplate/">the DQT paradigm we&#8217;ve outlined in the previous post</a>.</p>
<h4>Data Sources and Queries</h4>
<p>Sources and Queries are merged (quite logically) in the Pipes interface. <a href="http://pipes.yahoo.com/pipes/docs?doc=sources">You can read in-depth documentation here</a>.</p>
<ul>
<li>Fetch CSV</li>
<li>Feed Autodiscovery &#8211; outputs syndication feeds found on a page (<a href="http://pipes.yahoo.com/pipes/pipe.info?_id=jAZaeHvH3RGDaUk__w6H4A">RSS feeds on a CBC page</a>)</li>
<li>Fetch Feed</li>
<li>Fetch Page &#8211; will read a page and parse the contents with a reg</li>
<li>Fetch Site Feed &#8211; this is the logical combination of Fetch Feed and Feed Autodiscovery</li>
<li>Flickr &#8211; find images by tag near a location (<a href="http://pipes.yahoo.com/pipes/pipe.info?_id=oPG38nvH3RG0QpVRrbQIDg">photos of cats in Toronto</a>)</li>
<li>Google Base &#8211; look up information in Google Base</li>
<li>Item Builder &#8211; a way of building new items from existing items</li>
<li>Yahoo Local</li>
<li>Yahoo Search</li>
</ul>
<h4>Transforms</h4>
<p>The <a href="http://pipes.yahoo.com/pipes/docs?doc=operators">operator documentation can be read here</a>.</p>
<ul>
<li>Count</li>
<li>Filter</li>
<li>Location Extractor &#8211; a geocoder that magically looks for locations</li>
<li>Loop</li>
<li>Regex</li>
<li>Rename</li>
<li>Reverse</li>
<li>Sort</li>
<li>Split</li>
<li>Sub-element &#8211; pulls a particular sub-element of an item and makes that the item. This is very much like WORK path manipulation</li>
<li>Tail</li>
<li>Truncate</li>
<li>Union</li>
<li>Unique</li>
<li>Web Service</li>
</ul>
<p>Plus a number of specialized data services, for dealing with elements such as dates.</p>
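<p>The Transform stage of DQT can be thought of as a chain of list-to-list functions. Here is a rough Python sketch of four of the operators above (Filter, Sort, Unique, Truncate) applied to a list of feed items. The functions are hypothetical analogues written for illustration, not Yahoo&#8217;s implementation or API.</p>

```python
def pipe(items, *steps):
    """Run items through each step in turn, like wiring Pipes modules together."""
    for step in steps:
        items = step(items)
    return items


def filter_op(predicate):
    """Keep only the items for which predicate(item) is true (cf. Filter)."""
    return lambda items: [item for item in items if predicate(item)]


def sort_op(key, reverse=False):
    """Sort items by one field (cf. Sort)."""
    return lambda items: sorted(items, key=lambda item: item[key], reverse=reverse)


def unique_op(key):
    """Drop items whose field value has already been seen (cf. Unique)."""
    def run(items):
        seen, kept = set(), []
        for item in items:
            if item[key] not in seen:
                seen.add(item[key])
                kept.append(item)
        return kept
    return run


def truncate_op(count):
    """Keep only the first count items (cf. Truncate)."""
    return lambda items: items[:count]
```

<p>For example, <code>pipe(items, unique_op("link"), sort_op("date", reverse=True), truncate_op(10))</code> would give the ten newest distinct items, assuming ISO-format date strings so that they sort correctly as text. Each step returns a new list, so the source items are left untouched.</p>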
<h4>Templates</h4>
<p>Pipes does not provide an arbitrary Djolt-like template producing HTML. Instead, it provides a number of pre-made code templates that output well-known data types, including RSS, JSON and Atom (and some stranger choices, like PHP).</p>
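<p>To illustrate what one of those pre-made output templates has to do, here is a small Python sketch that writes a list of items as minimal RSS 2.0 using the standard <code>xml.etree.ElementTree</code>. The function name <code>items_to_rss</code> and the item dictionary keys are assumptions for the example.</p>

```python
import xml.etree.ElementTree as ET


def items_to_rss(title, link, description, items):
    """Serialize item dicts (title/link/description keys) as minimal RSS 2.0."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item in items:
        node = ET.SubElement(channel, "item")
        for key in ("title", "link", "description"):
            if key in item:
                ET.SubElement(node, key).text = item[key]
    return ET.tostring(rss, encoding="unicode")
```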
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/12/11/brief-survey-of-yahoo-pipes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>All your Base are belong to us</title>
		<link>http://code.davidjanes.com/blog/2008/12/06/all-your-base/</link>
		<comments>http://code.davidjanes.com/blog/2008/12/06/all-your-base/#comments</comments>
		<pubDate>Sat, 06 Dec 2008 11:26:49 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[db]]></category>
		<category><![CDATA[freebase]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=295</guid>
		<description><![CDATA[Freebase is a user-editable, user-extensible structured database, a sort of one-stop shop semantic web/Wikipedia application. I started playing with Freebase about a year ago and the application has made significant strides over that period, especially in the usability department. Freebase also provides a very nice API which I&#8217;m using in GenX, with the caveat that [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.freebase.com/">Freebase</a> is a user-editable, user-extensible structured database, a sort of one-stop shop semantic web/Wikipedia application. I started playing with Freebase about a year ago and the application has made significant strides over that period, especially in the usability department. Freebase also provides a very nice API which I&#8217;m using in <a href="http://code.davidjanes.com/blog/category/genx/">GenX</a>, with the caveat that it&#8217;s <em>currently</em> almost useless because of query timeouts.</p>
<p>I just came across the following page on Freebase: <a href="http://vancouver.freebase.com/">http://vancouver.freebase.com/</a>. This page is what Freebase calls a Base, which is a collection of Tables/Views, which are things like &#8220;<a href="http://vancouver.freebase.com/view/base/vancouver/views/vancouver_neighbourhoods">Vancouver Bloggers</a>&#8221;, &#8220;<a href="http://vancouver.freebase.com/view/base/vancouver/views/mayoral_candidates_2008">Mayoral Candidates 2008</a>&#8221; and so forth. A Table/View is a list of Topics, which are basically the equivalent of a Wikipedia page. Get all that? It makes sense after a while.</p>
<p>A few observations:</p>
<ul>
<li>Why have I written Table/View above? Because in some places it&#8217;s called a Table and other places it&#8217;s called a View. Which is it? I&#8217;m guessing View but it&#8217;s still not 100% clear.</li>
<li>I decided to create our own <a href="http://toronto.freebase.com/view/base/toronto/views/teams_of_toronto">Toronto Base</a> especially for the <a href="http://barcamp.org/TorCamp">TorCamp</a> community. Given that you get your own top-level domain name, there&#8217;s something of an incentive to be a first-mover on this.</li>
<li>When you create a Base, it provides a list of suggested Views that can be added. Nice. Unfortunately, it added each View twice. I then had to go delete the duplicate View manually. Not so nice. And then <em>even though I&#8217;ve deleted the View </em>it still shows up on a <a href="http://toronto.freebase.com/domain/views/base/toronto">detail page</a>. Sigh.</li>
<li>On the plus side, this is all done in a nice Ajax-y way.</li>
<li>It&#8217;s really not at all obvious how you create a new View. <strong>Really</strong> not obvious. <a href="http://www.freebase.com/view/guid/9202a8c04000641f8000000008744dbe">Here&#8217;s the documentation</a>.</li>
<li>My initial impression was that Views were copies, not references: this turned out to be a wrong assumption on my part. Views are in fact (if I got this right) the results of a query on the Freebase db. This means that as more Topics match the View query, they&#8217;ll automatically show up. The query is a copy, not a reference, but this is a good thing.</li>
<li>The implication is that it&#8217;s difficult to create a View that is an arbitrary &#8220;bag&#8221; of topics. For example, if I want to create a Toronto Bloggers View, I have to actually make sure that all the Topics that will show up are marked with some attribute that can be matched to give them a Toronto-bloggerness quality.</li>
</ul>
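<p>The difference between a View-as-query and a copied list can be modelled in a few lines of Python. This is a toy model over made-up data, not Freebase&#8217;s API or its query language: the hypothetical <code>View</code> class stores a predicate and re-evaluates it on every read. Newly matching Topics therefore show up automatically, while a Topic lacking the matching attributes can never appear, which is the &#8220;bag of topics&#8221; problem above.</p>

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class View:
    """A toy View: it stores a query (predicate), not a copy of matching topics."""
    name: str
    query: Callable[[dict], bool]

    def topics(self, db: List[dict]) -> List[str]:
        # The query is re-run on every call, so newly matching topics appear.
        return [topic["name"] for topic in db if self.query(topic)]
```

<p>A snapshot taken with <code>list(view.topics(db))</code> behaves like the &#8220;copy&#8221; I first assumed: it never changes as the database grows.</p>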
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/12/06/all-your-base/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Database roundup</title>
		<link>http://code.davidjanes.com/blog/2008/11/24/database-roundup/</link>
		<comments>http://code.davidjanes.com/blog/2008/11/24/database-roundup/#comments</comments>
		<pubDate>Mon, 24 Nov 2008 12:27:44 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[db]]></category>
		<category><![CDATA[ideas]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=253</guid>
		<description><![CDATA[Here&#8217;s a few things I was reading about over the weekend.
SQLAlchemy
SQLAlchemy is a full-featured Design Pattern-heavy pythonic database ORM. I am totally going to use this for my next Python SQL database project and may even do some playing with old datasets (using the reflection features, yum) soon. If you are considering doing SQL work [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a few things I was reading about over the weekend.</p>
<h4>SQLAlchemy</h4>
<p><a href="http://www.sqlalchemy.org/">SQLAlchemy</a> is a full-featured <a href="http://en.wikipedia.org/wiki/Design_pattern_(computer_science)">Design Pattern</a>-heavy pythonic database <a href="http://en.wikipedia.org/wiki/Object-relational_mapping">ORM</a>. I am totally going to use this for my next Python SQL database project and may even do some playing with old datasets (using the <a href="http://www.sqlalchemy.org/docs/05/metadata.html#metadata_tables_reflecting">reflection features</a>, yum) soon. If you are considering doing SQL work on your next Python project, don&#8217;t even bother with the usual <a href="http://www.python.org/dev/peps/pep-0249/">PEP 249</a> stuff, start with this.</p>
<p>Note that if you&#8217;re working with <a href="http://www.djangoproject.com/">Django</a> it handles the DB in its own way so SQLAlchemy may be of limited utility.</p>
<h4>CouchDB</h4>
<p><a href="http://incubator.apache.org/couchdb/">CouchDB</a> &#8220;is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API&#8221;. I couldn&#8217;t have put that more succinctly myself, so I didn&#8217;t. I specified above that I&#8217;d use SQLAlchemy for my next <em>SQL</em> project because I&#8217;m really champing at the bit to try CouchDB out. The CouchDB design philosophy &#8211; a REST API returning lists of JSON objects &#8211; reflects my <a href="http://code.davidjanes.com/blog/category/work/">current design paradigm</a> very closely, and the only question I have is whether in practice it scales to millions of rows.</p>
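<p>To make &#8220;a RESTful HTTP/JSON API&#8221; concrete: a minimal sketch that only builds (does not send) the HTTP requests for creating a database, storing a document, reading it back and listing all documents, using Python&#8217;s standard <code>urllib</code>. It assumes a local CouchDB on its default port, 5984; the database name, document id and document contents are hypothetical.</p>

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumed local server on CouchDB's default port

def couch_request(method, path, doc=None):
    """Build (but do not send) a CouchDB HTTP request; any body is JSON."""
    data = json.dumps(doc).encode("utf-8") if doc is not None else None
    req = urllib.request.Request(COUCH + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

# Create a database, store one schema-free document, fetch it, list all docs.
steps = [
    couch_request("PUT", "/bookmarks"),
    couch_request("PUT", "/bookmarks/couchdb-home",
                  {"title": "CouchDB", "url": "http://incubator.apache.org/couchdb/"}),
    couch_request("GET", "/bookmarks/couchdb-home"),
    couch_request("GET", "/bookmarks/_all_docs"),
]
for req in steps:
    print(req.get_method(), req.full_url)
    # Sending one: urllib.request.urlopen(req).read() would return a JSON body.
```

<p>Every operation is just an HTTP verb plus a URL, with JSON going in and coming out, which is exactly why it fits the way we already think about web data.</p>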
<p>One caveat: it&#8217;s written in the-cool-nerds-are-doing-it language <a href="http://www.erlang.org/">Erlang</a>, but because you don&#8217;t have to interact with the Erlang directly, it should be OK for us mortals.</p>
<p>CouchDB is <a href="http://mail-archives.apache.org/mod_mbox/incubator-couchdb-dev/200811.mbox/%3C3F352A54-5FC8-4CB0-8A6B-7D3446F07462@jaguNET.com%3E">about to officially become a &#8220;top level&#8221; Apache project</a>, though none of the documentation on the <a href="http://apache.org/">Apache.org</a> site reflects this yet.</p>
<h4>Virtuoso</h4>
<p><a href="http://virtuoso.openlinksw.com/wiki/main/Main/">Virtuoso</a> is a &#8220;high-performance object-relational SQL database&#8221;. <a href="http://www.openlinksw.com/weblog/oerling/?id=1484">It apparently performs well</a>. I came across it through the <a href="http://planetrdf.com/">Planet RDF</a> aggregator, and it may be worth a look if you&#8217;re working on an <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>/<a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a> project.</p>
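<p>If you do try Virtuoso for RDF work, queries travel over the standard SPARQL protocol: an HTTP GET with the query text in a <code>query</code> parameter. A minimal sketch that builds such a URL; the endpoint address (Virtuoso conventionally serves <code>/sparql</code> on port 8890) and the deliberately generic query are assumptions for illustration.</p>

```python
from urllib.parse import urlencode

ENDPOINT = "http://localhost:8890/sparql"  # assumed local Virtuoso SPARQL endpoint

def sparql_get_url(endpoint, query):
    """Return a SPARQL-protocol GET URL carrying the query in `query`."""
    return endpoint + "?" + urlencode({"query": query})

# A deliberately generic query: the first ten triples in the store.
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url(ENDPOINT, query)
print(url)
# Fetching it (for example with urllib.request, sending an
# "Accept: application/sparql-results+json" header) would return the bindings.
```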
<h4>Amazon Web Services Hosted Data Sets</h4>
<p>That&#8217;s a mouthful, isn&#8217;t it? <a href="http://aws.amazon.com/publicdatasets/">Amazon is offering to host public datasets</a> on <a href="http://aws.amazon.com/ec2/">EC2</a> for free. What&#8217;s the catch? Amazon hosts the data, but you pay for the computing resources to use it in the normal EC2 manner. Still, if you&#8217;re using a large public dataset and you&#8217;re already EC2-friendly, you might want to consider this program. An even more interesting thought occurs (though I&#8217;m not sure it will fly): if you&#8217;re using large amounts of your own data on EC2, you may want to offer it up as a free resource.</p>
<p>There&#8217;s more on this by <a href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php">Lidija Davis on Read/Write Web</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/11/24/database-roundup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Work API Teaser II &#8211; Praized API</title>
		<link>http://code.davidjanes.com/blog/2008/11/12/work-api-teaser-ii-praized-api/</link>
		<comments>http://code.davidjanes.com/blog/2008/11/12/work-api-teaser-ii-praized-api/#comments</comments>
		<pubDate>Wed, 12 Nov 2008 23:46:19 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[demo]]></category>
		<category><![CDATA[ideas]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=212</guid>
		<description><![CDATA[Implementing a merchant search using the Praized API took about 10 minutes (mainly finding the right documentation), using my WORK framework:
class PraizedMerchants(bm_api.API):
    """See: http://code.google.com/p/praized/wiki/A_Second_Tutorial_Search"""

    _uri_base = "http://api.praized.com/apitribe/merchants.xml"
    _meta_path = "community"
    _item_path = "merchants.merchant"
    _page_max_path = 'pagination.page_count'
    _page_max [...]]]></description>
			<content:encoded><![CDATA[<p>Implementing a merchant search using the <a href="http://praizedmedia.com/">Praized</a> <a href="http://praizedmedia.com/en/api">API</a> took about 10 minutes (<a href="http://code.google.com/p/praized/wiki/A_Second_Tutorial_Search">mainly finding the right documentation</a>), using my WORK framework:</p>
<pre>class PraizedMerchants(bm_api.API):
    """See: http://code.google.com/p/praized/wiki/A_Second_Tutorial_Search"""

    _uri_base = "http://api.praized.com/apitribe/merchants.xml"
    _meta_path = "community"                  # dotted path to the response metadata
    _item_path = "merchants.merchant"         # dotted path to the repeating items
    _page_max_path = 'pagination.page_count'  # dotted path to the total page count
    _page_max = -1

    def __init__(self, api_key, slug = "apitribe", **ad):
        bm_api.API.__init__(self, api_key = api_key, **ad)

        # Each Praized community has its own endpoint, chosen by its slug
        self._uri_base = "http://api.praized.com/%s/merchants.xml" % slug

    def CustomizePageURI(self, page_index):
        # Page 1 needs no extra parameter; later pages add "page=N"
        if page_index &gt; 1:
            return "page=%s" % page_index</pre>
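<p>The framework side isn&#8217;t shown here, so as a hypothetical sketch of how a hook like <code>CustomizePageURI</code> might be used: build the search URI from the query terms and fold in whatever <code>name=value</code> fragment the hook returns, leaving page 1 unadorned. The function names are illustrative, not WORK&#8217;s actual API, and the API key parameter is left out because its name isn&#8217;t shown in this post.</p>

```python
from urllib.parse import parse_qsl, urlencode

def customize_page_uri(page_index):
    """Stand-in for the hook above: the first page gets no extra parameter."""
    if page_index == 1:
        return None
    return "page=%s" % page_index

def page_uri(uri_base, query, page_index):
    """Hypothetical: merge search terms and the hook's fragment into one URI."""
    params = dict(query)
    extra = customize_page_uri(page_index)
    if extra:
        # The hook returns "name=value"; fold it into the query parameters.
        params.update(parse_qsl(extra))
    return uri_base + "?" + urlencode(params)

base = "http://api.praized.com/david-janess-code/merchants.xml"
print(page_uri(base, {"q": "Bistro", "l": "Toronto"}, 1))
print(page_uri(base, {"q": "Bistro", "l": "Toronto"}, 2))
```

<p>Keeping the page logic in a small per-API hook means the generic code never needs to know whether a service calls its parameter <code>page</code>, <code>ItemPage</code> or something else.</p>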
<p>Partially hardcoding &#8216;apitribe&#8217; as a &#8216;community slug&#8217; is probably a bad idea. Anyhoo, here&#8217;s how you call it&#8230;</p>
<pre>import json
import os

api_key = os.environ["PRAIZED_APIKEY"]
api = PraizedMerchants(api_key = api_key, slug = "david-janess-code")
api.SearchOn(
    q = "Bistro",
    l = "Toronto",
)
for item in api.IterItems():
    print json.dumps(item, indent = 1)</pre>
<p>&#8230; and here&#8217;s a set of results, somewhat edited. I&#8217;ll have to figure out what that &#8220;permalink&#8221; is all about (I&#8217;ve shortened it here) &#8230; it could be something neat, but I haven&#8217;t quite grasped all the ins and outs of what Praized wants to accomplish as a business.</p>
<pre>{
 "@Index": 0,
 "@Page": 1,
 "short_url": "http://przd.com/zAU-7",
 "pid": "af5bebd604f3d1517a8113e0a2e8cc58",
 "updated_at": "2008-10-04T20:49:34Z",
 "phone": "(416) 585-7896",
 "permalink":
   ".../praized/places/ca/ontario/toronto/coffee-supreme-bistro?l=Toronto&amp;q=Bistro",
 "name": "Coffee Supreme Bistro",
 "created_at": "2008-10-04T20:49:34Z",
 "location": {
  "city": {
   "name": "Toronto"
  },
  "country": {
   "code": "CA",
   "name_fr": "Canada",
   "name": "Canada"
  },
  "longitude": "-79.384071",
  "regions": {
   "province": "Ontario"
  },
  "postal_code": "M5J 1T1",
  "latitude": "43.646347",
  "street_address": "40 University Avenue"
 }
}</pre>
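<p>Results like this nest fairly deeply. A minimal sketch of flattening them into dot-separated keys, so that the city above becomes <code>location.city.name</code>; the <code>flatten</code> helper is illustrative, not part of the WORK code, and the sample is trimmed from the result above.</p>

```python
def flatten(work, prefix=""):
    """Flatten nested dicts into one dict with dot-separated keys.

    Lists and basic values are kept as-is; only dicts are descended into,
    so an empty dict contributes no keys at all.
    """
    flat = {}
    for key, value in work.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat

merchant = {
    "name": "Coffee Supreme Bistro",
    "location": {
        "city": {"name": "Toronto"},
        "regions": {"province": "Ontario"},
        "latitude": "43.646347",
    },
}
flat = flatten(merchant)
print(flat["location.city.name"])  # Toronto
```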
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/11/12/work-api-teaser-ii-praized-api/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>WORK API Teaser</title>
		<link>http://code.davidjanes.com/blog/2008/11/12/work-teaser/</link>
		<comments>http://code.davidjanes.com/blog/2008/11/12/work-teaser/#comments</comments>
		<pubDate>Wed, 12 Nov 2008 14:41:38 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=206</guid>
		<description><![CDATA[Following from the concepts I wrote about yesterday, here&#8217;s two examples of API parsers using a WORK model.
RSS 2.0
Class definition &#8211; that&#8217;s the whole thing there!:
class RSS20(API):
    _item_path = "channel.item"
    _meta_path = "channel"

    def __init__(self, uri):
        API.__init__(self) 

  [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://code.davidjanes.com/blog/2008/11/11/work-web-object-records/">Following from the concepts I wrote about yesterday</a>, here&#8217;s two examples of API parsers using a <a href="http://code.davidjanes.com/blog/category/work/">WORK</a> model.</p>
<h4>RSS 2.0</h4>
<p>Class definition (that&#8217;s the whole thing!):</p>
<pre>class RSS20(API):
    _item_path = "channel.item"
    _meta_path = "channel"

    def __init__(self, uri):
        API.__init__(self) 

        self._uri_base = uri</pre>
<p>Using it:</p>
<pre>api = RSS20(uri = 'http://feeds.feedburner.com/DavidJanesCode')
for item in api.IterItems():
    print "-", item['title']</pre>
<p>Results:</p>
<pre>- WORK - Web Object Records
- Syntax Error on Line 1
- Adding MapField to inputEx
- Switching between mapping APIs and universal zoom levels
- How to dynamically load map APIs
- How to use the Google Maps API
- How to use the Microsoft Virtual Earth API
- Tip - how to get your browser’s User Agent
- How to use the MapQuest API
- How to use the Yahoo Maps Service AJAX API
- How to detect internal link jumps
- GenX - first public demonstration
- Amazon’s OpenSearch: mostly useless
- More style updates
- How to do multi-column multilingual full text searching in Oracle
- Tip - fixing broken menus over form on IE6 and IE7
- New style for this weblog
- AUMFP - Demo
- Tip - use mod_rewrite to redirect to subdirectory
- AUMFP - The Almost Universal Microformats Parser</pre>
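<p>For comparison, the same <code>channel.item</code> path can be followed with nothing but Python&#8217;s standard ElementTree, because a dotted path maps directly onto an ElementTree path with slashes. A minimal sketch, assuming the feed&#8217;s XML has already been fetched and that each item&#8217;s children are simple text elements; <code>iter_items</code> is an illustrative name, not the framework&#8217;s.</p>

```python
import xml.etree.ElementTree as ET

def iter_items(xml_text, item_path="channel.item"):
    """Yield each item as a {child tag: text} dict, following a dotted path.

    The path is relative to the document's root element (rss for RSS 2.0).
    Namespaced children such as dc:creator keep ElementTree's {uri}name tags.
    """
    root = ET.fromstring(xml_text)
    for item in root.findall(item_path.replace(".", "/")):
        yield {child.tag: (child.text or "") for child in item}

# Usage, given the feed's XML as a string:
#     for item in iter_items(feed_xml):
#         print("-", item["title"])
```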
<h4>Amazon ECS</h4>
<p>This will probably end up replacing <a href="http://code.davidjanes.com/blog/2008/10/19/pyecs-the-python-amazon-ecs-api/">PyECS</a>!</p>
<p>Class definition:</p>
<pre>class AmazonECS(API):
    _base_query = {
        "Sort" : "relevancerank",
        "Operation" : "ItemSearch",
        "Version" : "2008-08-19",
        "ResponseGroup" : [ "Small", ],
    }
    _uri_base = "http://ecs.amazonaws.com/onca/xml"
    _meta_path = "Items.Request"
    _item_path = "Items.Item"
    _page_max_path = 'Items.TotalPages'
    _item_max_path = 'Items.TotalResults'
    _page_max = -1

    def __init__(self, **ad):
        API.__init__(self, **ad)

    def CustomizePageURI(self, page_index):
        if page_index == 1:
            return

        return  "%s=%s" % ( "ItemPage", page_index )</pre>
<p>Using it:</p>
<pre>import os

api = AmazonECS(AWSAccessKeyId = os.environ["AWS_ECS_ACCESSKEYID"])
api.SearchOn(
    Keywords = "Larry Niven",
    SearchIndex = "Books",
    Condition = "New",
)
for item in api.IterItems():
    print "-", item['ItemAttributes.Title']</pre>
<p>Results &#8230; note that this is fetching many pages of results:</p>
<pre>- Fleet of Worlds
- Juggler of Worlds
- Escape from Hell
- Inferno
- N-Space
- The Ringworld Engineers (Ringworld)
- The Draco Tavern
- Legacy of Heorot: Legacy of Heorot
- Footfall
- A WORLD OUT OF TIME (ORBIT BOOKS)
- The Burning City (Hardback)
- Protector
- Burning Tower
- Three Books of Known Space
- Ringworld Throne
- Tales of Known Space: The Universe of Larry Niven
- Scatterbrain
- Ringworld
- Lucifer's Hammer
<em>... (continues) ...</em></pre>
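<p>As a hypothetical sketch of what &#8220;fetching many pages&#8221; involves (not the WORK framework&#8217;s actual code): request page 1, read the total page count from the response (the role <code>_page_max_path = 'Items.TotalPages'</code> appears to play above), and keep going until that page is reached. <code>fetch_page</code> is an assumed stand-in for the HTTP request and parsing; here it serves canned data.</p>

```python
def iter_all_items(fetch_page):
    """Yield every item across all result pages.

    fetch_page(page_index) is assumed to return (items, total_pages) for one page.
    """
    page_index = 1
    while True:
        items, total_pages = fetch_page(page_index)
        for item in items:
            yield item
        # Stop at the reported last page, or early if a page comes back empty.
        if not items or page_index >= total_pages:
            return
        page_index += 1

PAGES = {1: ["Fleet of Worlds", "Juggler of Worlds"], 2: ["Escape from Hell"]}

def fake_fetch(page_index):
    """Canned stand-in for an HTTP request; reports two pages in total."""
    return PAGES.get(page_index, []), len(PAGES)

print(list(iter_all_items(fake_fetch)))
# ['Fleet of Worlds', 'Juggler of Worlds', 'Escape from Hell']
```

<p>Because it is a generator, the caller&#8217;s loop looks the same whether the results fit on one page or twenty, and later pages are only requested if the loop actually gets that far.</p>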
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/11/12/work-teaser/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WORK &#8211; Web Object Records</title>
		<link>http://code.davidjanes.com/blog/2008/11/11/work-web-object-records/</link>
		<comments>http://code.davidjanes.com/blog/2008/11/11/work-web-object-records/#comments</comments>
		<pubDate>Tue, 11 Nov 2008 16:37:42 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[pybm]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=181</guid>
		<description><![CDATA[Introduction
As technologists, we&#8217;re all familiar with REST &#8211; Representational State Transfer:
Representational state transfer (REST) is a style of software architecture for distributed hypermedia systems such as the World Wide Web. As such, it is not strictly a method for building what are sometimes called &#8220;web services.&#8221; The terms “representational state transfer” and “REST” were introduced [...]]]></description>
			<content:encoded><![CDATA[<h4>Introduction</h4>
<p>As technologists, we&#8217;re all familiar with <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">REST &#8211; Representational State Transfer</a>:</p>
<p style="padding-left: 30px;"><strong>Representational state transfer</strong> (<strong>REST</strong>) is a style of <a title="Software architecture" href="http://en.wikipedia.org/wiki/Software_architecture">software architecture</a> for distributed <a title="Hypermedia" href="http://en.wikipedia.org/wiki/Hypermedia">hypermedia</a> systems such as the <a title="World Wide Web" href="http://en.wikipedia.org/wiki/World_Wide_Web">World Wide Web</a>. As such, it is not strictly a method for building what are sometimes called &#8220;<a class="mw-redirect" title="Web services" href="http://en.wikipedia.org/wiki/Web_services">web services</a>.&#8221; The terms “representational state transfer” and “REST” were introduced in <a title="2000" href="http://en.wikipedia.org/wiki/2000">2000</a> in the doctoral dissertation of <a title="Roy Fielding" href="http://en.wikipedia.org/wiki/Roy_Fielding">Roy Fielding</a>, one of the principal authors of the <a title="Hypertext Transfer Protocol" href="http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol">Hypertext Transfer Protocol</a> (HTTP) specification.</p>
<p>REST talks about how we address and use information on the World Wide Web. I&#8217;d like to introduce the concept of <strong>WORK (Web Object Records)</strong>, which defines how we <em>think about data</em> being transmitted across the web.</p>
<p>WORK is <em>not</em> a prescriptive standard &#8211; it doesn&#8217;t tell you what to do; it describes what you <em>are</em> doing. The hope is that by having a clearly delineated description of what we are doing, we can then write tools to cut through the babel of API standards currently being promulgated by a multitude of vendors; we can standardize the unstandardized.</p>
<h4>Definition</h4>
<p>A WORK item:</p>
<ul>
<li>is conceptually a <a href="http://www.json.org/">JSON</a>-like dictionary, consisting of string keys and object values</li>
<li>each value in the dictionary is a (usually shallow) JSON-like object, that is:
<ul>
<li>a dictionary, list or basic value type</li>
</ul>
</li>
<li>the basic value types are Unicode strings, floating point numbers, integers and booleans</li>
<li>the difference between strings and other basic value types is fuzzy (e.g. values from XML or HTML form data all arrive as strings)</li>
<li>null/None is rarely sent explicitly; instead, it is signaled by the absence of a value</li>
<li>the difference between a list of objects and a single object is fuzzy and fluid (e.g. repeated XML child elements)</li>
<li>the data model defined implicitly by &#8220;what you see&#8221; is as useful as a formal definition elsewhere</li>
<li>there are no cycles or explicit ways of cross-referencing within a WORK item</li>
<li>WORK items can be &#8211; and often are &#8211; nested within another WORK item, but only one level deep</li>
</ul>
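<p>The definition is loose by design, but its hard edges can be checked mechanically. A minimal sketch under one reading of the rules above: dictionaries with string keys, values built only from dicts, lists and basic types, no explicit nulls, and no cycles. The &#8220;usually shallow&#8221; and &#8220;one level deep&#8221; points read as guidance rather than rules, so the sketch doesn&#8217;t enforce them; the function names are illustrative.</p>

```python
BASIC_TYPES = (str, int, float, bool)

def is_work_value(value, _active=None):
    """True if value is a basic type, or a dict/list built only from such values."""
    if isinstance(value, BASIC_TYPES):
        return True
    if not isinstance(value, (dict, list)):
        return False  # includes None: absent values should simply be left out
    active = _active if _active is not None else set()
    if id(value) in active:
        return False  # this container contains itself: a cycle
    active.add(id(value))
    children = value.values() if isinstance(value, dict) else value
    ok = all(is_work_value(child, active) for child in children)
    if isinstance(value, dict):
        ok = ok and all(isinstance(key, str) for key in value)
    active.discard(id(value))  # shared (non-cyclic) references are still fine
    return ok

def is_work_item(obj):
    """A WORK item is a dictionary with string keys and WORK-like values."""
    return isinstance(obj, dict) and is_work_value(obj)

print(is_work_item({"title": "Fred's Pizza", "phones": ["555-1234"]}))  # True
```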
<h4>Benefits</h4>
<p>Because we technologists inherently use a WORK model of data, it explains:</p>
<ul>
<li>why we prefer XML over CSV &#8211; because we like to store more than a single atomic value in a &#8220;cell&#8221;</li>
<li>why we prefer JSON to XML &#8211; because we think about data as JSON-like WORK objects, not as nested text constructs</li>
<li>why we don&#8217;t adopt RDF (in any of its variants) for transmitting data, implementing APIs and so forth &#8211; because we don&#8217;t think in graphs</li>
<li>why we find it easier to work with web data in Python and Ruby than in Java &#8211; because those languages explicitly use the same model for <em>storing</em> data as we <em>think</em> about the data</li>
</ul>
<h4>Examples</h4>
<p>Here are a few examples of how one can view common API / feed results as WORK items.</p>
<h5>RSS feeds</h5>
<p><a href="http://cyber.law.harvard.edu/rss/rss.html">RSS</a> is defined by a two-level WORK hierarchy. The first level is:</p>
<pre>{
  "channel" : CHANNEL-WORK,
  "item" : [ ITEM-WORK, ITEM-WORK, ... ]
}</pre>
<p>An ITEM-WORK looks like:</p>
<pre>{
  "title" : STRING,
  "link" : STRING,
  "description" : STRING
}</pre>
<p>If you look at the XML for an RSS feed with only one ITEM, there&#8217;s no way to tell without reading the spec that ITEM repeats. This is what we mean by saying that the difference between a single object and a list is sometimes fuzzy.</p>
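<p>A generic XML-to-dictionary conversion shows where this fuzziness comes from: the converter can only produce a list when it actually sees a repeated element. A minimal sketch using the standard ElementTree, plus an <code>as_list</code> helper so consumers can treat &#8220;one&#8221; and &#8220;many&#8221; alike; both names are illustrative.</p>

```python
import xml.etree.ElementTree as ET

def element_to_work(element):
    """Turn an element into a string (no children) or a dict keyed by child tag.

    A tag seen once maps to a single value; only a tag seen again becomes a list.
    Attributes are dropped; namespaced tags keep ElementTree's {uri}name form.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    work = {}
    for child in children:
        value = element_to_work(child)  # always a str or dict, never a list
        if child.tag not in work:
            work[child.tag] = value
        elif isinstance(work[child.tag], list):
            work[child.tag].append(value)
        else:
            work[child.tag] = [work[child.tag], value]
    return work

def as_list(value):
    """Treat 'missing', 'one' and 'many' uniformly as a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]
```

<p>Code that always reads items through <code>as_list(work["channel"]["item"])</code> behaves the same whether a feed has one item or twenty.</p>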
<h5>White Pages API</h5>
<p>The <a href="http://www.whitepages.com/landing/api">White Pages API</a> is also a two-level WORK hierarchy (this pattern is very common). Here&#8217;s the first level, slightly more complicated than RSS due to the XML serialization:</p>
<pre>{
 "meta" : META-WORK,
 "listings" : {
   "listing" : [ LISTING-WORK, LISTING-WORK, ... ]
 }
}</pre>
<p>A LISTING-WORK looks like:</p>
<pre>{
  "geodata" : OBJECT,
  "phonenumbers" : OBJECT,
  "business" : { "businessname" : "Fred's Pizza" },
  "address" : OBJECT
}</pre>
<p>The OBJECTs above in the White Pages API are somewhat complicated, but tractable (as we shall see in another post).</p>
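<p>Because this shape is so common, much of consuming such an API comes down to following a dotted path to the repeating items while tolerating the one-or-many fuzziness along the way. A minimal sketch over already-parsed dictionaries; <code>items_at</code> is an illustrative name, and the sample response is invented apart from the business name shown above.</p>

```python
def items_at(work, path):
    """Follow a dotted path such as "listings.listing" and return a list of items.

    At each step a single dict and a list of dicts are treated alike, and a
    missing key simply yields no items (absence rather than null).
    """
    current = [work]
    for key in path.split("."):
        next_level = []
        for node in current:
            value = node.get(key) if isinstance(node, dict) else None
            if value is None:
                continue
            next_level.extend(value if isinstance(value, list) else [value])
        current = next_level
    return current

response = {
    "meta": {},  # META-WORK contents omitted
    "listings": {"listing": {"business": {"businessname": "Fred's Pizza"}}},
}
for listing in items_at(response, "listings.listing"):
    print(listing["business"]["businessname"])  # Fred's Pizza
```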
<h5>Amazon AWS API</h5>
<p>The <a href="http://aws.amazon.com/associates/">Amazon Associates Web Service</a> allows one to retrieve information about Amazon products via XML responses. The response is a little convoluted but still recognizable:</p>
<pre>{
 "Items" : {
   "RequestHeader" : REQUEST-HEADER-WORK,
   "Item" : [ ITEM-WORK, ITEM-WORK, ... ]
 },
 "OperationRequest" : { ... }
}</pre>
<p>The individual ITEM-WORK describe products:</p>
<pre>{
 "ASIN" : STRING,
 "ImageSets": {
   "ImageSet": {
     "LargeImage": {
       "URL": "http://ecx.images-amazon.com/images/I/31e55zf53VL.jpg",
       "Width": "300",
       "Height": "300"
     }
   }
 },
 "ItemAttributes": {
   "Title": "Under a Blood Red Sky - Deluxe Edition CD/DVD",
   "Manufacturer": "Island",
   "ProductGroup": "Music",
   "Artist": "U2"
 }
}</pre>
<h5>Google search result</h5>
<p>We can also look at HTML pages as if they&#8217;re returning data as WORK items. This could be explicit if conventions such as microformats or RDFa were used, or once again it could be just a convenient way of modeling the data. Here&#8217;s a hypothetical WORK item for a <a href="http://www.google.ca/search?hl=en&amp;q=bombardier&amp;btnG=Google+Search&amp;meta=">single result returned from a Google search</a>:</p>
<pre>{
 "title" : "Bombardier Inc. - Bombardier - Home",
 "url" : "http://www.bombardier.com/",
 "description" : "Manufacturers of a large range of regional...",
 "links" : [
  {
   "title" : "Careers",
   "url" : "..."
  },
  {
   "title" : "Business Aircraft",
   "url" : "..."
  },
  ...
 ]
}</pre>
<h4>Conclusion</h4>
<p>WORK gives us a powerful way of looking at &#8211; and simplifying &#8211; data that&#8217;s retrieved over the Internet via REST calls. If we can view API results as being made up of standardized components &#8211; WORK items &#8211; then the effort needed to work with <em>new</em> APIs can be minimized.</p>
<p>Designing and writing some of these tools is my next task.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/11/11/work-web-object-records/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Amazon&#8217;s OpenSearch: mostly useless</title>
		<link>http://code.davidjanes.com/blog/2008/10/28/amazons-opensearch-mostly-useless/</link>
		<comments>http://code.davidjanes.com/blog/2008/10/28/amazons-opensearch-mostly-useless/#comments</comments>
		<pubDate>Tue, 28 Oct 2008 12:28:42 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[search]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=55</guid>
		<description><![CDATA[As part of a broader project I&#8217;m working on, I decided to see if there&#8217;s a way I could easily get search results from the web in machine readable fashion. One project to facilitate this is Amazon/A9&#8217;s OpenSearch. Alas, it&#8217;s useless:

No big web search provider has signed on to provide machine readable results. Including A9/Alexa! [...]]]></description>
			<content:encoded><![CDATA[<p>As part of a broader project I&#8217;m working on, I decided to see if there&#8217;s a way I could easily get search results from the web in machine readable fashion. One project to facilitate this is <a href="http://en.wikipedia.org/wiki/OpenSearch">Amazon/A9&#8217;s OpenSearch</a>. Alas, it&#8217;s useless:</p>
<ul>
<li>No big web search provider has signed on to provide machine readable results.<em> Including A9/Alexa!</em> <a href="http://opensearch.a9.com/-/opensearch/searches.jsp">A9 will aggregate search results from different OpenSearch providers for you</a>, it just won&#8217;t let you use Alexa&#8217;s results elsewhere (search for Alexa on that page)</li>
<li>even if you were to buy into the search aggregation approach, many (most?) of the sources are dead now. A little pruning wouldn&#8217;t hurt here, guys! (search for IMDB on that page)</li>
</ul>
<p>I wouldn&#8217;t be tempted to offer my own search results in OpenSearch format, because who&#8217;s going to use them after I put in the work? And if all that&#8217;s available as search sources are mostly broken C- and D-list sites, well, who cares? It&#8217;s a fringe benefit, but not one that I&#8217;m looking for, and nor, likely, are you. You&#8217;d think that Amazon would offer Alexa search results via OpenSearch to &#8220;prime the pump&#8221;, but I guess being the Nth-placed web search service is good enough for them.</p>
<p>Note that there&#8217;s a great argument for simply marking up search results with <a href="http://microformats.org/wiki/hatom">hAtom</a> and using <code>rel=next</code> to navigate to the next page of results, but that&#8217;s a topic for another day.</p>
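<p>The navigation half of that idea is small: a client only needs to find the link marked <code>rel="next"</code> on each page of results. A minimal sketch using Python&#8217;s standard <code>html.parser</code>; <code>NextLinkFinder</code> and <code>find_next_page</code> are illustrative names, and relative links are resolved against the page&#8217;s own URL.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class NextLinkFinder(HTMLParser):
    """Record the href of the first a or link element whose rel includes "next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if self.next_href is not None or tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        # rel may hold several space-separated values, e.g. "next prefetch".
        rel_values = (attrs.get("rel") or "").lower().split()
        if "next" in rel_values and attrs.get("href"):
            self.next_href = attrs["href"]

def find_next_page(page_url, html_text):
    """Return the absolute URL of the next results page, or None if there isn't one."""
    finder = NextLinkFinder()
    finder.feed(html_text)
    return urljoin(page_url, finder.next_href) if finder.next_href else None
```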
<p>If I have any of my facts wrong here, I apologize in advance: the documentation kind of sucks. I&#8217;m also sure there&#8217;s some difference between A9, Alexa and Amazon &#8211; I really just don&#8217;t have the time to work it out.</p>
<h4>Further reading</h4>
<ul>
<li><a href="http://a9.com/-/home.jsp">Alexa&#8217;s OpenSearch front page</a> &#8212; this is a &#8220;search aggregator&#8221;</li>
<li><a href="http://opensearch.a9.com/-/opensearch/searches.jsp">Add more OpenSearch providers to your search</a></li>
<li><a href="http://www.opensearch.org/Home">OpenSearch.org</a></li>
<li>Next up for me: <a href="http://developer.yahoo.com/search/web/">Yahoo&#8217;s Search APIs</a>, <a href="http://developer.yahoo.com/searchmonkey/">Yahoo&#8217;s SearchMonkey</a>, and <a href="http://msdn.microsoft.com/en-us/library/bb251794.aspx">Microsoft&#8217;s Live Search API</a>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/10/28/amazons-opensearch-mostly-useless/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More style updates</title>
		<link>http://code.davidjanes.com/blog/2008/10/28/more-style-updates/</link>
		<comments>http://code.davidjanes.com/blog/2008/10/28/more-style-updates/#comments</comments>
		<pubDate>Tue, 28 Oct 2008 10:36:01 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[administrivia]]></category>
		<category><![CDATA[html / javascript]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=49</guid>
		<description><![CDATA[I&#8217;ve added hAtom to this weblog&#8217;s template: you can see a parsed version here. I&#8217;ve also updated the comments to be prettier.
Next, to figure out what this gravatar stuff is and to expand the blogroll.
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve added <a href="http://microformats.org/wiki/hatom">hAtom</a> to this weblog&#8217;s template: you can see a parsed version <a href="http://code.davidjanes.com/aumfp/demo/?uri=http%3A%2F%2Fcode.davidjanes.com%2Fblog%2F&amp;microformat=hatom&amp;format=html">here</a>. I&#8217;ve also updated the comments to be prettier.</p>
<p>Next, to figure out what this <a href="http://en.gravatar.com/">gravatar</a> stuff is and to expand the blogroll.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/10/28/more-style-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AUMFP &#8211; Demo</title>
		<link>http://code.davidjanes.com/blog/2008/10/25/19/</link>
		<comments>http://code.davidjanes.com/blog/2008/10/25/19/#comments</comments>
		<pubDate>Sat, 25 Oct 2008 17:13:29 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[aumfp]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=19</guid>
		<description><![CDATA[I now have the AUMFP up as a demo page. Here&#8217;s a few examples:

hAtom
hCard (with &#8220;address scrubbing&#8221;)
hCalendar

]]></description>
			<content:encoded><![CDATA[<p>I now have the <a href="http://code.davidjanes.com/blog/?p=14">AUMFP</a> up as a <strong><a href="http://code.davidjanes.com/aumfp/demo/">demo page</a></strong>. Here&#8217;s a few examples:</p>
<ul>
<li><a href="http://code.davidjanes.com/aumfp/demo/?uri=http%3A%2F%2Ftantek.com&amp;microformat=hatom&amp;format=html">hAtom</a></li>
<li><a href="http://code.davidjanes.com/aumfp/demo/?uri=http%3A%2F%2Fwwf.org.au%2Fabout%2Fcontactdetails%2F&amp;microformat=hcard&amp;format=html">hCard</a> (with &#8220;address scrubbing&#8221;)</li>
<li><a href="http://code.davidjanes.com/aumfp/demo/?uri=http%3A%2F%2Fupcoming.yahoo.com%2Fevent%2F1037077%2F&amp;microformat=hcalendar&amp;format=html">hCalendar</a></li>
</ul>
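<p>Each demo link above is just three query parameters: <code>uri</code> (the page to parse), <code>microformat</code> and <code>format</code>. A small sketch for building such links with Python&#8217;s standard <code>urlencode</code>; <code>demo_link</code> is a made-up helper, and only parameter values seen in the links above are used.</p>

```python
from urllib.parse import urlencode

DEMO = "http://code.davidjanes.com/aumfp/demo/"

def demo_link(page_uri, microformat, output_format="html"):
    """Build an AUMFP demo URL; urlencode percent-escapes the target URI."""
    return DEMO + "?" + urlencode(
        {"uri": page_uri, "microformat": microformat, "format": output_format})

print(demo_link("http://tantek.com", "hatom"))
```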
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/10/25/19/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AUMFP &#8211; The Almost Universal Microformats Parser</title>
		<link>http://code.davidjanes.com/blog/2008/10/24/aumfp-the-almost-universal-microformats-parser/</link>
		<comments>http://code.davidjanes.com/blog/2008/10/24/aumfp-the-almost-universal-microformats-parser/#comments</comments>
		<pubDate>Fri, 24 Oct 2008 12:49:23 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[aumfp]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=14</guid>
		<description><![CDATA[I&#8217;ve completely refreshed the Almost Universal Microformats Parser up on Google Code. Changes from the (very old) previous version include:

Tarballs available
Much better handling of Internationalized Characters
Many improvements to parsing
Simplified iterator interface (see below)
Spun off the support library files into their own library called PyBM. If you&#8217;re using tarballs this won&#8217;t be an issue

Microformat support includes:

hCard
hCalendar
hAtom
hListing
hResume
rel-tag
xfolk

There&#8217;s also an additional [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve completely refreshed the the Almost Universal Microformats Parser up on <a href="http://code.google.com/p/aump/">Google Code</a>. Changes from the (very old) version include:</p>
<ul>
<li>Tarballs available</li>
<li>Much better handling of Internationalized Characters</li>
<li>Many improvements to parsing</li>
<li>Simplified iterator interface (see below)</li>
<li>Spun off the support library files into their own library called <a href="http://code.google.com/p/pybm/">PyBM</a>. If you&#8217;re using tarballs this won&#8217;t be an issue</li>
</ul>
<p><a href="http://microformats.org/">Microformat</a> support includes:</p>
<ul>
<li>hCard</li>
<li>hCalendar</li>
<li>hAtom</li>
<li>hListing</li>
<li>hResume</li>
<li>rel-tag</li>
<li>xfolk</li>
</ul>
<p>There&#8217;s also an additional &#8216;hdocument&#8217; parser that treats an arbitrary webpage like the other parsers, returning information such as feeds, links, images and so forth.</p>
<h4>Use</h4>
<p>Using the parser is simple:</p>
<pre>import hcard
import pprint

parser = hcard.MicroformatHCard(page_uri = 'http://tantek.com')
for d in parser.Iterate():
  pprint.pprint(d)</pre>
<p>The &#8216;d&#8217; returned is an extended python &#8216;dict&#8217;. Because we capture information about classes within paths, there&#8217;s no guarantee about how a key is going to be named. For example, a phone number could be keyed &#8216;tel&#8217; or &#8216;tel.home&#8217; (or a number of other things). Our dictionary &#8216;mfdict&#8217; provides a number of functions called &#8216;find&#8217; to pull out values. For example, this will pull out the <em>least</em> dot-specified telephone number:</p>
<pre>tel = d.find('tel')</pre>
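<p>AUMFP&#8217;s own <code>find</code> isn&#8217;t reproduced here. As a hypothetical sketch of the lookup just described, assuming &#8220;least dot-specified&#8221; means the exact key if present, otherwise the matching key with the fewest dots, on a plain <code>dict</code> with made-up phone numbers:</p>

```python
def find_least_specified(d, name):
    """Return d[name], else the value of the least dot-specified "name.*" key, else None."""
    if name in d:
        return d[name]
    candidates = [key for key in d if key.startswith(name + ".")]
    if not candidates:
        return None
    # Fewest dots wins; ties are broken alphabetically so the result is stable.
    best = min(candidates, key=lambda key: (key.count("."), key))
    return d[best]

card = {"fn": "Tantek \u00c7elik", "tel.home": "555-0100", "tel.home.voice": "555-0101"}
print(find_least_specified(card, "tel"))  # 555-0100
```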
<p>We also add special keys beginning with an &#8216;@&#8217; for well known, additionally interesting or commonly used fields, to save you the trouble of figuring this information out yourself. Here&#8217;s an example parsed hCard (from the example above):</p>
<pre>{'@html': u'&lt;address id="hcard" class="vcard author"&gt;<em>…</em>&lt;/address&gt;',
 '@index': 'vcard-36',
 '@loose-uris': [u'http://tantek.com/'],
 '@parents': u'author copyright xoxo',
 '@title': u'Tantek \xc7elik',
 '@uf': 'hCard',
 '@uri': u'http://tantek.com#hcard',
 u'_url': '',
 u'adr.country-name': '',
 u'adr.locality': u'San Francisco',
 u'adr.region': u'CA',
 u'fn': u'Tantek \xc7elik',
 u'logo': u'icon-2007-128px.png',
 'n.family-name': u'\xc7elik',
 'n.given-name': u'Tantek',
 u'photo': u'http://tantek.com/icon-2007-128px.png',
 u'uid': u'Tantek \xc7elik',
 u'url': u'http://feeds.technorati.com/contact/tantek.com/%23hcard'}</pre>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/10/24/aumfp-the-almost-universal-microformats-parser/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Zap2it: where&#8217;s my read/write web?</title>
		<link>http://code.davidjanes.com/blog/2008/09/27/zap2it-wheres-my-readwrite-web/</link>
		<comments>http://code.davidjanes.com/blog/2008/09/27/zap2it-wheres-my-readwrite-web/#comments</comments>
		<pubDate>Sat, 27 Sep 2008 13:00:06 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=7</guid>
		<description><![CDATA[Zap2it is a TV listings service. I&#8217;m not a big TV guy, but it&#8217;s useful for me to look up a few things once in a while, such as F1 racing, sailing programs, NFL football and other mindless pursuits.
Here&#8217;s a listing for a BBC show tomorrow on Paul Cayard. I have little confidence that this link [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.zap2it.com/">Zap2it</a> is a TV listings service. I&#8217;m not a big TV guy but&#8217;s useful for me to look up a few things once and a while such as F1 racing, sailing programs, NFL football and other mindless pursuits.</p>
<p><a href="http://tvlistings.zap2it.com/tvlistings/ZCProgram.do?method=getDetail&amp;pgmId=EP007284760031&amp;sch=1222597800000&amp;stn=16493&amp;chn=37">Here&#8217;s a listing</a> for a BBC show tomorrow on <a href="http://en.wikipedia.org/wiki/Paul_Cayard">Paul Cayard</a>. I have little confidence that this link will still work in a week&#8217;s time, let alone a year, but let&#8217;s leave that alone for now. Zap2it provides a way of saving it to &#8220;My Favorites&#8221; but I&#8217;m not really interested in signing up just yet.</p>
<ul>
<li>why can&#8217;t I access this as an <a href="http://en.wikipedia.org/wiki/ICalendar">iCalendar</a> (or a <a href="http://microformats.org/wiki/hcalendar">hCalendar</a>!) so I can add this to my <a href="http://www.google.com/calendar">Google Calendar</a> or my Apple <a href="http://en.wikipedia.org/wiki/ICal">iCal</a> program? I can see why Zap2it would like to retain customers as accounts for monetization purposes, but I&#8217;m more likely to remain a loyal Zap2it user if it provides functionality I need. Otherwise, some day someone else will do it for me</li>
<li>Once I&#8217;ve made an account, why can&#8217;t I access all my favorites as an iCal object or even export them directly into Google Calendar? Then I could share that calendar with my friends (i.e. the gang I get together with on the weekend to watch football) and bring more people to Zap2it!</li>
</ul>
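<p>Offering an iCalendar version of a listing is not much work on the publisher&#8217;s side. A minimal sketch of a single-event file per RFC 5545 (with the required VERSION, PRODID, UID and DTSTAMP properties); the identifier, product id, title and times are placeholders, not Zap2it data.</p>

```python
from datetime import datetime, timedelta, timezone

def escape_text(value):
    """Escape backslashes, semicolons, commas and newlines as iCalendar TEXT requires."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
            .replace(",", "\\,").replace("\n", "\\n"))

def listing_to_ics(uid, title, start, duration):
    """Return a one-event iCalendar document (RFC 5545) with CRLF line endings.

    Lines longer than 75 octets would also need folding; these stay short.
    """
    fmt = "%Y%m%dT%H%M%SZ"  # UTC date-time form
    utc = timezone.utc
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//TV Listing Sketch//EN",  # placeholder product id
        "BEGIN:VEVENT",
        "UID:" + uid,
        "DTSTAMP:" + datetime.now(utc).strftime(fmt),
        "DTSTART:" + start.astimezone(utc).strftime(fmt),
        "DTEND:" + (start + duration).astimezone(utc).strftime(fmt),
        "SUMMARY:" + escape_text(title),
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"

event = listing_to_ics(
    uid="listing-1@example.com",             # placeholder identifier
    title="Sailing documentary, episode 1",  # placeholder title
    start=datetime(2008, 9, 28, 12, 0, tzinfo=timezone.utc),  # placeholder time
    duration=timedelta(minutes=60),
)
print(event)
```

<p>Served with a <code>text/calendar</code> content type, a file like this can be imported into Google Calendar or iCal, which is all the first complaint above asks for.</p>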
<p>Another rant on the account creation process:</p>
<ul>
<li>Zap2it doesn&#8217;t allow you to register <em>valid</em> email addresses in the format <em>myname+zap2it@example.com</em>! Come on guys, get it together. This is especially important because:</li>
<li>Zap2it makes you check a box to opt out of partner spam; if you&#8217;re not a form reader, you could find yourself getting useless information that you had no desire to receive in the first place. Sigh.</li>
<li>After I create my account, I&#8217;m not logged in. I then have to re-find the show I was trying to &#8220;Save to Favorites&#8221;; try to bookmark that; be forced to login; be brought to my account page; re-find the show again; and then bookmark. High comedy!</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/09/27/zap2it-wheres-my-readwrite-web/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
