<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Janes&#039; Code Weblog &#187; aumfp</title>
	<atom:link href="http://code.davidjanes.com/blog/category/aumfp/feed/" rel="self" type="application/rss+xml" />
	<link>http://code.davidjanes.com/blog</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sun, 11 Apr 2010 12:32:10 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>AUAPI: encoding hCards in JSON</title>
		<link>http://code.davidjanes.com/blog/2009/03/02/auapi-encoding-hcards-in-json/</link>
		<comments>http://code.davidjanes.com/blog/2009/03/02/auapi-encoding-hcards-in-json/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 14:15:38 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[auapi]]></category>
		<category><![CDATA[aumfp]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=509</guid>
		<description><![CDATA[The best model for describing people is the vCard standard, RFC 2425 and RFC 2426. The microformats community has adapted the vCard standard for serialization into HTML using hCard. In the Almost Universal API (AUAPI), people and organizations should almost always be described using a JSON-encoded hCard.
It is difficult to describe, without going into great [...]]]></description>
			<content:encoded><![CDATA[<p>The best model for describing people is the <a href="http://en.wikipedia.org/wiki/VCard">vCard</a> standard, <a href="http://tools.ietf.org/html/rfc2425">RFC 2425</a> and <a href="http://tools.ietf.org/html/rfc2426">RFC 2426</a>. The microformats community has adapted the vCard standard for serialization into HTML using <a href="http://microformats.org/wiki/hcard">hCard</a>. In the <a href="http://code.davidjanes.com/blog/2009/02/27/introducing-the-almost-universal-api/">Almost Universal API</a> (AUAPI), people and organizations should almost always be described using a JSON-encoded hCard.</p>
<p>It is hard to describe, without going into great minutiae, the difficulties of transforming the hCard and vCard standards into a pleasant-looking and, more importantly, easy-to-use hierarchy: there are a number of edge cases to deal with. There&#8217;s certainly an argument for just encoding hCard/vCards as a straight vCard serialization &#8211; at least in terms of simplicity of encoding. The issue is that the end consumer (who I believe should be the strongest focus) then has to do the dirty work of grouping everything together themselves.</p>
<h4>Algorithm</h4>
<p>This algorithm is destructive to the data structure it works upon, so generally you&#8217;ll want to make a copy first.</p>
<ul>
<li>note that though we refer to hCard attributes in upper case, mixed case, camel case and so forth, all attributes are actually encoded in lower case with &#8220;-&#8221; separators</li>
<li>let the &#8220;groupers&#8221; be ADR, GEO, N, ORG, TEL. Groupers group together attributes that are related (such as FirstName and LastName)</li>
<li>let the &#8220;narrowers&#8221; be Home, Work, Parcel, Postal (and <em>no-narrower</em>). Narrowers assign a specific meaning to a value, i.e. this is a <em>Work</em> phone number.</li>
<li>assume each value is described by a number of attributes, i.e. &#8220;416-515-5555&#8221; can be described by ( TEL, Work, Mobile )</li>
</ul>
<p>Then:</p>
<ul>
<li>for each Narrower, then for each Grouper
<ul>
<li>create a dictionary &#8216;subd&#8217;</li>
<li>for each value that is described by the ( Narrower, Grouper )
<ul>
<li>for each remaining attribute (besides Narrower and Grouper), add to subd</li>
<li>if the value was fully described by ( Narrower, Grouper ), add to subd under the key &#8216;@&#8217;</li>
</ul>
</li>
<li>for key, value in subd
<ul>
<li>add to the final result</li>
<li>if narrower is not &#8216;no-narrower&#8217;, add &#8216;@narrower = narrower&#8217;</li>
</ul>
</li>
</ul>
<ul>
<li>add subd to the result under the key Grouper</li>
</ul>
</li>
<li>add all remaining values from the original hCard to the result, noting that
<ul>
<li>if the value is described by a Narrower, we encode it as a dictionary with &#8216;@narrower = narrower&#8217;</li>
</ul>
</li>
</ul>
<p>Clear? Well, the examples below will help. With the &#8220;416-515-5555&#8221; above we would get:</p>
<pre>{
 "hcard:hcard" : {
  'tel' : {
   '@work' : 'work',
   'mobile' : '416-515-5555',
  }
 }
}</pre>
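<p>The steps above can be sketched in Python roughly as follows. This is not the <code>decompose</code> function from <code>vcard.py</code>, just a minimal illustration under stated assumptions: the input shape (a list of <code>(attributes, value)</code> pairs), the name <code>decompose_sketch</code>, the use of <code>None</code> for <em>no-narrower</em>, and storing a narrowed leftover value under the key <code>@</code> are all choices made for this example.</p>

```python
GROUPERS = ("adr", "geo", "n", "org", "tel")
# None stands in for the no-narrower case and is tried last.
NARROWERS = ("home", "work", "parcel", "postal", None)
NAMED_NARROWERS = frozenset(n for n in NARROWERS if n)


def decompose_sketch(items):
    """Group (attributes, value) pairs into a nested dictionary.

    items is a hypothetical input shape: an iterable of (attributes, value)
    pairs, where attributes holds lower-case names such as "tel" or "work".
    """
    # Work on a private copy, since values are consumed as they are grouped.
    remaining = [(frozenset(attrs), value) for attrs, value in items]
    result = {}
    for narrower in NARROWERS:
        for grouper in GROUPERS:
            wanted = {grouper, narrower} if narrower else {grouper}
            subd, unused = {}, []
            for attrs, value in remaining:
                if not wanted.issubset(attrs):
                    unused.append((attrs, value))
                    continue
                rest = attrs - wanted
                for attr in rest:
                    subd[attr] = value
                if not rest:
                    # Fully described by ( Narrower, Grouper ).
                    subd["@"] = value
            remaining = unused
            if subd:
                if narrower:
                    subd["@" + narrower] = narrower
                # Simplification: a later group for the same grouper
                # replaces an earlier one.
                result[grouper] = subd
    # Whatever is left was not claimed by any grouper.
    for attrs, value in remaining:
        narrowers = sorted(attrs.intersection(NAMED_NARROWERS))
        for name in sorted(attrs - NAMED_NARROWERS):
            if narrowers:
                # Assumption: the value itself goes under the "@" key.
                entry = {"@": value}
                entry.update(("@" + n, n) for n in narrowers)
                result[name] = entry
            else:
                result[name] = value
    return result


items = [({"tel", "work", "mobile"}, "416-515-5555")]
print({"hcard:hcard": decompose_sketch(items)})
```

<p>Run on the ( TEL, Work, Mobile ) value, the sketch yields the same <code>tel</code> group as the example above (key order aside). It deliberately ignores the harder edge cases, such as a card carrying both a home and a work phone number.</p>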
<h4>Code</h4>
<p>The source code for this algorithm is in the <a href="http://code.google.com/p/aump/">AUMFP</a> tree, in file <code>vcard.py</code>, function <code>decompose</code> (<a href="http://code.google.com/p/aump/source/browse/trunk/vcard.py">see around line 1083</a>).</p>
<h4>Namespace</h4>
<p>All JSON encoded hCards are in the namespace <code>hcard:</code>. In the AUAPI serialization, this namespace should only appear on the enclosing element; all children are assumed to be in the same namespace. I am currently using the URI <code>http://purl.org/uF/hCard/1.0/</code> for this namespace (<a href="http://code.davidjanes.com/blog/2009/03/01/auapi-json-to-xml-serialization/">when XML serializing</a>); this may change in the future.</p>
<h4>Example 1 &#8211; home phone number from whitepages.com</h4>
<pre>{
 'hcard:hcard': {'adr': {'country-name': u'United States',
                         'locality': u'Huntsville',
                         'postal-code': '35801-2908',
                         'region': 'Alabama',
                         'street-address': u'1114 Humes Avenue NE'},
                 'fn': u'Jack Smith',
                 'geo': {'latitude': 34.743763000000001,
                         'longitude': -86.572568000000004},
                 'n': {'family-name': u'Smith', 'given-name': u'Jack'},
                 'tel': {'voice': u'256-539-8788'}},
}</pre>
<h4>Example 2 &#8211; work phone number from whitepages.com</h4>
<pre>{ 'hcard:hcard': {'adr': {'country-name': u'United States',
                         'locality': u'Gurley',
                         'postal-code': '35748-8715',
                         'region': 'Alabama',
                         'street-address': u'148 Little Cove Road'},
                 'fn': u'Jack Smith',
                 'geo': {'latitude': 34.698258000000003,
                         'longitude': -86.383027999999996},
                 'n': {'family-name': u'Smith', 'given-name': u'Jack'},
                 'org': {'organization-name': u'Alldyne Powder Technoliges'},
                 'tel': {'@work': 'work', 'voice': u'256-776-1238'}},
}</pre>
<h4>Example 3 &#8211; hCard directly to JSON</h4>
<pre>{ 'hcard:hcard': {
                 'adr': {u'country-name': u'United States of America',
                         u'locality': u'San Francisco',
                         u'region': u'CA'},
                 u'fn': u'Tantek \xc7elik',
                 u'logo': u'icon-2007-128px.png',
                 'n': {'family-name': u'\xc7elik',
                       'given-name': u'Tantek'},
                 u'photo': u'http://tantek.com/icon-2007-128px.png',
                 u'url': u'http://feeds.technorati.com/contact/tantek.com/#hcard'},
}</pre>
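<p>On the consuming side, reading one of these groups might look like the sketch below. The helper name <code>describe_group</code> is hypothetical and not part of AUAPI or AUMFP; it only assumes the convention shown in the examples, where narrowers appear as <code>@</code>-prefixed keys and a bare <code>@</code> key holds a fully described value.</p>

```python
def describe_group(group):
    """Split an encoded group into (narrowers, fields)."""
    # Keys such as "@work" name a narrower; the bare "@" key is a value slot.
    narrowers = sorted(k[1:] for k in group if k.startswith("@") and k != "@")
    # Everything without an "@" prefix is an ordinary attribute.
    fields = {k: v for k, v in group.items() if not k.startswith("@")}
    return narrowers, fields


card = {
    "hcard:hcard": {
        "fn": "Jack Smith",
        "tel": {"@work": "work", "voice": "256-776-1238"},
    }
}
narrowers, fields = describe_group(card["hcard:hcard"]["tel"])
print(narrowers, fields)
```

<p>The consumer gets the narrower and the remaining attributes without having to regroup anything, which is the point of doing the grouping at encoding time.</p>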
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2009/03/02/auapi-encoding-hcards-in-json/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>AUMFP &#8211; Demo</title>
		<link>http://code.davidjanes.com/blog/2008/10/25/19/</link>
		<comments>http://code.davidjanes.com/blog/2008/10/25/19/#comments</comments>
		<pubDate>Sat, 25 Oct 2008 17:13:29 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[aumfp]]></category>
		<category><![CDATA[demo]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=19</guid>
		<description><![CDATA[I now have the AUMFP up as a demo page. Here are a few examples:

hAtom
hCard (with &#8220;address scrubbing&#8221;)
hCalendar

]]></description>
			<content:encoded><![CDATA[<p>I now have the <a href="http://code.davidjanes.com/blog/?p=14">AUMFP</a> up as a <strong><a href="http://code.davidjanes.com/aumfp/demo/">demo page</a></strong>. Here are a few examples:</p>
<ul>
<li><a href="http://code.davidjanes.com/aumfp/demo/?uri=http%3A%2F%2Ftantek.com&amp;microformat=hatom&amp;format=html">hAtom</a></li>
<li><a href="http://code.davidjanes.com/aumfp/demo/?uri=http%3A%2F%2Fwwf.org.au%2Fabout%2Fcontactdetails%2F&amp;microformat=hcard&amp;format=html">hCard</a> (with &#8220;address scrubbing&#8221;)</li>
<li><a href="http://code.davidjanes.com/aumfp/demo/?uri=http%3A%2F%2Fupcoming.yahoo.com%2Fevent%2F1037077%2F&amp;microformat=hcalendar&amp;format=html">hCalendar</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/10/25/19/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AUMFP &#8211; The Almost Universal Microformats Parser</title>
		<link>http://code.davidjanes.com/blog/2008/10/24/aumfp-the-almost-universal-microformats-parser/</link>
		<comments>http://code.davidjanes.com/blog/2008/10/24/aumfp-the-almost-universal-microformats-parser/#comments</comments>
		<pubDate>Fri, 24 Oct 2008 12:49:23 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[aumfp]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=14</guid>
		<description><![CDATA[I&#8217;ve completely refreshed the Almost Universal Microformats Parser up on Google Code. Changes from the (very old) version include:

Tarballs available
Much better handling of Internationalized Characters
Many improvements to parsing
Simplified iterator interface (see below)
Spun-off support library files into their own library called PyBM. If you&#8217;re using tarballs this won&#8217;t be an issue

Microformat support includes:

hCard
hCalendar
hAtom
hListing
hResume
rel-tag
xfolk

There&#8217;s also an additional [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve completely refreshed the Almost Universal Microformats Parser up on <a href="http://code.google.com/p/aump/">Google Code</a>. Changes from the (very old) version include:</p>
<ul>
<li>Tarballs available</li>
<li>Much better handling of Internationalized Characters</li>
<li>Many improvements to parsing</li>
<li>Simplified iterator interface (see below)</li>
<li>Spun-off support library files into their own library called <a href="http://code.google.com/p/pybm/">PyBM</a>. If you&#8217;re using tarballs this won&#8217;t be an issue</li>
</ul>
<p><a href="http://microformats.org/">Microformat</a> support includes:</p>
<ul>
<li>hCard</li>
<li>hCalendar</li>
<li>hAtom</li>
<li>hListing</li>
<li>hResume</li>
<li>rel-tag</li>
<li>xfolk</li>
</ul>
<p>There&#8217;s also an additional &#8216;hdocument&#8217; parser that treats an arbitrary webpage like the other parsers, returning information such as feeds, links, images and so forth.</p>
<h4>Use</h4>
<p>Using the parser is simple:</p>
<pre>import hcard
import pprint

parser = hcard.MicroformatHCard(page_uri = 'http://tantek.com')
for d in parser.Iterate():
  pprint.pprint(d)</pre>
<p>The &#8216;d&#8217; returned is an extended Python &#8216;dict&#8217;. Because we capture information about classes within paths, there&#8217;s no guarantee about how a key is going to be named. For example, a phone number could be keyed &#8216;tel&#8217; or &#8216;tel.home&#8217; (or a number of other things). Our dictionary class &#8216;mfdict&#8217; provides a number of &#8216;find&#8217; functions to pull out values. For example, this will pull out the <em>least</em> dot-specified telephone number:</p>
<pre>tel = d.find('tel')</pre>
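<p>To make &#8220;least dot-specified&#8221; concrete, here is a rough sketch of one way such a lookup could behave on a plain <code>dict</code>. It is not the <code>mfdict</code> implementation; the function name <code>find_least_specific</code> and the alphabetical tie-break between equally specific keys are assumptions made for illustration.</p>

```python
def find_least_specific(d, name):
    """Return the value whose key is name, or name plus the fewest dotted parts."""
    # Candidate keys are the bare name and anything of the form "name.something".
    candidates = [k for k in d if k == name or k.startswith(name + ".")]
    if not candidates:
        return None
    # Fewest dots wins; ties are broken alphabetically (an assumption).
    best = min(candidates, key=lambda k: (k.count("."), k))
    return d[best]


d = {"fn": "Jack Smith", "tel.home": "555-1111", "tel.work.fax": "555-2222"}
print(find_least_specific(d, "tel"))
```

<p>Here <code>tel.home</code> is chosen because it has fewer dotted parts than <code>tel.work.fax</code>; a bare <code>tel</code> key, if present, would be chosen ahead of both.</p>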
<p>We also add special keys beginning with an &#8216;@&#8217; for well known, additionally interesting or commonly used fields, to save you the trouble of figuring this information out yourself. Here&#8217;s an example parsed hCard (from the example above):</p>
<pre>{'@html': u'&lt;address id="hcard" class="vcard author"&gt;<em>…</em>&lt;/address&gt;',
 '@index': 'vcard-36',
 '@loose-uris': [u'http://tantek.com/'],
 '@parents': u'author copyright xoxo',
 '@title': u'Tantek \xc7elik',
 '@uf': 'hCard',
 '@uri': u'http://tantek.com#hcard',
 u'_url': '',
 u'adr.country-name': '',
 u'adr.locality': u'San Francisco',
 u'adr.region': u'CA',
 u'fn': u'Tantek \xc7elik',
 u'logo': u'icon-2007-128px.png',
 'n.family-name': u'\xc7elik',
 'n.given-name': u'Tantek',
 u'photo': u'http://tantek.com/icon-2007-128px.png',
 u'uid': u'Tantek \xc7elik',
 u'url': u'http://feeds.technorati.com/contact/tantek.com/%23hcard'}</pre>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/10/24/aumfp-the-almost-universal-microformats-parser/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
