David Janes' Code Weblog

March 2, 2009

AUAPI: encoding hCards in JSON

auapi, aumfp, semantic web · David Janes · 9:15 am ·

The best model for describing people is the vCard standard, RFC 2425 and RFC 2426. The microformats community has adapted the vCard standard for serialization into HTML using hCard. In the Almost Universal API (AUAPI), people and organizations should almost always be described using a JSON-encoded hCard.

It is difficult to describe, without going into great minutiae, the difficulties in transforming the hCard and vCard standards into a pleasant-looking and, more importantly, easy-to-use hierarchy: there are a number of edge cases that one has to deal with. There’s certainly an argument for just encoding hCards/vCards as a straight vCard serialization, at least in terms of simplicity of encoding. The issue is that the end consumer (who I believe should be the strongest focus) then has to do the dirty work of grouping everything together themselves.

Algorithm

This algorithm is destructive to the data structure it works upon, so generally you’ll want to make a copy first.

  • note that though we refer to hCard attributes in upper case, mixed case, camel case and so forth, all attributes are physically encoded in lower case with “-” separators
  • let the “groupers” be ADR, GEO, N, ORG, TEL. Groupers group together attributes that are related (such as FirstName and LastName)
  • let the “narrowers” be Home, Work, Parcel, Postal (and no-narrower). Narrowers assign a specific meaning to a value, e.g. this is a Work phone number.
  • assume each value is described by a number of attributes, e.g. “416-515-5555” can be described by ( TEL, Work, Mobile )

Then:

  • for each Narrower, then for each Grouper
    • create a dictionary ‘subd’
    • for each value that is described by the ( Narrower, Grouper )
      • for each remaining attribute (besides Narrower and Grouper), add the value to subd under that attribute
      • if the value was fully described by ( Narrower, Grouper ), add it to subd under the key ‘@’
    • for key, value in subd
      • add to the final result
      • if narrower is not ‘no-narrower’, add ‘@narrower = narrower’ (e.g. ‘@work’ = ‘work’)
    • add subd to the result under the key Grouper
  • add all remaining values from the original hCard to the result, noting that
    • if the value is described by a Narrower, we encode it as a dictionary with ‘@narrower = narrower’
Clear? Well, the examples below will help. With the “416-515-5555” above we would get:

{
 "hcard:hcard" : {
  'tel' : {
   '@work' : 'work',
   'mobile' : '416-515-5555',
  }
 }
}
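
As a rough illustration, the steps above can be sketched in a few lines of Python. This is not the AUMFP code: the input shape (a list of `(value, attributes)` pairs), the constant names and the handling of a grouper that appears under more than one narrower are all assumptions made for this sketch. The `'@' + narrower` key follows the examples in this post, and storing a narrowed leftover value under `'@'` is likewise a guess.

```python
GROUPERS = ("adr", "geo", "n", "org", "tel")
NARROWER_NAMES = ("home", "work", "parcel", "postal")
NO_NARROWER = None  # stands in for the "no-narrower" case


def decompose(described_values):
    """Group (value, attributes) pairs into a nested dictionary."""
    # Work on a copy, since the grouping consumes the list as it goes.
    remaining = [(value, set(attrs)) for value, attrs in described_values]
    result = {}

    for narrower in NARROWER_NAMES + (NO_NARROWER,):
        for grouper in GROUPERS:
            subd = {}
            unclaimed = []
            for value, attrs in remaining:
                if narrower is NO_NARROWER:
                    narrower_ok = not attrs.intersection(NARROWER_NAMES)
                else:
                    narrower_ok = narrower in attrs
                if grouper in attrs and narrower_ok:
                    rest = attrs - {grouper, narrower}
                    if rest:
                        for key in rest:
                            subd[key] = value
                    else:
                        # Fully described by (narrower, grouper).
                        subd["@"] = value
                else:
                    unclaimed.append((value, attrs))
            remaining = unclaimed

            if not subd:
                continue
            if narrower is not NO_NARROWER:
                subd["@" + narrower] = narrower
            if grouper in result:
                # Assumption: a grouper seen under several narrowers
                # becomes a list of dictionaries.
                previous = result[grouper]
                if not isinstance(previous, list):
                    previous = [previous]
                result[grouper] = previous + [subd]
            else:
                result[grouper] = subd

    # Values that belong to no grouper are copied over directly.
    for value, attrs in remaining:
        narrowers = sorted(attrs.intersection(NARROWER_NAMES))
        names = sorted(attrs.difference(NARROWER_NAMES))
        if narrowers:
            # Assumption: the value itself is stored under '@'.
            payload = {"@": value}
            for name in narrowers:
                payload["@" + name] = name
        else:
            payload = value
        for name in names:
            result[name] = payload
    return result


print({"hcard:hcard": decompose([("416-515-5555", {"tel", "work", "mobile"})])})
```

Run on the phone number above, the sketch yields the same dictionary as the hand-written example.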

Code

The source code for this algorithm is in the AUMFP tree, in the function decompose in the file vcard.py (see around line 1083).

Namespace

All JSON-encoded hCards are in the namespace hcard:. In the AUAPI serialization, this namespace should appear only on the enclosing element; all children are assumed to be in the same namespace. I am currently using the URI http://purl.org/uF/hCard/1.0/ for this namespace (when XML serializing); this may change in the future.
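
A minimal sketch of that convention, assuming the hCard has already been turned into a plain dictionary: only the outer key carries the `hcard:` prefix. The helper name `wrap_hcard` is made up for this illustration and is not part of AUAPI or AUMFP.

```python
import json

HCARD_KEY = "hcard:hcard"  # the only place the namespace prefix appears


def wrap_hcard(card):
    """Put an already-grouped hCard dictionary under the namespaced key."""
    return {HCARD_KEY: card}


card = {
    "fn": "Tantek \u00c7elik",
    "n": {"family-name": "\u00c7elik", "given-name": "Tantek"},
}
# ensure_ascii=False keeps non-ASCII characters readable in the output.
encoded = json.dumps(wrap_hcard(card), ensure_ascii=False, indent=1, sort_keys=True)
print(encoded)
```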

Example 1 – home phone number from whitepages.com

{
 'hcard:hcard': {'adr': {'country-name': u'United States',
                         'locality': u'Huntsville',
                         'postal-code': '35801-2908',
                         'region': 'Alabama',
                         'street-address': u'1114 Humes Avenue NE'},
                 'fn': u'Jack Smith',
                 'geo': {'latitude': 34.743763000000001,
                         'longitude': -86.572568000000004},
                 'n': {'family-name': u'Smith', 'given-name': u'Jack'},
                 'tel': {'voice': u'256-539-8788'}},
}

Example 2 – work phone number from whitepages.com

{ 'hcard:hcard': {'adr': {'country-name': u'United States',
                         'locality': u'Gurley',
                         'postal-code': '35748-8715',
                         'region': 'Alabama',
                         'street-address': u'148 Little Cove Road'},
                 'fn': u'Jack Smith',
                 'geo': {'latitude': 34.698258000000003,
                         'longitude': -86.383027999999996},
                 'n': {'family-name': u'Smith', 'given-name': u'Jack'},
                 'org': {'organization-name': u'Alldyne Powder Technoliges'},
                 'tel': {'@work': 'work', 'voice': u'256-776-1238'}},
}

Example 3 – hCard directly to JSON

{ 'hcard:hcard': {
                 'adr': {u'country-name': u'United States of America',
                         u'locality': u'San Francisco',
                         u'region': u'CA'},
                 u'fn': u'Tantek \xc7elik',
                 u'logo': u'icon-2007-128px.png',
                 'n': {'family-name': u'\xc7elik',
                       'given-name': u'Tantek'},
                 u'photo': u'http://tantek.com/icon-2007-128px.png',
                 u'url': u'http://feeds.technorati.com/contact/tantek.com/#hcard'},
}
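
From the consumer's side, reading these structures back can be quite short. The sketch below pulls the phone numbers and the narrower out of Examples 1 and 2 (trimmed to the relevant keys). The function name `read_tel` is hypothetical, and it assumes that any key starting with ‘@’ other than the bare ‘@’ is a narrower marker, as in the examples.

```python
def read_tel(document):
    """Return (narrower, numbers) for the 'tel' entry of a JSON-encoded hCard."""
    tel = document.get("hcard:hcard", {}).get("tel", {})
    narrower = None
    numbers = {}
    for key, value in tel.items():
        if key.startswith("@") and key != "@":
            narrower = value  # e.g. '@work': 'work'
        else:
            numbers[key] = value
    return narrower, numbers


home = {"hcard:hcard": {"fn": "Jack Smith", "tel": {"voice": "256-539-8788"}}}
work = {"hcard:hcard": {"fn": "Jack Smith",
                        "tel": {"@work": "work", "voice": "256-776-1238"}}}

print(read_tel(home))
print(read_tel(work))
```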

October 25, 2008

AUMFP – Demo

aumfp, demo, python, semantic web · David Janes · 1:13 pm ·

I now have the AUMFP up as a demo page. Here’s a few examples:

October 24, 2008

AUMFP – The Almost Universal Microformats Parser

aumfp, python, semantic web · David Janes · 8:49 am ·

I’ve completely refreshed the Almost Universal Microformats Parser up on Google Code. Changes from the (very old) previous version include:

  • Tarballs available
  • Much better handling of Internationalized Characters
  • Many improvements to parsing
  • Simplified iterator interface (see below)
  • Spun off the support library files into their own library called PyBM. If you’re using tarballs this won’t be an issue

Microformat support includes:

  • hCard
  • hCalendar
  • hAtom
  • hListing
  • hResume
  • rel-tag
  • xfolk

There’s also an additional ‘hdocument’ parser that treats an arbitrary webpage like the other parsers, returning information such as feeds, links, images and so forth.

Use

Using the parser is simple:

import hcard
import pprint

parser = hcard.MicroformatHCard(page_uri = 'http://tantek.com')
for d in parser.Iterate():
  pprint.pprint(d)

The ‘d’ returned is an extended Python ‘dict’. Because we capture information about classes within paths, there’s no guarantee about how a key is going to be named. For example, a phone number could be keyed ‘tel’ or ‘tel.home’ (or a number of other things). Our dictionary ‘mfdict’ provides a number of ‘find’ functions to pull out values. For example, this will pull out the least dot-specified telephone number:

tel = d.find('tel')
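
To make “least dot-specified” concrete, here is a stand-in written against a plain dict. It is not the mfdict implementation; the function name and the tie-breaking rule (alphabetical order among keys with the same number of dots) are assumptions for illustration only.

```python
def find_least_specified(d, name):
    """Return the value whose key is `name` or `name.<...>` with the fewest dots."""
    candidates = [key for key in d if key == name or key.startswith(name + ".")]
    if not candidates:
        return None
    # Fewest dots wins; ties are broken alphabetically (an assumption).
    best = min(candidates, key=lambda key: (key.count("."), key))
    return d[best]


print(find_least_specified({"tel.home": "416-515-5555", "fn": "Jack Smith"}, "tel"))
```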

We also add special keys beginning with an ‘@’ for well known, additionally interesting or commonly used fields, to save you the trouble of figuring this information out yourself. Here’s an example parsed hCard (from the example above):

{'@html': u'<address id="hcard" class="vcard author"></address>',
 '@index': 'vcard-36',
 '@loose-uris': [u'http://tantek.com/'],
 '@parents': u'author copyright xoxo',
 '@title': u'Tantek \xc7elik',
 '@uf': 'hCard',
 '@uri': u'http://tantek.com#hcard',
 u'_url': '',
 u'adr.country-name': '',
 u'adr.locality': u'San Francisco',
 u'adr.region': u'CA',
 u'fn': u'Tantek \xc7elik',
 u'logo': u'icon-2007-128px.png',
 'n.family-name': u'\xc7elik',
 'n.given-name': u'Tantek',
 u'photo': u'http://tantek.com/icon-2007-128px.png',
 u'uid': u'Tantek \xc7elik',
 u'url': u'http://feeds.technorati.com/contact/tantek.com/%23hcard'}
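
If you want to handle the ‘@’ keys separately from the microformat fields, a small helper is enough. This sketch works on a plain dict with a few keys copied from the output above; `split_special_keys` is a hypothetical name, not something the parser provides.

```python
def split_special_keys(d):
    """Separate the '@'-prefixed keys from the microformat fields."""
    special = {key: value for key, value in d.items() if key.startswith("@")}
    fields = {key: value for key, value in d.items() if not key.startswith("@")}
    return special, fields


parsed = {
    "@uf": "hCard",
    "@uri": "http://tantek.com#hcard",
    "adr.locality": "San Francisco",
    "n.given-name": "Tantek",
}
special, fields = split_special_keys(parsed)
print(sorted(special))
print(sorted(fields))
```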
