David Janes' Code Weblog

October 31, 2008

How to detect internal link jumps

html / javascript,tips · David Janes · 5:48 pm ·

If you play with our post GenX – first public demonstration, you’ll see that there’s an internal link marker, “External”. Clicking on any of these links will bring you to a highlighted section of the document, such as this. You’ll also see this on Wikipedia, for example here.

The way to do this is quite simple with CSS3 using the ‘:target’ pseudo-class:

div.gencard:target {
    background-color: rgb(228, 241, 228);
}

But let’s say you don’t have access to CSS3 or you need to highlight a parent element (this is actually how I started this investigation). CSS doesn’t provide a way to select a parent element that has a child with a certain set of properties.

Given that we can’t use CSS, the next tool we look to is JS. One’s initial take for this would be quite simple:

  • create a document load event
  • in that event, look at window.location.hash
  • if substr(1) refers to an existing element, highlight whatever it is you need
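
Here’s a rough sketch of those three steps. The helper names (idFromHash, highlightTarget), the “highlight” class and the choice to mark the parent of the target are all assumptions made for illustration; this is not the code running on this site.

```javascript
// Sketch only: idFromHash, highlightTarget and the "highlight" class are
// hypothetical names chosen for this example.

// "#id_internal" -> "id_internal"; empty string when there is no fragment.
function idFromHash(hash) {
    return hash && hash.charAt(0) === "#" ? hash.substr(1) : "";
}

// Look up the element named by the hash and mark its parent.
// Returns the element that was highlighted, or null if nothing matched.
function highlightTarget(doc, hash) {
    var id = idFromHash(hash);
    if (!id) return null;

    var e_target = doc.getElementById(id);
    if (!e_target) return null;

    var e_parent = e_target.parentNode || e_target;
    e_parent.className = (e_parent.className ? e_parent.className + " " : "") + "highlight";
    return e_parent;
}

// In a browser, run it once the document has loaded.
if (typeof window !== "undefined") {
    window.onload = function () {
        highlightTarget(document, window.location.hash);
    };
}
```

Passing the document in as an argument keeps the lookup separate from the load event, so the same function can be reused when the hash changes later.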

But what happens if you click on an “internal link”, e.g. one like <a href="#id_internal">? Well, after a little research it turns out you’re out of luck – no DOM provides such an event. Alas, as far as I’ve been able to see, the solution is to write a little JS timer that monitors window.location for changes and, if it has changed, updates your highlight appropriately.
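
A minimal sketch of such a timer follows. The name makeHashWatcher, its two callback arguments and the 200 ms interval are assumptions for this example. Browsers released since this was written also offer a hashchange event; the sketch sticks to the timer approach described above.

```javascript
// Sketch only: makeHashWatcher and its arguments are hypothetical names.

// getHash:  function returning the current hash string.
// onChange: called with (newHash, oldHash) whenever the hash differs
//           from the value seen on the previous check.
// Returns a check() function meant to be run from a timer.
function makeHashWatcher(getHash, onChange) {
    var last = getHash();
    return function check() {
        var current = getHash();
        if (current !== last) {
            var previous = last;
            last = current;
            onChange(current, previous);
        }
    };
}

// In a browser, poll a few times a second.
if (typeof window !== "undefined") {
    var check = makeHashWatcher(
        function () { return window.location.hash; },
        function (hash) {
            // update your highlight here, e.g. look up hash.substr(1)
        }
    );
    window.setInterval(check, 200);
}
```

The watcher remembers only the last hash it saw, so clicking the same internal link twice in a row does not fire the callback a second time.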

October 30, 2008

GenX – first public demonstration

genx,ideas,pyecs · David Janes · 2:37 pm ·

This is a post using my next project, called GenX, which I’ll be describing in the next few weeks. Here’s some information about a band called U2. I just bought a reissued CD/DVD combo of Under a Blood Red Sky, so it seems appropriate. (Note: post updated)

U2

U2 are a rock band from Dublin, Ireland. The band consists of Paul David Hewson (vocals and guitar), The Edge (guitar, keyboards, and vocals), Adam Clayton (bass guitar) and Larry Mullen, Jr. (drums and percussion)….

Paul David Hewson

Paul David Hewson (born 10 May 1960 in Glasnevin, Dublin, Ireland), also known by his stage name Bono, is the main vocalist of the Irish rock band U2. Bono was born and raised in Dublin, Ireland, and attended…

The Edge

David Howell Evans (born 8 August 1961 in Barking, East London, UK), more widely known by his moniker The Edge, is a musician known best as the guitarist, keyboardist, and main backing vocalist for the…

Adam Clayton

Adam Charles Clayton (born 13 March 1960 in Chinnor, Oxfordshire, UK), is the bassist of the rock band U2. A British citizen, Clayton has resided in County Dublin since the time his family moved to Malahide…

Larry Mullen, Jr.

Lawrence Joseph “Larry” Mullen, Jr. (born 31 October 1961 in Artane, Dublin, Ireland) is the drummer for the Irish rock band U2. He is the founder of U2, which was originally known as “The Larry Mullen…

Under a Blood Red Sky – Deluxe Edition CD/DVD, U2 [2008-07-10]

A Deluxe Edition version featuring the Under a Blood Red Sky CD (Disc 1) and the Live at Red Rocks DVD (Disc 2). The remastered Under a Blood Red Sky album was originally released in November 1983, and consists of live recordings from three shows on the band’s War Tour through Europe and America. Recorded at the Red Rocks Amphitheatre in Colorado on 5th June 1983, Live at Red Rocks will be available for the first time on DVD, and will include 5 previously unreleased songs, a director’s commentary, digitally re-graded pictures and a 5.1 mix. This is the fourth CD as part of the Amazon.com-exclusive U2 deluxe edition box set and fits into the open slot within the packaging.

October 28, 2008

Amazon's OpenSearch: mostly useless

search,semantic web · David Janes · 8:28 am ·

As part of a broader project I’m working on, I decided to see if there’s a way I could easily get search results from the web in a machine-readable fashion. One project to facilitate this is Amazon/A9’s OpenSearch. Alas, it’s useless:

  • No big web search provider has signed on to provide machine-readable results. Including A9/Alexa! A9 will aggregate search results from different OpenSearch providers for you, it just won’t let you use Alexa’s results elsewhere (search for Alexa on that page)
  • Even if you were to buy into the search aggregation approach, many (most?) of the sources are dead now. A little pruning wouldn’t hurt here guys! (search for IMDB on that page)

I wouldn’t be tempted to offer my search results in OpenSearch format, because who’s going to use it after I put in the work? And if all that’s available as search sources are mostly broken C and D-list sites, well, who cares? It’s a fringe benefit, but not one that I’m looking for and neither, likely, are you. You’d think that Amazon would use Alexa search results in OpenSearch to “prime the pump”, but I guess being the Nth placed web search service is good enough for them.

Note that there’s a great argument for simply marking up search results with hAtom and using rel=next to navigate to the next page of results, but that’s a topic for another day.

If I have any of my facts wrong here, I apologize in advance: the documentation kind of sucks. I’m also sure there’s some difference between A9, Alexa and Amazon – I really just don’t have the time to work it out.

More style updates

administrivia,html / javascript,semantic web · David Janes · 6:36 am ·

I’ve added hAtom to this weblog’s template: you can see a parsed version here. I’ve also updated the comments to be prettier.

Next, to figure out what this gravatar stuff is and to expand the blogroll.

October 27, 2008

How to do multi-column multilingual full text searching in Oracle

db · David Janes · 7:43 pm ·

Here’s how you do full text searching across multiple different columns in a multilingual environment on Oracle 9 and later, quick and easy. I’m more of a MySQL guy, so you’ll have to excuse me if my Oracle lingo isn’t up to scratch.

One time setup, as system

This has to be executed by a privileged Oracle user. If there is an error message, it’s probably because CTX_DDL has not been installed and you’ll need to talk to a skilled DBA or figure out how to do it yourself.

GRANT EXECUTE ON CTX_DDL TO mydb;

Lexer setup

Once the step above is done, you can do everything as the Oracle user ID you normally work under.

This step sets up the ‘global_lexer’, which determines how Oracle understands text (e.g. that oxen is the plural of ox in the English locale). If you are only working in English I believe you can just drop all the French references.

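-- Clear out any earlier definitions. On a first run these preferences do not
-- exist yet and each drop reports an error, which can be ignored.
begin
ctx_ddl.drop_preference('global_lexer');
end;
/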

begin
ctx_ddl.drop_preference('english_lexer');
end;
/

begin
ctx_ddl.drop_preference('french_lexer');
end;
/

begin
ctx_ddl.create_preference('english_lexer','basic_lexer');
ctx_ddl.set_attribute('english_lexer','index_themes','yes');
ctx_ddl.set_attribute('english_lexer','printjoins','$_#@*&^%/\()');
ctx_ddl.set_attribute('english_lexer', 'skipjoins', '-');
ctx_ddl.create_preference('french_lexer','basic_lexer');
ctx_ddl.set_attribute('french_lexer','index_themes','no');
ctx_ddl.set_attribute('french_lexer','base_letter','yes');
end;
/

exec ctx_ddl.create_preference('global_lexer','multi_lexer') ;

begin
ctx_ddl.add_sub_lexer('global_lexer', 'french', 'french_lexer');
ctx_ddl.add_sub_lexer('global_lexer', 'default','english_lexer');
end;
/

Table setup

If you’re doing a multilingual set up, you’ll need a field in your table that specifies the language. One can never be quite sure about how things are done in Oracle, but the values EN and FR seem to do the right thing. Our table looks something like this (we’re just showing the important stuff here):

CREATE TABLE Post
(
    search    CLOB,
    subject   NVARCHAR2(256)  NOT NULL,
    body      CLOB,
    lang      VARCHAR2(6)    NOT NULL
);

The subject and body fields have the data we want to search. Because a CONTEXT index is built on a single column, we concatenate them into search at UPDATE/INSERT time. lang stores the language code for this particular row, i.e. EN or FR.

Next we must set up a trigger that will maintain the search field for you:

CREATE OR REPLACE TRIGGER post_text_trigger
BEFORE UPDATE OR INSERT ON post
FOR EACH ROW
DECLARE
    a CLOB;
BEGIN
    a := :NEW.subject || ' ';
    a := a || :NEW.body;
    :NEW.search := a;
END post_text_trigger ;
/

Now you may be asking yourself: why is that ‘a’ assignment all over the place? Simple: who the hell knows, it’s Oracle.

After that we create a new INDEX on the table to do the full text searching:

CREATE INDEX post_text ON Post(search)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('LEXER global_lexer STOPLIST ctxsys.default_stoplist LANGUAGE COLUMN lang');

Maintaining the Index

Unlike what you might expect, Oracle doesn’t magically keep the full text search index up-to-date. This is something you have to maintain on your own. If you were to execute a search, you wouldn’t find anything right now.

To bring the index up to date, use the following command:

EXEC CTX_DDL.SYNC_INDEX('post_text');

In our implementation we just run that command from a cron script every 30 minutes. There’s also a way to do this in Oracle I understand, but we’re quite comfortable with UNIX commands. How often you want to run will vary depending on how much text you have and how often you update it.

You also have another option: create a trigger that will update the index every time a row is modified. There may be performance issues involved with this, but if you want to try it, here’s the magic:

CREATE OR REPLACE TRIGGER post_text_trigger2
AFTER INSERT OR UPDATE OR DELETE ON post
DECLARE
    v_job NUMBER;
BEGIN
    IF deleting THEN
        DBMS_JOB.SUBMIT(v_job, 'ctx_ddl.optimize_index(''post_text'',''FULL'');', SYSDATE);
    ELSE
        DBMS_JOB.SUBMIT(v_job, 'ctx_ddl.sync_index(''post_text'');', SYSDATE);
    END IF;
END;
/

Doing searches

Oracle provides all sorts of search methods, obtusely documented. See the references below for more. What you probably want to do is, well, look for stuff. Here’s what we did:

First, we convert the search string into a safe list of words (no punctuation, etc.). Then we create a search string that looks something like the following:

SELECT * FROM Post WHERE contains(search, '${mutt} AND ${and} AND ${jeff}') > 0;

Note the {}: this stops the word ‘and’ being searched for from being recognized as an Oracle keyword. The leading $ is Oracle Text’s stem operator, so each word also matches its other forms.
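
Building that search string might look something like the sketch below. The function names (safeWords, buildContainsQuery), the lowercasing and the ASCII-only word pattern are assumptions for illustration, not the code we actually run; French text with accents would need a wider pattern.

```javascript
// Sketch only: safeWords and buildContainsQuery are hypothetical names.

// Split user input into plain words, dropping punctuation.
// Only ASCII letters and digits are kept here; accented characters
// would need a wider pattern.
function safeWords(input) {
    var words = String(input).toLowerCase().split(/[^a-z0-9]+/);
    var result = [];
    for (var i = 0; i < words.length; i++) {
        if (words[i]) result.push(words[i]);
    }
    return result;
}

// Build the text passed as the second argument to contains():
// each word is wrapped in ${...} and the words are joined with AND.
function buildContainsQuery(input) {
    var words = safeWords(input);
    var terms = [];
    for (var i = 0; i < words.length; i++) {
        terms.push("${" + words[i] + "}");
    }
    return terms.join(" AND ");
}
```

For example, buildContainsQuery("Mutt, and Jeff!") returns the string used in the query above. Since the result comes from user input, it is safer to hand it to the database as a bind parameter than to paste it into the SQL text.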

Further reading

There’s lots of info here and the first reference in particular told me most of the information I needed to know.

October 26, 2008

Tip – fixing broken menus over form on IE6 and IE7

html / javascript · David Janes · 7:17 pm ·

If you use pop up menus on your site, you may find that they don’t work very well on IE6 and sometimes on IE7 also. In particular, FORM fields (INPUT and SELECT) show above the menus and sometimes the text of BUTTONs appears on top of the menu.

This post describes how to fix this. The technique described is the same as this one here, but you may find this a little easier to implement. Honestly, your best solution is not to code menus yourself but to find a menu package that handles this for you. However, you may find yourself in a situation like mine, where there is no choice: you have to retrofit existing code.

This is how you do it:

Add the following to your CSS – this styles an IFRAME that we’re going to drop into a menu. Initially I thought the z-index was not needed, but it turns out you want to make sure the IFRAME is below the menu because otherwise the IFRAME will capture mouse and keyboard events – i.e. your menu will not work! It doesn’t seem to matter what the z-index of the IFRAME is relative to the background form, though.

<style type="text/css">
.menu_iframe {
    position: absolute;
    top: 0px;
    left: 0px;
    width: 0px;
    height: 0px;
    filter: alpha(opacity = 0);
    z-index: -1;
}
</style>

Then add the following to the element that is your menu popup, at the very top:

<!--[if lte IE 7]><iframe class="menu_iframe"></iframe><![endif]-->

This will add an IFRAME to the menu popup on IE6/7 only (due to the magic Internet Explorer conditional comments). Other browsers will not add this IFRAME and will continue to work as per normal. Update: see the note at the bottom if you are doing this on an HTTPS secure page.

Add this code to your JavaScript:

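// Menus whose IFRAME has already been resized, keyed by element id.
var ie_done = {};

function miframe(e_menu) {
    if (!e_menu) return;
    if (ie_done[e_menu.id]) return;
    ie_done[e_menu.id] = 1;

    // Only IE6/7 get the IFRAME (via the conditional comment), so other
    // browsers stop here.
    var e_iframes = e_menu.getElementsByTagName("iframe");
    if (e_iframes.length == 0) return;

    // Match the IFRAME to the menu's size; CSS lengths need a unit.
    e_iframes[0].style.width = e_menu.offsetWidth + "px";
    e_iframes[0].style.height = e_menu.offsetHeight + "px";
}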

This function will – the first time it is called per-menu – resize the first IFRAME found in the menu to be the same size as the menu. Note the dependency on having an ID attribute on the menu popup!

When you’re popping up your menu, call miframe with the menu element and this will make the IFRAME the same size as the menu. The IFRAME magically blocks out the form but also allows the menu to show through (play with the opacity parameter for fun: opaque is 100).

But it still doesn’t work!?

This is where it gets, well, weirder. It turns out that if you have anything marked up as position: relative, well, all the above doesn’t work. So make sure you get rid of all of those and find a different way to do fine tuning.

BUTTON tags never seem to work either – the text often filters through onto the menu. Alas, they’ve gone into the HTML junk heap for me for now until IE6 gets obsoleted. Try using a styled A tag instead and suck up the ugliness.

Update: HTTPS pages

If you’re using HTTPS, the method above will have to be adjusted because you’ll start getting a “The page contains secure and non-secure items” message. My solution was to add a src="/empty.htm" attribute to the IFRAME. It occurs to me now that using src="about:blank" may also work, but I haven’t tested it.

October 25, 2008

New style for this weblog

administrivia,html / javascript · David Janes · 2:56 pm ·

I’ve put together a basic WordPress theme for this site, to make it similar to everything else. More work will be forthcoming, including adding hAtom to all appropriate places.

AUMFP – Demo

aumfp,demo,python,semantic web · David Janes · 1:13 pm ·

I now have the AUMFP up as a demo page. Here’s a few examples:

October 24, 2008

Tip – use mod_rewrite to redirect to subdirectory

apache,tips · David Janes · 11:50 am ·

I’ve organized a part of http://code.davidjanes.com/ as follows:

  • / – the main page
  • /blog/ – the blog
  • /genx/demo/ – the demo for GenX

However, I don’t have a /genx/ page (yet) and I plan to do a few more projects with this type of hierarchy. So what to do? Enter Apache mod_rewrite.

In the /genx/ directory I added a .htaccess file with the following content:

RewriteEngine on
RewriteBase /
RewriteRule ^$ http://code.davidjanes.com/genx/demo/

Note that this may not work in all hosted environments, because they may not allow per-directory .htaccess files. In that case, I’d consider adding an “index.html” file to the directory with a meta refresh, as follows:

<html>
<head>
<meta http-equiv="refresh" content="0;url=http://code.davidjanes.com/genx/demo/">
</head>
</html>

AUMFP – The Almost Universal Microformats Parser

aumfp,python,semantic web · David Janes · 8:49 am ·

I’ve completely refreshed the Almost Universal Microformats Parser up on Google Code. Changes from the (very old) previous version include:

  • Tarballs available
  • Much better handling of Internationalized Characters
  • Many improvements to parsing
  • Simplified iterator interface (see below)
  • Spun off support library files into their own library called PyBM. If you’re using tarballs this won’t be an issue

Microformat support includes:

  • hCard
  • hCalendar
  • hAtom
  • hListing
  • hResume
  • rel-tag
  • xfolk

There’s also an additional ‘hdocument’ parser that treats an arbitrary webpage like the other parsers, returning information such as feeds, links, images and so forth.

Use

Using the parser is simple:

import hcard
import pprint

parser = hcard.MicroformatHCard(page_uri = 'http://tantek.com')
for d in parser.Iterate():
  pprint.pprint(d)

The ‘d’ returned is an extended python ‘dict’. Because we capture information about classes within paths, there’s no guarantee about how a key is going to be named. For example, a phone number could be keyed ‘tel’ or ‘tel.home’ (or a number of other things). Our dictionary ‘mfdict’ provides a number of functions called ‘find’ to pull out values. For example, this will pull out the least dot-specified telephone number:

tel = d.find('tel')

We also add special keys beginning with an ‘@’ for well known, additionally interesting or commonly used fields, to save you the trouble of figuring this information out yourself. Here’s an example parsed hCard (from the example above):

{'@html': u'<address id="hcard" class="vcard author"></address>',
 '@index': 'vcard-36',
 '@loose-uris': [u'http://tantek.com/'],
 '@parents': u'author copyright xoxo',
 '@title': u'Tantek \xc7elik',
 '@uf': 'hCard',
 '@uri': u'http://tantek.com#hcard',
 u'_url': '',
 u'adr.country-name': '',
 u'adr.locality': u'San Francisco',
 u'adr.region': u'CA',
 u'fn': u'Tantek \xc7elik',
 u'logo': u'icon-2007-128px.png',
 'n.family-name': u'\xc7elik',
 'n.given-name': u'Tantek',
 u'photo': u'http://tantek.com/icon-2007-128px.png',
 u'uid': u'Tantek \xc7elik',
 u'url': u'http://feeds.technorati.com/contact/tantek.com/%23hcard'}