Thursday 19 May 2011

Small (but fiddly) win for URIs ...


Work on the RDF conversion goes on. In addition to eventual complete dumps of the data, we've also started putting together the pieces for our application to support RDF queries via SPARQL and HTTP.


We are using the Apache module mod_rewrite to turn human-readable URIs like the one below ...


http://data.lib.cam.ac.uk/id/entry/cul_comet_pddl_4589705


into those easily parsed by the web application dishing out the record content:


http://data.lib.cam.ac.uk/record.php?uri=http://data.lib.cam.ac.uk/id/entry/cul_comet_pddl_4589705&format=html
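To make the mapping above concrete, here is a minimal Python sketch of the same transformation. It is not the rewrite rule itself: the function name `to_record_url` is hypothetical, and the only assumption it makes is that the original URI is passed through unencoded in the `uri` parameter, as in the example above.

```python
from urllib.parse import urlsplit


def to_record_url(public_uri: str, fmt: str = "html") -> str:
    """Build the internal record.php URL for a human-readable URI.

    Hypothetical helper that mirrors the example in the post; the real
    work is done by mod_rewrite on the server.
    """
    parts = urlsplit(public_uri)
    # Reuse the scheme and host of the public URI, and hand the whole
    # public URI to the application as the "uri" parameter.
    return f"{parts.scheme}://{parts.netloc}/record.php?uri={public_uri}&format={fmt}"


if __name__ == "__main__":
    print(to_record_url("http://data.lib.cam.ac.uk/id/entry/cul_comet_pddl_4589705"))
```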



It's also considered best practice with linked data to dish up records in the format the requesting agent asks for in its HTTP request. This practice, content negotiation, is part of what is referred to as 'cool URIs'. As an example, if someone views 'http://data.lib.cam.ac.uk/id/entry/cul_comet_pddl_4589705' in a browser, whose standard HTTP request accepts content returned as 'text/html', then they should see HTML in their browser.

Conversely, if they want to see RDF/XML content, they can request it via a script or the command line, e.g.:



curl -H "Accept: application/rdf+xml" http://data.lib.cam.ac.uk/id/entry/cul_comet_pddl_4589705


They should not have to add any kind of file extension (.rdf) to the request URI, although it's also nice to support this.


We could handle this within the web application framework, which would involve monitoring requests and parsing incoming URI strings for file extensions, but that would cost precious lines of code. It is much easier to let the web server take over, which is where mod_rewrite comes in again. It allows you to specify a set of rules that check for file extensions and accepted content types and rework URIs so that the web application can dish out the required format.
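For illustration only, the decision those rules have to make can be sketched in a few lines of Python. This is not our ruleset, just the logic it needs to cover: an explicit file extension wins, otherwise the Accept header decides, otherwise fall back to HTML. The function name `choose_format`, the format values other than `html`, and the extension and media-type tables are all assumptions made for the sketch.

```python
import os

# Hypothetical lookup tables. Only the "html" format value appears in the
# post; the other values and the extensions are assumptions.
EXTENSIONS = {".html": "html", ".rdf": "rdf", ".json": "json", ".nt": "nt", ".ttl": "ttl"}
MEDIA_TYPES = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "application/json": "json",
    "application/n-triples": "nt",
    "text/turtle": "ttl",
}


def choose_format(path, accept="", default="html"):
    """Return (path_without_extension, format) for a request."""
    stem, ext = os.path.splitext(path)
    if ext in EXTENSIONS:
        # An explicit extension such as ".rdf" overrides the Accept header.
        return stem, EXTENSIONS[ext]

    # Otherwise pick the supported media type with the highest q-value;
    # on a tie, the type listed first in the header wins.
    best, best_q = default, 0.0
    for item in accept.split(","):
        fields = [f.strip() for f in item.split(";")]
        media = fields[0].lower()
        q = 1.0  # a missing q parameter means q=1
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media in MEDIA_TYPES and q > best_q:
            best, best_q = MEDIA_TYPES[media], q
    return path, best


if __name__ == "__main__":
    print(choose_format("/id/entry/cul_comet_pddl_4589705", "application/rdf+xml"))
    print(choose_format("/id/entry/cul_comet_pddl_4589705.rdf"))
```

Note that the sketch ranks media types by q-value, whereas a mod_rewrite condition simply matches a regular expression against the Accept header, so the two will not always agree on edge cases.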


Sadly, we can't escape regular expressions, and mod_rewrite relies heavily on them. Logging is also essential for debugging. Here is our htaccess ruleset, with each rule commented. We are still not supporting all the formats available for RDF distribution, but are sticking to XML, JSON, baseline triples and Turtle.

mod_rewrite or equivalent tools are a vital part of semantic web infrastructure, and whilst they are fiddly, a little knowledge can go a long way. Here are three great tutorials:

