Tuesday 5 July 2011

Two more updates ...

After this week's launch of data.lib.cam.ac.uk, it's good to follow up with some more updates.

First up is a SPARQL workshop, a three-part tutorial on RDF and the SPARQL query language aimed at I.T. workers with little to no knowledge of the semantic web, technically minded librarians, web designers and those with an interest in metadata and its (re)use.

One of the primary (and justifiable) criticisms of RDF is the high entry barrier. Much of the literature assumes a high level of technical and semantic web knowledge.

In an attempt to 'help others follow in our footsteps', I've tried to capture what I learned while using SPARQL to query our dataset. This may not actually lower the entry barrier, but will hopefully give those with an interest in RDF a baseline starting place.

Secondly, we are beginning to better link our Linked Data!

We've made some experimental gains in URI enrichment, supplementing our graphs for catalogue subject entries with links to the Library of Congress vocabularies.

See these examples:

http://data.lib.cam.ac.uk/id/entry/cambrdgedb_c1574b4e36a34f04bda61b3ea57b2379

http://data.lib.cam.ac.uk/id/entry/cambrdgedb_2ca5328ca9bebe20f37a7718d5e1f67b

http://data.lib.cam.ac.uk/id/entry/cambrdgedb_2883408d7b714bb6423d5c1ebcb40a48


As our labels are made up of a number of Library of Congress subject components (subject, geographic, chronological etc.), we are taking the initial main entry and representing it with a 'skos:broader' link. We would love some feedback on this approach, which is little more than a starting point. As we are using HTTP requests against the id.loc.gov service, we are also running into scaling issues with our 600,000+ subject entries.
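A minimal Python sketch of the idea follows; it is an illustration under stated assumptions, not the code actually used for the enrichment. It assumes that label components are separated by "--" (the usual Library of Congress subdivision convention) and that some `lookup` callable, hypothetical here, resolves a heading to a Library of Congress URI, for instance by making an HTTP request to id.loc.gov. Because many of the 600,000+ entries are likely to share the same initial heading, the sketch caches each lookup so that repeated headings cost only one request. The function names and example URIs are invented for illustration.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def initial_heading(label):
    """Return the first component of a subdivided subject label.

    Assumes components are separated by "--"; a label with no
    subdivisions is returned unchanged (minus surrounding whitespace).
    """
    return label.split("--")[0].strip()


def broader_triple(entry_uri, loc_uri):
    """Format one skos:broader statement as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (entry_uri, SKOS_BROADER, loc_uri)


def enrich(entries, lookup):
    """Build skos:broader triples for (entry_uri, label) pairs.

    `lookup` is a hypothetical callable mapping a heading string to a
    Library of Congress URI, or None when there is no match. Results
    are cached so each distinct heading is looked up only once.
    """
    cache = {}
    triples = []
    for entry_uri, label in entries:
        heading = initial_heading(label)
        if heading not in cache:
            cache[heading] = lookup(heading)
        loc_uri = cache[heading]
        if loc_uri is not None:
            triples.append(broader_triple(entry_uri, loc_uri))
    return triples
```

Passing the lookup in as a parameter keeps the network call separate from the matching logic, so the same loop can be tried against a local copy of the vocabulary rather than the live service if the request volume becomes a problem.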

Enrichment is being done directly in our RDF store, so for now it is not reflected in our bulk data downloads.

1 comment:

  1. To update on this post, we've enriched many subject nodes with links to LOC, using the DC hasPart predicate.

    Currently, only the initial entry in a subject area is being matched, but we may try other parts.
