We are continuing to finalise our RDF conversion and work through linking to OCLC resources. As we are also finalising the datasets we can make available under a permissive, useful licence, we are currently working on random samples of catalogue data.
One issue worth highlighting at this stage is URI construction. URIs for records and other important entities described in a catalogue are a key component of linked data. We are taking a standards-based approach to URI construction, trying to follow the guidelines set out by the Cabinet Office for the UK public sector (PDF link).
Our record URI string is quite simple:
The /id/entry/ segment denotes that the URI is an identifier for either a catalogue entry or an entity described in our dataset. The identifier string that follows combines a string of characters for the dataset (which we may remove) with the catalogue record's identifier, which is already used in persistent URLs for our catalogue interface.
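As a rough illustration of that pattern, here is a minimal Python sketch. The base URL and dataset prefix below are hypothetical placeholders (the post does not give the real values), and simple concatenation of prefix and record identifier is an assumption about how the two are combined.

```python
from urllib.parse import quote

# Hypothetical values: the real base URL and dataset prefix are not stated.
BASE = "http://data.example.org"
DATASET_PREFIX = "cul"


def entry_uri(record_id: str, dataset_prefix: str = DATASET_PREFIX,
              base: str = BASE) -> str:
    """Build an /id/entry/ URI from a dataset prefix and a record identifier."""
    # The prefix is optional, since it may be dropped in future.
    local = f"{dataset_prefix}{record_id}" if dataset_prefix else record_id
    # Percent-encode anything that is not safe in a single path segment.
    return f"{base}/id/entry/{quote(local, safe='')}"


print(entry_uri("123456"))                     # with the dataset prefix
print(entry_uri("123456", dataset_prefix=""))  # prefix removed
```

Reusing the identifier that already appears in the catalogue's persistent URLs keeps the linked data URI and the human-facing record page easy to correlate.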
One issue we've not tackled is human-readable unique identifiers for creators. The GUID portion at the end of a creator URI is constructed from a string of characters (say, the 100$a in a MARC record), which is stripped of punctuation (where errors tend to occur) and then run through an MD5 hash.
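The hashing step might look something like the sketch below. It is an approximation, not our production code: the exact punctuation rules are an assumption, and the whitespace normalisation is an extra step added here so that "Austen, Jane," and "Austen Jane" produce the same identifier.

```python
import hashlib
import re


def creator_guid(name: str) -> str:
    """Derive a stable 32-character hex GUID from a creator name string."""
    # Assumption: "punctuation" means anything that is not a word character
    # or whitespace (so the trailing comma common in 100$a is removed).
    stripped = re.sub(r"[^\w\s]", "", name)
    # Assumption: collapse runs of whitespace so spacing differences
    # left behind by removed punctuation do not change the hash.
    normalised = " ".join(stripped.split())
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


print(creator_guid("Austen, Jane,"))
print(creator_guid("Austen Jane"))  # same GUID as above
```

Because the hash is deterministic, the same heading always yields the same GUID across conversion runs. The trade-off is that any variant spelling of a name produces a different identifier.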
Human-readable URIs would be nice, but some attempt at keeping them unique is probably better. If the Library of Congress were to follow its excellent subject work by publishing its name authority file as linked data, we could utilise any GUIDs used there. Hopefully, we will also be able to provide links to relevant VIAF (Virtual International Authority File) entries for authors, where they can be matched by OCLC.
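Once OCLC has matched a creator to a VIAF cluster, the link itself is a single triple. The sketch below emits it in N-Triples syntax. Using owl:sameAs as the linking predicate is an assumption, since the post does not say which property will be used. The local creator URI and VIAF ID in the example are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def viaf_same_as(creator_uri: str, viaf_id: str) -> str:
    """Return an N-Triples line linking a local creator URI to a VIAF cluster."""
    # VIAF cluster identifiers are numeric; reject anything else early.
    if not viaf_id.isdigit():
        raise ValueError(f"not a numeric VIAF ID: {viaf_id!r}")
    viaf_uri = f"http://viaf.org/viaf/{viaf_id}"
    return f"<{creator_uri}> <{OWL_SAME_AS}> <{viaf_uri}> ."


# Hypothetical local URI and VIAF ID, for illustration only.
print(viaf_same_as("http://data.example.org/id/entry/abc123", "12345"))
```

owl:sameAs is a strong claim (every statement about one URI holds for the other), so a weaker property may be preferable where matches are uncertain.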
I'll follow this up shortly with a post about how we are ensuring the data behind a URI is easily referenced by both humans and machines.