Monday, 31 October 2011

Back with more data!

As announced by Jim Michalko on the OCLC research blog, we've launched another dataset!

It's yet more bib data, this time comprising over 600,000 records originating from WorldCat, expressed as RDF triples. We've also loaded most of this into our triplestore. OCLC have enhanced this data with links to the FAST and VIAF authority services.

Even better, the previous two datasets we released have also been enhanced with the same links. There are still some things that could be better, especially our vocab choices around VIAF expression, but the data is there.

This data is licensed under an ODC-By Attribution License and is one of the first datasets to make use of OCLC's newly updated community norms (details here), their preference for licensing WorldCat data for reuse.

This is slightly in contrast to the pain-free PDDL we've managed to provide so far, but we and OCLC are interested to see what users will make of this. The attribution is handled at a dataset level and should be relatively easy to implement and maintain.

Dealing with attribution stacking was a major problem we encountered with COMET. That was partly due to MARC21's inability to manage multiple record identifiers well, which necessitated complex decision-making regarding record ownership. Hopefully, the clear attribution policy set out here should be much easier to handle than the 'hobo stew' we encountered in our catalogue (as Jim puts it)!

I'd like to thank various folk at OCLC (especially our lead contact Eric Childress) for their support and patience over the past few months whilst we worked through a number of technical, administrative and legal points. They were voluntary partners on COMET but have given us a lot of time and assistance.

Next up (when I find the time) will be enhanced links to Library of Congress subject headings and the recently released Name Authority File for everything in our triplestore.