Monday, 20 June 2011

On licensing ...

Licensing of bibliographic metadata is far too complex a subject.
One of the major aims of COMET has been to see how easy it is to identify records from major record vendors in the UK HE environment, and to address issues and concerns around data reuse. This work is still ongoing, but it's high time we got a post out on the subject explaining where things stand.

Like most university libraries, Cambridge University Library relies heavily on external record vendors to meet its cataloging needs and keep pace with a high intake of material. For much of this data, reuse and republication are governed by explicit contractual agreements. At the same time, we understand and support the need to produce Open Data as a platform for a better set of services for Higher Education.

State of play
Through the COMET project we have been investigating our data for traces of 'ownership' and examining contracts. We've contacted the major record providers, and some have indicated a preference for particular licenses when their data is republished.
As an example, the British Library has published the British National Bibliography as RDF data under the PDDL and is happy for others holding BNB data in their catalogues to do the same (although there has not yet been any formal announcement to this effect!).

OCLC, perhaps the biggest record supplier, has recently expressed a preference for the ODC-By (Open Data Commons Attribution) license. We are one of a number of libraries working with OCLC to investigate the practicalities around this.

We also produce a substantial amount of data in-house, and would still like to publish this under the Public Domain Dedication and License (PDDL). Identifying this data was more difficult than it should have been: we insert no 'made in Cambridge' label on our own records, so we had to identify this set by a process of elimination.
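To illustrate what "a process of elimination" means in practice, here is a minimal sketch in Python. It assumes a heavily simplified record model (a dict with an `identifiers` list) rather than real MARC parsing, and a hypothetical list of vendor markers: `(OCoLC)` is the conventional prefix for OCLC control numbers in MARC field 035, but `VENDOR-A` and `VENDOR-B` are placeholders. Any record carrying no recognizable vendor trace is treated as in-house.

```python
from typing import Iterable

# Hypothetical vendor markers. "(OCoLC)" is the usual OCLC control-number
# prefix in MARC 035; the other two entries are illustrative placeholders.
VENDOR_MARKERS = ("(OCoLC)", "VENDOR-A", "VENDOR-B")


def is_vendor_record(record: dict) -> bool:
    """Return True if any identifier in the record starts with a vendor marker."""
    for value in record.get("identifiers", []):
        if any(value.startswith(marker) for marker in VENDOR_MARKERS):
            return True
    return False


def in_house_records(records: Iterable[dict]) -> list[dict]:
    """Process of elimination: keep only records with no vendor trace at all."""
    return [r for r in records if not is_vendor_record(r)]


# Toy catalogue using the simplified, hypothetical record model.
catalogue = [
    {"id": "rec1", "identifiers": ["(OCoLC)12345678"]},
    {"id": "rec2", "identifiers": []},
    {"id": "rec3", "identifiers": ["VENDOR-B:987"]},
]

print([r["id"] for r in in_house_records(catalogue)])  # ['rec2']
```

The weakness of elimination is visible here: a vendor record that has lost its marker would be misclassified as in-house, which is why an explicit local provenance label would have made the job easier.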

Given this disparity between approaches to licensing, we aim to produce several different datasets, each under an established Open Data Commons license.

The datasets will be identical in URI structure and vocabulary choice, but each whole set will be represented by a separate graph in our RDF datastore, and that graph will itself be linked to the appropriate license information. For data released under anything other than the PDDL, the license will also be made explicit to anyone downloading in bulk.
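A minimal sketch of this one-graph-per-license layout, written in plain Python that emits N-Quads. The post does not say which serialization or linking property is used; N-Quads and `dcterms:license` are assumptions here, and every URI under `data.example.org` is hypothetical. Each dataset's triples go into their own named graph, and a single license statement about each graph is placed in the default graph.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Open Data Commons license URIs (version 1.0 of each license).
LICENSES = {
    "pddl": "http://opendatacommons.org/licenses/pddl/1.0/",
    "odc-by": "http://opendatacommons.org/licenses/by/1.0/",
}


def iri(value: str) -> str:
    """Wrap a string as an N-Quads IRI term."""
    return f"<{value}>"


def to_nquads(datasets: dict) -> str:
    """Serialize each dataset as its own named graph, plus one license
    statement per graph in the default graph.

    `datasets` maps a graph IRI to (license key, list of (s, p, o) triples).
    This sketch handles IRI objects only; literals are not supported.
    """
    lines = []
    for graph, (license_key, triples) in datasets.items():
        # Three-term line: a default-graph statement about the named graph.
        lines.append(
            f"{iri(graph)} {iri(DCTERMS + 'license')} {iri(LICENSES[license_key])} ."
        )
        # Four-term lines: the data itself, placed in the named graph.
        for s, p, o in triples:
            lines.append(f"{iri(s)} {iri(p)} {iri(o)} {iri(graph)} .")
    return "\n".join(lines)


# Hypothetical URIs: both graphs share one URI pattern and vocabulary,
# and differ only in the license attached to the graph.
datasets = {
    "http://data.example.org/graph/in-house": (
        "pddl",
        [("http://data.example.org/work/1", DCTERMS + "creator",
          "http://data.example.org/person/1")],
    ),
    "http://data.example.org/graph/oclc-derived": (
        "odc-by",
        [("http://data.example.org/work/2", DCTERMS + "creator",
          "http://data.example.org/person/2")],
    ),
}

print(to_nquads(datasets))
```

Keeping the license as a statement about the graph, rather than repeating it on every record, means a whole set can be relicensed later by changing one triple.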

A final solution?
This area is still in flux. We feel that although licenses may vary, there should be no barrier to publishing data for others to reuse. We hope that over time the library community will converge on a set of established practices and community norms for data publishing. This work represents one of the first steps taken in this area.

Public domain licensing is the obvious ideal, and the one we prefer, but adopting a pragmatic approach now can get more useful data out in the wild quickly. Because stepping back from PDDL or CC0 once data has been released is next to impossible, adopting a slightly less open license as an initial position, one that can be rethought downstream, may be more palatable. Just steer clear of non-commercial licenses for data!

Marc21 - another reason for deviation
Whilst there is strong interest in and backing for Open Bibliographic Data within the international HE library community, concerns have been raised about its impact on organizations that rely on commercial Marc21 record supply to maintain and develop their services.
We recognize that partner institutions have valid commercial interests here, and we benefit from such services ourselves. As such, we are only releasing as Marc21 the records we can claim total ownership of; all other data is being released as RDF only. We believe our RDF output is sufficiently altered to make cross-walking it back to useful Marc21 next to impossible.

This approach may not suit everyone's taste, but it is pragmatic. To put this in perspective, how many open data consumers really care about Marc21? It's a format that really deserves to die, and it is irrelevant to the wider conversation.

Some of this post has been distilled down into a forthcoming F.A.Q for