Tuesday 19 July 2011

Cost benefits

The JISC has asked us to blog on the cost benefits of providing open data. I'll give a rough indication of costs based on time spent, an idea of what I think the benefits may be, and an indication of how the two weigh up.

Costs:

1) Marc21 data 'ownership' analysis - (5 days staff time at SP64)
Mapping and conversion of bibliographic information. An experimental and iterative process.

2) Marc21 to RDF data conversion - (x2 developers at SP53)
Again, this has been drawn out through experimental work. Several methods and iterations were tried. Those aiming to repeat this may not incur the same cost.

3) Web infrastructure development and record curation - (x2 developers at SP53)
A lightweight approach to development was taken using existing application frameworks. Time was also spent understanding underlying principles of RDF stores and associated best practice for linked data. Several iterative loads of data were undertaken in parallel with Marc to RDF conversion.

4) Hosting and sustainability costs - costs tbc
COMET's web infrastructure makes use of existing VM and MySQL infrastructure at CARET, so additional infrastructure costs were negligible and hard to determine. We've promised to keep the service running for a year.

5) Other stuff
Project management etc.

External benefits:
  • Substantial contribution to Open Bibliography - Open data is arguably a good thing, and whilst ours has flaws, it is hopefully good enough to be useful to others in its own right
  • Clarification on licensing agreements with record vendors - The COMET project has made much headway on this issue, with some clarification on licensing preferences for RDF data from three major UK record vendors: OCLC, RLUK and the British Library. Down the line, we hope that these organizations will formalize their agreements with us so that others can benefit, which should help more data to be published
  • Advice on how to analyse records to determine 'ownership', and lightweight (Perl, PHP, MySQL based) tools to create and publish RDF linked data from Marc21
  • Experiments with FAST and VIAF - Two potentially useful data sources

In house benefits:
  • Community interaction - There is strong interest in Open Bibliography and its benefits. The University Library has also benefited greatly from its interaction with the open and linked data communities, in its work with OCLC and with others through the JISC Discovery program
  • In house skills - We've gained vital in-house understanding of the design and publication of RDF. We've developed basic training materials around SPARQL for non-developers, which could pay off down the line

Summary:
External benefits clearly outweigh internal benefits, although as external benefits affect the whole library community, they also benefit us!

What's clear is that Open Data is not free data, at least not to us. We could have simply dumped our Marc21 or Dublin Core XML and been done with it, and for many that would have sufficed.

Instead, combining our wish to publish more Open Data with a need to learn about Linked Data (and thus lashing two fast bandwagons nicely together) has pushed the costs far higher.

However, by publishing linked data we've hopefully made our output more useful to a wider community than library metadata specialists, and in that sense added value.

More data being published means greater community feedback to draw upon, which should result in lower costs for those repeating this exercise.

It may indeed be several development cycles before we or others fully reap the benefits of this work. Alternatively, things could move in a different direction, with RDF-based linked data falling by the wayside in favour of more accessible mechanisms for record sharing. In that case, our work could still be useful in helping others avoid the same mistakes.
