Aims, objectives and target outputs (see project plan for specific deliverables)
1) Further contribution to Open Bibliography
- Cambridge University Library’s Voyager catalogue contains approximately 15 million bibliographic records including records derived from OCLC, RLUK and the British Library in addition to locally-created records. The initial aim of this project will be to identify and release a substantial record set to an external platform under an open license (Public Domain Dedication License) as MARC 21
- For OCLC-derived bibliographic records, data will be released in a fashion compliant with the WorldCat Rights and Responsibilities for the OCLC Cooperative. The inclusion of OCLC metadata brings the number of records to be released to over 2,200,000
- The project then aims to deploy and test a number of technologies and methodologies for releasing open bibliographic data, including XML, RDF, SPARQL, and JSON.
- It will investigate use of the Open Knowledge Foundation’s ORDF Python library, built upon RDFLib for working with RDF data, as a platform for sharing data
- This project will also examine ways to link the above output to OCLC’s data enrichment services in assigning FAST and VIAF headings. It also hopes to examine other linking mechanisms and opportunities as they arise throughout the project
- Additional metadata derived from Cambridge University Library published under a Public Domain Dedication License as MARC 21
- The above dataset accessible as RDF/XML and in other notations
- Data derived from OCLC released in a similar fashion under a license compliant with the rights and responsibilities of the cooperative
- A working RDF / triplestore with SPARQL endpoint containing the above data
- Full documentation on work described above
- A full investigation into examining and determining the provenance of MARC 21 data for the UK HE community
- An investigation of the licensing issues for data relevant to library operation
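To make the release formats listed above more concrete, the following is a minimal sketch of how a single bibliographic record might be serialised as JSON and as N-Triples RDF. It uses only the Python standard library rather than ORDF or RDFLib. The base URI, the simplified record layout, the sample record, and the mapping from MARC tags to Dublin Core properties are all assumptions made for illustration; they are not the project's actual data model.

```python
import json

# Hypothetical base URI for released records; not a real project endpoint.
BASE_URI = "http://example.org/bib/"

# Dublin Core Terms properties.
DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"

# Assumed, illustrative mapping from MARC 21 tags to RDF properties.
MARC_TO_PROPERTY = {"245": DC_TITLE, "100": DC_CREATOR}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal.

    Other escapes (for example newlines) are not handled in this sketch.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


def record_to_ntriples(record):
    """Return one N-Triples line per mapped MARC field, ordered by tag."""
    subject = "<{}{}>".format(BASE_URI, record["id"])
    lines = []
    for tag, value in sorted(record["fields"].items()):
        prop = MARC_TO_PROPERTY.get(tag)
        if prop is None:
            continue  # unmapped tags are skipped in this sketch
        lines.append('{} <{}> "{}" .'.format(subject, prop, escape_literal(value)))
    return lines


def record_to_json(record):
    """Serialise the simplified record as JSON with stable key order."""
    return json.dumps(record, sort_keys=True)


# Made-up sample record, not taken from the catalogue.
sample = {
    "id": "rec1",
    "fields": {"245": "An example title", "100": "Author, Example", "999": "local"},
}

if __name__ == "__main__":
    print(record_to_json(sample))
    for line in record_to_ntriples(sample):
        print(line)
```

In a fuller implementation the triples would be loaded into a triplestore and queried over SPARQL; this sketch stops at serialisation.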
Wider Benefits to Sector & Achievements for Host Institution
The project will bring value to the wider community by contributing substantially to the implementation of the Resource Discovery Task Force vision of open metadata through release of over 2.2 million bibliographic records under an open licence. The release of this volume of data will itself be of value to the community but the engagement of a major content provider in this process and in facilitating data linking brings added value to the project.
An investigation into linking to FAST and VIAF headings will provide an exemplar of the potential usefulness of a structured semantic approach to data. The project will look at the value data enrichment offers for resource discovery in the context of the RDTF vision.
Staff of both the University Library and CARET will develop further skills in dealing with structured data, testing alternative technologies and approaches: JSON and XML, RDF/SPARQL. Involvement of both library and VLE developers will test the potential of open metadata and linked data in two different communities holding distinct and useful sets of user data.
The project team have an excellent record of documenting and presenting their work in a way that encourages take-up by other institutions, and the same approach will be applied to this project.
Examples of this approach can be found at the library’s API developers' portal and at the CUL widgets development.
Risk Analysis and Success Plan
Risk | Probability (1-5) | Severity (1-5) | Score (P x S) | Action to Prevent/Manage Risk |
Staffing | ||||
Retention of project staff | 1 | 3 | 3 | Specialists with similar range of skills available within existing teams. |
Organisational | ||||
Failure to meet schedule in workplan | 1 | 4 | 4 | Experienced project director and board will be appointed |
Technical | ||||
Problems with metadata linking | 1 | 2 | 2 | Able to call upon OCLC for advice and assistance with linking to FAST and VIAF headings and other Cambridge expertise where required. |
Legal | ||||
Intellectual property rights | 2 | 3 | 6 | University Legal Services team will be consulted where required. The team will observe JISC and OCLC advice on IPR issues in relation to metadata release. |
Licenses cannot be obtained to permit intended use | 1 | 3 | 3 | Project takes into account scope of existing licenses and previous work with IPR holders. |
Breach of permitted uses | 1 | 4 | 4 | Take-down policy |
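The Score column in the table above is the product of probability and severity. As a small sketch, the scores can be recomputed and the risks ranked as follows; the function names are illustrative only, and the figures are copied from the table.

```python
# Risk register rows copied from the table above: (risk, probability, severity).
RISKS = [
    ("Retention of project staff", 1, 3),
    ("Failure to meet schedule in workplan", 1, 4),
    ("Problems with metadata linking", 1, 2),
    ("Intellectual property rights", 2, 3),
    ("Licenses cannot be obtained to permit intended use", 1, 3),
    ("Breach of permitted uses", 1, 4),
]


def score(probability, severity):
    """Risk score as defined in the table header: P x S."""
    return probability * severity


def ranked(risks):
    """Return (risk, score) pairs sorted from highest to lowest score."""
    scored = [(name, score(p, s)) for name, p, s in risks]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, value in ranked(RISKS):
        print("{:>2}  {}".format(value, name))
```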
IPR issues
- Data deemed to be wholly owned by Cambridge University Library will be released under an Open Data Commons Public Domain Dedication License
- Data and enrichment services sourced from OCLC will be provided in a fashion compliant with the WorldCat Rights and Responsibilities for the OCLC Cooperative
- All documentation will be made available under a Creative Commons license
- All code outputs will be made available under the GPL-compatible Apache License, version 2.0
Project Team Relationships and End User Engagement
The project will not recruit new staff but will make use of existing staff in the Library and CARET.
Ed Chamberlain, Systems Development Librarian, will contribute 0.5 FTE for technical work and will undertake management of the project. He brings extensive experience of project management on cross-departmental projects, particularly with CARET and other libraries of the university, and was responsible for releasing and documenting existing APIs to library services.
Dan Sheppard, Senior Research Associate at CARET, will also contribute 0.5 FTE as software developer. The project will also call upon the expertise of two further members of library staff:
Hugh Taylor, Head of Collection Development and Description, for bibliographic data and related data ownership and licensing issues, and Huw Jones, Digital Library Metadata Specialist, who will contribute particularly towards identifying and evaluating service innovations based on the open metadata and linking to library location and membership data.
A project board, comprising representatives of Cambridge University Library and the University of Cambridge Centre for Applied Research in Educational Technologies, will take responsibility for overseeing the project:
Patricia Killiard, Head of Electronic Services and Systems, Cambridge University Library
Hugh Taylor, Head of Collection Development and Description, Cambridge University Library
John Norman, Director, Centre for Applied Research in Educational Technologies, University of Cambridge
The team will engage end users primarily through this blog, as an initial entry point to a deeper set of online documentation hosted at lib.cam.ac.uk/api. This will follow on from the documentation created by the JISC-funded Cambridge widgets project.
Projected Timeline, Workplan, Deliverables & Overall Project Methodology
Time-frame | ||
WP1 Project management and communication | ||
Writing a detailed project plan. Creation of a project blog to be updated regularly. Documenting technical approaches, methodologies, solutions, and problems. | o Project plan, o Project blog with notes of meetings, team discussions, and final project report. o Documentation on technologies and methodologies. | February-July 2011 |
WP2 Data release | ||
o Export of bibliographic data from Cambridge University Library Voyager catalogue to an external data store with appropriate API and semantic interoperability. o Investigate use of the Open Knowledge Foundation’s ORDF Python library built upon RDFLib for working with RDF Data. | o Set of around 2 million bibliographic records openly available as structured data XML/RDF, JSON/SPARQL o Skills development of team in above technologies o Skills development of team around use of the ORDF Python library | February-April 2011 |
WP3 Linking to OCLC FAST and VIAF headings | ||
o Assign FAST and VIAF headings to open metadata using linked approach. o Use of OCLC’s Linked Data Framework | o Enriched metadata set with FAST and VIAF o Experience in linking to OCLC/external data and use of the framework | April-June 2011 |
WP4 Linking to library location and membership data | ||
| | April-June 2011 |
WP5 Intellectual property rights | ||
| | February-June 2011 |
WP6 Evaluation and Sustainability | ||
| | April-July 2011 |
Budget
Directly Incurred Staff | August 10– July 11 | August 11– July 12 | TOTAL £ | ||
Total Directly Incurred Staff (A) | £32,042 | £0 | £32,042 | ||
Non-Staff | August 10– July 11 | August 11– July 12 | TOTAL £ | ||
Travel and expenses | £800 | £0 | £800 | ||
Hardware/software | £500 | £0 | £500 | ||
Dissemination | £500 | £0 | £500 | ||
Other – Contingency | £1,000 | £0 | £1,000 | ||
Total Directly Incurred Non-Staff (B) | £2,800 | £0 | £2,800 | ||
Directly Incurred Total (C) (A+B=C) | £34,842 | £0 | £34,842 | ||
Directly Allocated | August 10– July 11 | August 11– July 12 | TOTAL £ | ||
Academic Grade 11 - UL, sp64, 5 days | £1,684 | £0 | £1,684 | ||
Academic Grade 12 - CARET, sp69, 5 days | £1,960 | £0 | £1,960 | ||
Academic Grade 7 – UL, sp46, 5 days | £975 | £0 | £975 | ||
Estates | £5,655 | £0 | £5,655 | ||
Directly Allocated Total (D) | £10,274 | £0 | £10,274 | ||
Indirect Costs (E) | £38,628 | £0 | £38,628 | ||
Total Project Cost (C+D+E) | £83,744 | £0 | £83,744 | ||
Amount Requested from JISC | £40,000 | £0 | £40,000 | ||
Institutional Contributions | £43,744 | £0 | £43,744 | ||
Percentage Contributions over the life of the project | JISC 48% | Partners 52% | Total 100% | ||
No. FTEs used to calculate indirect and estates charges, and staff included | 1 FTE | All Directly incurred Staff | |||
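The totals in the budget table can be cross-checked with a few lines of arithmetic. The sketch below recomputes the subtotals, the total project cost, and the JISC share from the line items as stated in the table; the variable names are illustrative only.

```python
# All figures in pounds, copied from the budget table above.
staff = 32042  # Total Directly Incurred Staff (A)

non_staff = {
    "Travel and expenses": 800,
    "Hardware/software": 500,
    "Dissemination": 500,
    "Other - Contingency": 1000,
}

directly_allocated = {
    "Academic Grade 11 - UL": 1684,
    "Academic Grade 12 - CARET": 1960,
    "Academic Grade 7 - UL": 975,
    "Estates": 5655,
}

indirect = 38628       # Indirect Costs (E)
jisc_request = 40000   # Amount Requested from JISC

non_staff_total = sum(non_staff.values())             # (B)
directly_incurred = staff + non_staff_total           # (C) = A + B
allocated_total = sum(directly_allocated.values())    # (D)
total_cost = directly_incurred + allocated_total + indirect  # C + D + E
institutional = total_cost - jisc_request
jisc_share = round(100 * jisc_request / total_cost)   # percentage, nearest whole number

if __name__ == "__main__":
    print("Directly incurred total (C):", directly_incurred)
    print("Directly allocated total (D):", allocated_total)
    print("Total project cost:", total_cost)
    print("Institutional contribution:", institutional)
    print("JISC share (%):", jisc_share)
```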