The following guide is by Hugh Taylor, Head of Collection Description and Development at Cambridge University Library. The guide is still a work in progress and is being made available for comments and suggestions.
There are two key considerations underpinning the often vexed question of ownership of bibliographic data:
- Intellectual property
- Contracts and licences
The extent to which catalogue records can be regarded as covered, or not covered, by intellectual property law need not detain us directly (and will likely vary according to jurisdiction). But it cannot be ignored entirely, as will be seen in a moment.
Many of the records imported into the Cambridge ILS (the method of import is irrelevant) are supplied under the terms of a contract or licence taken out with the vendors concerned. Those contracts or licences typically make explicit what may or may not be done with the records obtained.
Giving (or selling) data to a third party may breach the terms of one or more of these contracts. So, even if it’s not possible to determine whether the data is subject to intellectual property law, it’s clear that its use is governed by specific terms in a number of contracts and licences.
What is less clear-cut, though, is how this situation is affected by work subsequently carried out locally on those records. How much local amendment can take place before it’s legitimate to regard this as a locally-prepared record, especially if the local amendments are in areas requiring significant intellectual effort (e.g. provision of subject headings or classification numbers)? This may be one place where contract and intellectual property law risk coming into conflict.
Generally speaking, it’s the supplier of a record who is regarded as the “owner”, but the more pairs of hands (or, more pertinently, systems) through which the data has passed, the trickier it can be even to determine who the original supplier of the data actually is. Consider, for example, a Library of Congress record derived from a British National Bibliography record, downloaded and modified by the University of Nottingham, contributed by Nottingham to the RLUK database and then imported into Cambridge’s Voyager system. In contractual terms this is an “RLUK record”, because it was contributed to the RLUK database by one of the consortium’s member libraries.
Reading the ownership of MARC 21 bibliographic records
Because of the difficulty of determining ownership precisely, and the need to avoid – having taken reasonable precautions – falling foul of the terms of the various contracts and licences taken out by the Library over the years, it’s important to be able to “read” Cambridge’s MARC 21 bibliographic records. This is key to a correct analysis of the data (or, if not correct, then “honest” – sometimes there is no single “correct” analysis).
One important thing to note is that much of the traffic of data between systems results in new identifiers being added to records. So a single record could have a number of different identifiers, each indicative of some system’s (or contributor’s) “involvement” in the life-cycle of that record. It is often impossible to work out what happened when, or in what order. The record becomes, in effect, an aggregation. In Cambridge we are as guilty (or should that be honest?!) as anyone, in that our record merge profile retains the main identifier fields from both records when two records are merged.
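To make the "aggregation" effect concrete, here is a minimal sketch of a merge that keeps identifier fields from both records. It assumes, purely for illustration, that the "main identifier fields" are 010, 015, 016 and 035 and that records are simple tag-to-values dictionaries; the function name and all identifiers are made up, and this is not Cambridge's actual Voyager merge profile.

```python
from typing import Dict, List

# Illustrative only: which tags count as "main identifier fields" and how
# duplicates are handled are assumptions, not Cambridge's merge profile.
IDENTIFIER_TAGS = ("010", "015", "016", "035")

Record = Dict[str, List[str]]  # MARC tag -> field values, in record order


def merge_identifiers(kept: Record, discarded: Record) -> Record:
    """Copy `kept`, then append any identifier values found only in `discarded`."""
    merged = {tag: list(values) for tag, values in kept.items()}
    for tag in IDENTIFIER_TAGS:
        for value in discarded.get(tag, []):
            bucket = merged.setdefault(tag, [])
            if value not in bucket:
                bucket.append(value)
    return merged


# Two made-up records describing the same book, from different sources.
rluk = {"001": ["123"], "035": ["(UkLCURL)a111"]}
oclc = {"001": ["999"], "015": ["GB9912345"], "035": ["(OCoLC)222"]}
print(merge_identifiers(rluk, oclc)["035"])  # ['(UkLCURL)a111', '(OCoLC)222']
```

After a couple of such merges nothing in the surviving record says which identifier arrived first, which is exactly why the order of events is so often impossible to reconstruct.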
In the case of some identifiers, it’s not even clear what resource is being identified – is a Library of Congress Control Number (MARC 21 field 010) an identifier for that MARC record? Or is it an identifier for the resource described by that record? Given that the number has for many years been printed in each and every copy of a resource described by the record, it’s reasonable to take the line that it’s a manifestation identifier. In that case it tells us nothing concrete about the record itself. On the other hand, it could be interpreted as “data about data”, linked to the creation of that record.
These are the key fields which might help us determine record ownership (and, therefore, contractual and licence obligations) – but note that this is itself a subjective interpretation:
010 Library of Congress Control Number
Because of its widespread use as a manifestation identifier, independent of where record creation takes place, it is probably unwise to rely on the presence of an 010 as indicative of anything relevant to this discussion.
015 National Bibliography Number
In theory the same argument applies to this field as to the 010; but in practice, it is rare to find records that incorporate 015 fields where the record itself did not originate from the national agency concerned (even if that record has undergone substantial modification subsequently). In theory, another cataloguing agency could copy the number from the relevant national bibliography into a locally-prepared record, but in my experience this would be highly unusual.
016 National Bibliographic Agency Control Number
This field is even more likely than 015 to indicate that the record was prepared by the national agency concerned. Note, though, that this field is found relatively infrequently (occurring in ca 1.5% of UL/Dependent Libraries records), and isn’t readily available for use in database queries (SQL etc) in the Cambridge Voyager system, which also means it can’t feature in any duplicate detection profile.
035 System Control Number
According to the MARC 21 documentation, this field records the “control number of a system other than the one whose control number is contained in field 001, field 010 or field 016”. In practice, Voyager always moves incoming 001 fields to an 035 (paired with the code from the 003, if that was present in the source record). And no attempt is made to ensure there isn’t duplication between an 035 and 010/016, notwithstanding a strict reading of the field definition. As mentioned earlier, it is not uncommon to find a multiplicity of 035 fields as records pass from one system to another (half the records in the UL/Dependent Libraries database contain two or more 035 fields).
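The 001/003-to-035 move can be sketched as follows, using the standard MARC 21 convention of a parenthesised MARC organization code followed by the control number (e.g. "(OCoLC)…"). The function names are hypothetical and this is an illustration of the convention, not Voyager's own code; the reverse split is what later analysis of 035 $a relies on.

```python
import re
from typing import Optional, Tuple

# An 035 $a such as "(OCoLC)12345": MARC organization code in parentheses,
# followed by that organization's own control number.
_PREFIXED = re.compile(r"^\((?P<org>[^)]*)\)(?P<number>.*)$")


def to_035a(field_001: str, field_003: Optional[str] = None) -> str:
    """Build an 035 $a from an incoming 001, prefixed by the 003 code if present."""
    number = field_001.strip()
    code = (field_003 or "").strip()
    return f"({code}){number}" if code else number


def split_035a(value: str) -> Tuple[Optional[str], str]:
    """Split an 035 $a into (organization code or None, control number)."""
    match = _PREFIXED.match(value.strip())
    if match:
        return match.group("org"), match.group("number")
    return None, value.strip()


print(to_035a("ocm12345678", "OCoLC"))   # (OCoLC)ocm12345678
print(split_035a("(UkLCURL)a1234567"))   # ('UkLCURL', 'a1234567')
```

An 035 without a parenthesised prefix (as happens when the source record had no 003) yields no organization code at all, which is one reason some of the 035 values analysed below cannot be attributed.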
038 Record Content Licensor
This field contains the “MARC code of the organization that licenses the intellectual property rights to the data contained in the record, such as with contractual arrangements”. It was added to the MARC 21 format only in 2002, and of the major suppliers of data with whom Cambridge has a relationship, only RLUK has adopted it, which limits its value in the context of the COMET project.
040 Cataloging Source
Since this field contains, in a variety of subfields, codes covering the original cataloguing agency, the transcribing agency, and subsequent modifying agencies, it would seem to be an ideal source of information relating to “ownership” of data. In practice, though, none of these agencies may be in a position to claim such ownership, and even those that can may have assigned their rights to others. Use of a record created in OCLC by Library A and downloaded from WorldCat by Library B is constrained by the latter’s relationship with OCLC, not its relationship (if any) with Library A.
The frequency with which such fields are encountered locally depends on a number of factors; one to bear particularly in mind is that limits on data retention applied during the early years of development of Cambridge’s local library system.
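Reading an 040 amounts to listing the agencies in its $a (original cataloging agency), $c (transcribing agency) and $d (modifying agency) subfields. The minimal sketch below assumes a field is held as a list of (subfield code, value) pairs; the function name and the sample 040 are illustrative.

```python
from typing import List, Tuple

Subfields = List[Tuple[str, str]]  # (subfield code, value), in field order

# MARC 21 roles for the 040 subfields that name agencies.
AGENCY_ROLES = {
    "a": "original cataloging agency",
    "c": "transcribing agency",
    "d": "modifying agency",
}


def agencies_from_040(field: Subfields) -> List[Tuple[str, str]]:
    """Return (role, MARC organization code) pairs, in the order recorded."""
    return [(AGENCY_ROLES[code], value.strip())
            for code, value in field if code in AGENCY_ROLES]


# Illustrative 040: created by LC, later modified in OCLC.
f040 = [("a", "DLC"), ("b", "eng"), ("c", "DLC"), ("d", "OCoLC")]
for role, agency in agencies_from_040(f040):
    print(f"{agency}: {role}")
```

As the paragraph above stresses, this output records who has handled the record, not who holds rights in it.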
Sources of data
The following is a list of (most of) the agencies that have acted as the immediate source of records found in the UL/Dependent Libraries database (there may be some additional ones used by individual Cambridge Union Catalogue members over the years).
- RLUK (formerly CURL) database – Cambridge was a founder member
- British National Bibliography and other records supplied by the British Library (including its BLAISE service) – the very first source of imported data used locally
- OCLC WorldCat
- Research Libraries Group – merged with OCLC in 2006 and its database absorbed into WorldCat
- National Institute of Informatics (NACSIS-CAT)
- Records from Chinese source(s)
- ESTC – bulkload acquired through CURL
Suppliers of MARC records specifically for digital objects:
- Serials Solutions (journals)
- Cambridge University Press
- NetLibrary
- [other ebook suppliers – this needs checking with S Stamford]
Businesses for whom supplying MARC records can be regarded as an adjunct to their primary activity as a “book vendor”:
- Aux Amateurs de Livres
- Books from Mexico
- Casalini Libri
- Coutts
- Garcia Cambeiro
- Otto Harrassowitz
- Retta
- Sulaiman
- Touzot
- Vientos Tropicales
Current state of licensing negotiations with major suppliers, July 29th 2011
Supplier (print resources) | Summary |
Agency for the Legal Deposit Libraries | [Discussions taking place] |
Al-Muthanna Libr, Baghdad | OK to redistribute so long as we meet requirements |
Aux Amateurs de Livres | No restrictions |
Backstage Library Works (formerly Retro-link Associates; and then MARC Link) | No restrictions |
Books from Mexico | OK to redistribute so long as we meet requirements |
British Library | OK to redistribute so long as we meet requirements |
Casalini Libri | No restrictions |
Coutts | OK to redistribute so long as we meet requirements |
Coutts Nijhoff | Treat as for Coutts main account |
DK Agencies | No restrictions |
ESTC - UC Riverside | [Discussions taking place with RLUK] |
Garcia Cambeiro | No restrictions |
ISTC - British Library | OK to redistribute so long as we meet requirements |
Leila Books | No restrictions |
Luis A Retta | OK to redistribute so long as we meet requirements |
OCLC | OK to redistribute so long as we meet requirements |
Otto Harrassowitz | No restrictions |
Research Libraries Information Network (now incorp with OCLC) | OK to redistribute so long as we meet requirements |
RLUK | OK to redistribute so long as we meet requirements |
Serials Solutions (ProQuest) | Restrictions apply |
Sulaiman's Bookshop | OK to redistribute so long as we meet requirements |
Touzot | Restrictions apply |
| |
Supplier (online resources) | Summary |
American Council of Learned Societies | OK to redistribute so long as we meet requirements |
Cambridge Books Online | No restrictions |
Coutts (MyiLibrary) | [Not yet investigated] |
MEMSO | No restrictions |
NetLibrary | OK to redistribute so long as we meet requirements |
Oxford Univ Press | No restrictions |
Royal Society of Chemistry | [Discussions taking place via RLUK] |
Taylor & Francis | [Not yet investigated] |
Methodology
015/a – use this subfield as the basis for determining that a record originated in the BNB. Under the terms of CUL’s existing (but by now rather ancient) OCLC agreement, BNB records obtained from OCLC are covered by our BL user licence, and OCLC claims no rights in them. (This issue isn’t covered in the new WorldCat Rights and Responsibilities document.)
035/a – use this subfield as the general basis for determining the source(s) of a record (not necessarily the immediate source, though).
038/a – use this subfield to determine the record content licensor of a record.
As a rule of thumb, we can assume that the 015/a trumps what’s found in an 035 field – so it shouldn’t matter what the immediate source of a BNB record is, as the rights for that record remain with the BL. Similarly, the 038/a trumps any 035, since it’s an explicit statement concerning the identity of the licensor.
One complication here is that there is no access to the 038 in the Voyager tables, so identifying records containing that field is a rather long-winded process. For any initial data analysis we might want to ignore it and rely instead on the other identifiers (simply because it saves time and avoids the need for any scripting).
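One way to script around the missing 038 is to scan exported records directly. The sketch below assumes the records can be exported from Voyager as raw MARC 21 (ISO 2709) files in UTF-8 (how that export is done is not covered here, and MARC-8 records would need conversion first). It reads the leader's base address and the 12-byte directory entries using only the standard library; in practice a library such as pymarc would normally do this parsing. All function names are hypothetical, and `build_record` exists only to try the sketch out.

```python
from typing import Iterator, List, Tuple

FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
RECORD_TERMINATOR = b"\x1d"


def iter_records(blob: bytes) -> Iterator[bytes]:
    """Split a file of ISO 2709 records using each leader's record length."""
    pos = 0
    while pos + 24 <= len(blob):
        length = int(blob[pos:pos + 5])       # leader/00-04: record length
        yield blob[pos:pos + length]
        pos += length


def iter_fields(record: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (tag, field data) from the directory of one record."""
    base = int(record[12:17])                 # leader/12-16: base address of data
    directory = record[24:base - 1]           # 12-byte entries, then a terminator
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        length, start = int(entry[3:7]), int(entry[7:12])
        yield entry[:3].decode("ascii"), record[base + start:base + start + length]


def subfield_values(data: bytes, code: str) -> List[str]:
    """Values of one subfield in a data field (the two indicators are skipped)."""
    body = data.rstrip(FIELD_TERMINATOR)[2:]
    return [chunk[1:].decode("utf-8", errors="replace")
            for chunk in body.split(SUBFIELD_DELIMITER)[1:]
            if chunk[:1] == code.encode("ascii")]


def licensors(record: bytes) -> List[str]:
    """All 038 $a values (record content licensor codes) in one record."""
    return [value for tag, data in iter_fields(record) if tag == "038"
            for value in subfield_values(data, "a")]


def build_record(fields: List[Tuple[str, bytes]]) -> bytes:
    """Assemble a minimal record from (tag, data) pairs, for trying the above out."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FIELD_TERMINATOR
        directory += tag.encode("ascii") + b"%04d%05d" % (len(data), len(body))
        body += data
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)
    leader = b"%05dnam a22%05d   4500" % (base + len(body) + 1, base)
    return leader + directory + body + RECORD_TERMINATOR


rec = build_record([("001", b"12345"),
                    ("038", b"  \x1faUkLCURL"),
                    ("245", b"10\x1faAn example title")])
print(licensors(rec))  # ['UkLCURL']
```

Run over a full export with `iter_records`, this would give the list of records carrying an 038 without touching the Voyager tables at all.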
Without getting bogged down in the niceties of legal language, we should be able to assert the following:
- All records with a BNB number in the 015/a belong to the BL
- All records with an 038 belong to the licensor in subfield a (but we might skip this stage initially – see above)
- Of the remainder:
  - Each 035/a identifies an entity with some interest in the record – we should concern ourselves only with those identified above in the section on “Sources of data” (ignoring, therefore, entities such as the RLUK member who contributed a record to the RLUK database, from which it was then downloaded)
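The order of precedence in these assertions can be expressed as a small function. This is a sketch under stated assumptions: `KNOWN_SOURCES` is a hypothetical lookup holding just two example codes taken from this guide (a real version would cover the whole "Sources of data" list), and the BNB test is the same crude "GB" prefix test used in the queries in this guide.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical lookup from MARC organization codes (upper-cased) to the
# sources listed under "Sources of data"; only two example entries are shown.
KNOWN_SOURCES: Dict[str, str] = {
    "UKLCURL": "RLUK",
    "OCOLC": "OCLC",
}


def org_code(value: str) -> Optional[str]:
    """Parenthesised prefix of an 035 $a, e.g. "(OCoLC)123" -> "OCoLC"."""
    value = value.strip()
    if value.startswith("(") and ")" in value:
        return value[1:value.index(")")]
    return None


def _unique(items: Iterable[str]) -> List[str]:
    """Drop repeats while keeping first-seen order."""
    seen: List[str] = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def rights_holders(f015a: List[str], f038a: List[str], f035a: List[str]) -> List[str]:
    """Apply the rules of thumb in order: BNB number, then 038, then 035."""
    # 1. A BNB number trumps everything else (crude "GB" prefix test).
    if any(v.strip().upper().startswith("GB") for v in f015a):
        return ["British Library"]
    # 2. An explicit licensor statement trumps any 035.
    if f038a:
        return _unique(KNOWN_SOURCES.get(c.strip().upper(), c.strip()) for c in f038a)
    # 3. Otherwise, every 035 prefix matching a known source has an interest;
    #    unknown prefixes (e.g. contributing RLUK members) are ignored.
    codes = (org_code(v) for v in f035a)
    return _unique(KNOWN_SOURCES[c.upper()] for c in codes
                   if c and c.upper() in KNOWN_SOURCES)


print(rights_holders([], [], ["(OCoLC)1", "(UkLCURL)x"]))  # ['OCLC', 'RLUK']
```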
Various approaches could be adopted – what follows is just one (very simple) route, designed to help in the generation of ballpark statistics.
Initially, for the BNB records:
SELECT BIB_INDEX.BIB_ID
FROM BIB_INDEX INNER JOIN BIB_MASTER ON BIB_INDEX.BIB_ID = BIB_MASTER.BIB_ID
WHERE (((BIB_INDEX.INDEX_CODE)="015A") AND ((BIB_INDEX.NORMAL_HEADING) Like "GB*") AND ((BIB_MASTER.SUPPRESS_IN_OPAC)="N"))
GROUP BY BIB_INDEX.BIB_ID;
(NB: this is a little crude, in that a handful – and no more – of non-BNB records also have an 015/a beginning with “GB”)
And then for the 035/a (this example identifies only the RLUK identifier – it will need extending to cover the other data sources):
SELECT BIB_INDEX.BIB_ID
FROM BIB_INDEX INNER JOIN BIB_MASTER ON BIB_INDEX.BIB_ID = BIB_MASTER.BIB_ID
WHERE (((BIB_INDEX.INDEX_CODE)="0350") AND ((BIB_INDEX.NORMAL_HEADING) Like "UKLCURL*") AND ((BIB_MASTER.SUPPRESS_IN_OPAC)="N"))
GROUP BY BIB_INDEX.BIB_ID;
Those in the latter set that are also in the former can be disregarded (this could be JOINed into the query, of course, or both made into MAKE TABLE queries and the filtering done as a follow-on stage).
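The queries in this guide use Access syntax (double-quoted strings, the `*` wildcard, MAKE TABLE queries). As a sketch of the "JOINed into the query" option, the version below folds both steps into one statement with `NOT EXISTS`, written in standard SQL (`%` wildcard, single quotes) and run against SQLite through Python's `sqlite3`. The tables are in-memory stand-ins modelling only the columns used above, and all rows and normalised-heading values are made up; the pattern has not been tested against Voyager itself.

```python
import sqlite3

# Made-up stand-ins for the two Voyager tables used in the queries above;
# only the columns those queries touch are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BIB_MASTER (BIB_ID INTEGER PRIMARY KEY, SUPPRESS_IN_OPAC TEXT);
    CREATE TABLE BIB_INDEX (BIB_ID INTEGER, INDEX_CODE TEXT, NORMAL_HEADING TEXT);
    INSERT INTO BIB_MASTER VALUES (1, 'N'), (2, 'N'), (3, 'Y'), (4, 'N');
    INSERT INTO BIB_INDEX VALUES
        (1, '0350', 'UKLCURL A111'),  -- RLUK identifier only
        (2, '0350', 'UKLCURL A222'),  -- RLUK identifier ...
        (2, '015A', 'GB9912345'),     -- ... but also a BNB number
        (3, '0350', 'UKLCURL A333'),  -- suppressed from the OPAC
        (4, '0350', 'OCOLC 444');     -- some other source
""")

# Both steps in one query: unsuppressed records with an RLUK 035,
# excluding any record that also has a BNB number in its 015.
RLUK_NOT_BNB = """
    SELECT DISTINCT i.BIB_ID
    FROM BIB_INDEX AS i
    JOIN BIB_MASTER AS m ON m.BIB_ID = i.BIB_ID
    WHERE i.INDEX_CODE = '0350'
      AND i.NORMAL_HEADING LIKE 'UKLCURL%'
      AND m.SUPPRESS_IN_OPAC = 'N'
      AND NOT EXISTS (SELECT 1 FROM BIB_INDEX AS b
                      WHERE b.BIB_ID = i.BIB_ID
                        AND b.INDEX_CODE = '015A'
                        AND b.NORMAL_HEADING LIKE 'GB%')
    ORDER BY i.BIB_ID
"""

ids = [row[0] for row in conn.execute(RLUK_NOT_BNB)]
print(ids)  # [1]
```

`NOT EXISTS` avoids the intermediate MAKE TABLE step; an outer join with an `IS NULL` test would be an equivalent alternative.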
At a later stage this could doubtless be scripted, not only to include consideration of the 038, but also to reduce the number of stages involved and to end up with more exact results.
Update - a proposed workflow for a script based on this methodology is contained below. A sample Perl script is available at data.lib.cam.ac.uk
Statistics
These figures are based on the database as it was early on 15 Feb 2011 and are intended only to give an idea of the numbers of records which might fall into various categories. They cover only unsuppressed (publicly-visible) data.
Total unsuppressed bib records: 4179700
Instances of 010/a: 1100917
Instances of 015/a: 918754
Instances of 035/a: 3154928
Instances of 038/a: 64391
Analysis of 038/a:
CStRLIN 1224
DLC 13
OCoLC 12711
Uk 26215
UK-BiTAL 148
UK-BRIII 34
UK-LoTGL 26
UkLCURL 24019
Analysis of 035/a
Agency | Relationship | Count |
Agency for the Legal Deposit Libraries | | 1024 |
Al-Muthanna Libr, Baghdad | contract | 193 |
American Academy in Rome | 3rd party | 179 |
American Philosophical Soc | 3rd party | 457 |
Aux Amateurs de Livres | contract | 3404 |
Backstage Library Works | contract | 30630 |
Bibl municipale de Lyon | 3rd party | 127 |
Bibl nationale (France) | 3rd party | 106 |
Books from Mexico | contract | 1624 |
Brigham Young Univ | 3rd party | 1557 |
British Library | contract or 3rd party or Z39.50 | 109128 |
Brown Univ | 3rd party | 714 |
C (generally) - no idea what this is | | 606 |
Canadian Centre for Architecture | 3rd party | 249 |
Casalini Libri | contract | 7010 |
CAT1/2 are not organization identifiers | | 7447 |
Columbia Univ | 3rd party | 6642 |
Cornell Univ | 3rd party | 5927 |
Coutts | contract | 2209 |
Coutts Nijhoff | contract | 507 |
Dartmouth College | 3rd party | 122 |
Deutsche Nationalbibliothek | 3rd party | 284 |
DK Agencies | contract | 2434 |
Duke Univ | 3rd party | 2399 |
Durham Univ Libr | 3rd party | 1376 |
Emory Univ | 3rd party | 1732 |
Erasmus | 3rd party | 1701 |
ESTC - British Library | Z39.50 or 3rd party | 1074 |
ESTC - UC Riverside | mostly RLUK - investigating contract | 48786 |
European Register of Microform and Digital Masters | 3rd party | 778 |
Florida State Univ | 3rd party | 375 |
Garcia Cambeiro (also AgBaFGC) | contract | 2597 |
Garcia Cambeiro (obsolete version of AR-BaFGC) | contract | 776 |
Georgetown Univ | 3rd party | 236 |
J Paul Getty Museum | 3rd party | 653 |
Harvard Univ, with various suffixes for individual parts (e.g. Law) - or these could be record id prefixes | 3rd party | 26985 |
Huntington Library | 3rd party | 1323 |
Iberbook - obsolete code | 3rd party | 146 |
Imperial College | 3rd party | 118 |
Imperial College (invalid code) | 3rd party | 202 |
International Institute of Social History, Amsterdam (corrupted form of code) | 3rd party | 287 |
International Institute of Social History, Amsterdam (obsolete code) | 3rd party | 647 |
invalid code - possibly Nat Libr of Australia | Z39.50 or 3rd party | 411 |
invalid code - clearly Coutts | contract | 149 |
invalid code - supplied on ACLS e-book records (Digital Libraries Initiative?) | e-books only | 2651 |
invalid code - presumably for Harvard's Hollis system | 3rd party | 232 |
invalid code - possibly associated with Univ of Michigan | 3rd party | 197 |
invalid code - possibly associated with Brigham Young Univ | 3rd party | 190 |
invalid code - possibly used for New York Univ | 3rd party | 409 |
invalid code - clearly intended for OCLC | contract | 852 |
invalid code - clearly intended for Research Libraries Group (now absorbed into OCLC) | contract | 348279 |
invalid code - Russian, but beyond that?? | 3rd party | 292 |
invalid code - Russian, but beyond that?? | 3rd party | 653 |
invalid code - Bristol, but beyond that?? | 3rd party | 398 |
invalid code - perhaps old code for Nat Libr of Scotland | 3rd party | 499 |
ISTC - British Library | contract | 774 |
Istituto centrale per il catalogo unico… | 3rd party | 181 |
Iturriaga | 3rd party | 110 |
JRULM | 3rd party | 583 |
KCL | 3rd party | 1202 |
Keio Univ Library | 3rd party | 112 |
Ksiegarnia Wysylkowa Lexicon, Warsaw | 3rd party | 102 |
Leila Books | contract | 342 |
Libros Andinos (Bolivia) | 3rd party | 199 |
Libros Andinos (Bolivia) - variant of BO-CbLA | 3rd party | 13 |
Library and Archives Canada | Z39.50 or 3rd party | 2242 |
Library of Congress | Z39.50 or 3rd party | 134450 |
LSE | 3rd party | 2442 |
Luis A Retta | contract | 941 |
MARC Link (now Backstage) | contract | 45567 |
Metropolitan Museum of Art | 3rd party | 353 |
Nat Art Libr | 3rd party | 473 |
Nat Libr of New Zealand | Z39.50 or 3rd party | 268 |
Nat Libr of Scotland | Z39.50 or 3rd party | 21107 |
Nat Libr of Wales | Z39.50 or 3rd party | 4223 |
National Library of Medicine | 3rd party | 3294 |
New York Botanical Garden | 3rd party | 967 |
New York Public Library | 3rd party | 220 |
New York Univ | 3rd party | 2731 |
Nielsen Bookdata | 3rd party - but if derived from ALDL then rights may need to be checked | 196 |
Nijhoff (obsolete code) | 3rd party | 205 |
OCLC | contract | 658646 |
Oionos | 3rd party | 112 |
Otto Harrassowitz | contract | 26306 |
Oxford Univ, Bodleian | Z39.50 or 3rd party | 95096 |
Oxford Univ Press | contract | 124 |
Pierpont Morgan Libr | 3rd party | 156 |
Presumably LAC's AMICUS database, but code not valid | Z39.50 or 3rd party | 1338 |
Presumably RLUK (formerly CURL) | contract | 140 |
Princeton Univ | 3rd party | 3874 |
Puvill | 3rd party | 603 |
Puvill (invalid version)? | 3rd party | 78 |
Puvill (obsolete code) | 3rd party | 1758 |
Research Libraries Information Network (now incorp with OCLC) | contract | 299300 |
Retro-link Associates (now Backstage, via MARC Link) | contract | 27288 |
RLUK - now obsolete | contract | 962148 |
Rutgers Univ | 3rd party | 462 |
Saint Andrews Univ, Library | 3rd party - but may also be supplied by MEMSO under contract | 410 |
Schweizerische Nationalbibliothek | 3rd party | 1962 |
Serials Solutions (ProQuest) | contract | 52627 |
Smithsonian | 3rd party | 569 |
SOAS | 3rd party | 700 |
Stanford Univ | 3rd party | 6901 |
Sulaiman's Bookshop | contract | 695 |
SUNY Binghamton | 3rd party | 492 |
Swarthmore College | 3rd party | 290 |
Syracuse Univ | 3rd party | 2276 |
Temple Univ | 3rd party | 233 |
Touzot | contract | 2997 |
Touzot | contract | 2180 |
Trinity College Dublin | 3rd party | 9157 |
UBS Publishers' Distributors, New Delhi | 3rd party | 183 |
UC Berkeley | Z39.50 or 3rd party | 832 |
UC Berkeley | Z39.50 or 3rd party | 624 |
UC Berkeley | Z39.50 or 3rd party | 566 |
UC Santa Barbara | Z39.50 or 3rd party | 409 |
UCL | 3rd party | 725 |
Univ of Birmingham | 3rd party | 611 |
Univ of Bristol | 3rd party | 251 |
Univ of Chicago | 3rd party | 7540 |
Univ of Edinburgh | 3rd party | 956 |
Univ of Florida | 3rd party | 771 |
Univ of Glasgow | 3rd party | 1509 |
Univ of Iowa | 3rd party | 506 |
Univ of Leeds | 3rd party | 857 |
Univ of Liverpool | 3rd party | 197 |
Univ of London | 3rd party | 1648 |
Univ of Michigan | 3rd party | 2841 |
Univ of Minnesota | 3rd party | 106 |
Univ of Minnesota | 3rd party | 6997 |
Univ of Newcastle upon Tyne | 3rd party | 172 |
Univ of Nottingham | 3rd party | 667 |
Univ of Pennsylvania | 3rd party | 1964 |
Univ of Rochester | 3rd party | 574 |
Univ of Sheffield | 3rd party | 534 |
Univ of Southampton | 3rd party | 305 |
Univ of Southern California | Z39.50 or 3rd party | 328 |
Univ of Tennessee | 3rd party | 99 |
Univ of Warwick | 3rd party | 553 |
UTLAS | 3rd party | 2715 |
Vientos Tropicales | contract | 157 |
W.H. Everett | 3rd party | 271 |
Washington Library Network (merged into OCLC) | contract | 1372 |
Wellcome | 3rd party | 178 |
Yale Univ, with various suffixes for individual parts (e.g. Law) | 3rd party | 11079 |
Hugh Taylor - 28th July 2011