"Ownership" of MARC-21 records

The following guide is by Hugh Taylor, Head of Collection Description and Development at Cambridge University Library. The guide is still a work in progress and is being made available for comments and suggestions.
There are two key considerations underpinning the often vexed question of ownership of bibliographic data:

  1. Intellectual property
  2. Contracts and licences

The extent to which catalogue records can be regarded as covered, or not covered, by intellectual property law need not detain us directly (and will likely vary according to jurisdiction). But it cannot be ignored entirely, as will be seen in a moment.

Many of the records imported into the Cambridge ILS (the method of import is irrelevant) are supplied under the terms of a contract or licence taken out with the vendors concerned. Those contracts or licences typically make explicit what may or may not be done with the records obtained.

Giving (or selling) data to a third party may breach the terms of one or more of these contracts. So, even if it’s not possible to determine whether the data is subject to intellectual property law, it’s clear that its use is governed by specific terms in a number of contracts and licences.

What is less clear-cut, though, is how this situation is affected by work subsequently carried out locally on those records. How much local amendment can take place before it’s legitimate to regard this as a locally-prepared record, especially if the local amendments are in areas requiring significant intellectual effort (e.g. provision of subject headings or classification numbers)? This may be one place where contract and intellectual property law risk coming into conflict.

Generally speaking, it’s the supplier of a record who is regarded as the “owner”, but the more pairs of hands (or, more pertinently, systems) through which the data has passed, the trickier it can be even to determine who the original supplier of the data actually is. Consider, for example, a Library of Congress record derived from a British National Bibliography record, downloaded and modified by the University of Nottingham, contributed by Nottingham to the RLUK database and then imported into Cambridge’s Voyager system: in contractual terms this is an “RLUK record” because it was contributed to the RLUK database by one of the consortium’s member libraries.

Reading the ownership of MARC 21 bibliographic records
Because of the difficulty of determining ownership precisely, and the need to avoid (having taken reasonable precautions) falling foul of the terms of the various contracts and licences taken out by the Library over the years, it’s important to be able to “read” Cambridge’s MARC 21 bibliographic records. This is key to a correct analysis of the data (or, if not correct, then “honest”, since sometimes there is no single “correct” analysis).

One important thing to note is that much of the traffic of data between systems results in new identifiers being added to records. So a single record could have a number of different identifiers, each indicative of some system’s (or contributor’s) “involvement” in the life-cycle of that record. It is often impossible to work out what happened when, or in what order. The record becomes, in effect, an aggregation. In Cambridge we are as guilty (or should that be honest?!) as anyone, in that our record merge profile retains the main identifier fields from both records when two records are merged.

In the case of some identifiers, it’s not even clear what resource is being identified – is a Library of Congress Control Number (MARC 21 field 010) an identifier for that MARC record? Or is it an identifier for the resource described by that record? Given that the number has for many years been printed in each and every copy of a resource described by the record, it’s reasonable to take the line that it’s a manifestation identifier. In that case it tells us nothing concrete about the record itself. On the other hand, it could be interpreted as “data about data”, linked to the creation of that record.

These are the key fields which might help us determine record ownership (and therefore contractual and licence obligations), but note that this is itself a subjective interpretation:

010      Library of Congress Control Number
Because of its widespread use as a manifestation identifier, independent of where record creation takes place, it is probably unwise to rely on the presence of an 010 as indicative of anything relevant to this discussion.

015      National Bibliography Number
In theory the same argument applies to this field as to the 010; but in practice, it is rare to find records that incorporate 015 fields where the record itself did not originate from the national agency concerned (even if that record has undergone substantial modification subsequently). In theory, another cataloguing agency could copy the number from the relevant national bibliography into a locally-prepared record, but in my experience this would be highly unusual.

016      National Bibliographic Agency Control Number
This field is even more likely than the 015 to indicate that the record was prepared by the national agency concerned. Note, though, that this field is found relatively infrequently (occurring in ca 1.5% of UL/Dependent Libraries records), and isn’t readily available for use in database queries (SQL etc) in the Cambridge Voyager system, which also means it can’t feature in any duplicate detection profile.

035      System Control Number
According to the MARC 21 documentation, this field records the “control number of a system other than the one whose control number is contained in field 001, field 010 or field 016”. In practice, Voyager always moves incoming 001 fields to an 035 (paired with the code from the 003, if that was present in the source record). No attempt is made to ensure there isn’t duplication between an 035 and 010/016, notwithstanding a strict reading of the field definition. As mentioned earlier, it is not uncommon to find a multiplicity of 035 fields as records pass from one system to another (half the records in the UL/Dependent Libraries database contain two or more 035 fields).
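To make the 001/003 pairing concrete, here is a minimal Python sketch. It assumes the usual MARC 21 layout for an 035 $a, in which the organization code appears in parentheses immediately before the number, for example "(OCoLC)814782". The function names are hypothetical, and the sketch does not claim to reproduce Voyager's actual import routine.

```python
import re
from typing import Optional, Tuple

# Conventional "(code)number" layout of an 035 $a value.
_SCN = re.compile(r"^\((?P<code>[^)]+)\)\s*(?P<number>.+)$")


def split_system_control_number(value: str) -> Tuple[Optional[str], str]:
    """Split an 035 $a value into (organization code, control number).

    Returns (None, value) when there is no parenthesised code prefix.
    """
    cleaned = value.strip()
    match = _SCN.match(cleaned)
    if match is None:
        return None, cleaned
    return match.group("code"), match.group("number").strip()


def control_number_to_035(field_001: str, field_003: Optional[str]) -> str:
    """Build an 035 $a from an incoming 001/003 pair (hypothetical helper).

    The code is included only when the source record carried an 003.
    """
    number = field_001.strip()
    if field_003 and field_003.strip():
        return f"({field_003.strip()}){number}"
    return number
```

Splitting the code from the number in this way is what makes it possible to group 035 fields by the system that issued them, which is the basis of the analysis later in this guide.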

038      Record Content Licensor
This field contains the “MARC code of the organization that licenses the intellectual property rights to the data contained in the record, such as with contractual arrangements”. It was added to the MARC 21 format only in 2002, and of the major suppliers of data with whom Cambridge has a relationship, only RLUK has adopted it, which limits its value in the context of the COMET project.

040      Cataloging Source
Since this field contains, in a variety of subfields, codes covering the original cataloguing agency, the transcribing agency, and subsequent modifying agencies, it would seem to be an ideal source of information relating to “ownership” of data. In practice, though, none of these agencies may be in a position to claim such ownership, and even if they do they may have assigned rights to others. Use of a record created in OCLC by Library A and downloaded from WorldCat by Library B is constrained by the latter’s relationship with OCLC, not its relationship (if any) with Library A.

The frequency with which such fields are encountered locally depends on a number of factors. One to keep particularly in mind is the set of limitations on data retention that applied during the early years of development of Cambridge’s local library system.

Sources of data
The following is a list of (most of) the agencies that have acted as the immediate source of records found in the UL/Dependent Libraries database (it is possible that there are some additional ones used by individual Cambridge Union Catalogue members over the years).

  • RLUK (formerly CURL) database – Cambridge was a founder member
  • British National Bibliography and other records supplied by the British Library (including its BLAISE service) – the very first source of imported data used locally
  • OCLC WorldCat
  • Research Libraries Group – merged with OCLC in 2006 and its database absorbed into WorldCat
  • National Institute of Informatics (NACSIS-CAT)
  • Records from Chinese source(s)
  • ESTC – bulkload acquired through CURL

Suppliers of MARC records specifically for digital objects:
  • Serials Solutions (journals)
  • Cambridge University Press
  • NetLibrary
  • [other ebook suppliers – this needs checking with S Stamford]

Businesses for whom supplying MARC records can be regarded as an adjunct to their primary activity as a “book vendor”:
  • Aux Amateurs de Livres
  • Books from Mexico
  • Casalini Libri
  • Coutts
  • Garcia Cambeiro
  • Otto Harrassowitz
  • Retta
  • Sulaiman
  • Touzot
  • Vientos Tropicales

Current state of licensing negotiations with major suppliers, July 29th 2011

Supplier (print resources)
  • Agency for the Legal Deposit Libraries: [Discussions taking place]
  • Al-Muthanna Libr, Baghdad: OK to redistribute so long as we meet requirements
  • Aux Amateurs de Livres: No restrictions
  • Backstage Library Works (formerly Retro-link Associates; and then MARC Link): No restrictions
  • Books from Mexico: OK to redistribute so long as we meet requirements
  • British Library: OK to redistribute so long as we meet requirements
  • Casalini Libri: No restrictions
  • [supplier name missing]: OK to redistribute so long as we meet requirements
  • Coutts Nijhoff: Treat as for Coutts main account
  • DK Agencies: No restrictions
  • ESTC - UC Riverside: [Discussions taking place with RLUK]
  • Garcia Cambeiro: No restrictions
  • ISTC - British Library: OK to redistribute so long as we meet requirements
  • Leila Books: No restrictions
  • Luis A Retta: OK to redistribute so long as we meet requirements
  • [supplier name missing]: OK to redistribute so long as we meet requirements
  • Otto Harrassowitz: No restrictions
  • Research Libraries Information Network (now incorp with OCLC): OK to redistribute so long as we meet requirements
  • [supplier name missing]: OK to redistribute so long as we meet requirements
  • Serials Solutions (ProQuest): Restrictions apply
  • Sulaiman's Bookshop: OK to redistribute so long as we meet requirements
  • [supplier name missing]: Restrictions apply

Supplier (online resources)
  • American Council of Learned Societies: OK to redistribute so long as we meet requirements
  • Cambridge Books Online: No restrictions
  • Coutts (MyiLibrary): [Not yet investigated]
  • [supplier name missing]: No restrictions
  • [supplier name missing]: OK to redistribute so long as we meet requirements
  • Oxford Univ Press: No restrictions
  • Royal Society of Chemistry: [Discussions taking place via RLUK]
  • Taylor & Francis: [Not yet investigated]

015/a – use this subfield as the basis for determining that a record originated in the BNB. Under the terms of CUL’s existing (but by now rather ancient) OCLC agreement, BNB records obtained from OCLC are covered by our BL user licence, and OCLC claims no rights in them. (This issue isn’t covered in the new WorldCat Rights and Responsibilities document.)

035/a – use this subfield as the general basis for determining the source(s) of a record (not necessarily the immediate source, though).

038/a – use this subfield to determine the record content licensor of a record.

As a rule of thumb, we can assume that the 015/a trumps what’s found in an 035 field – so it shouldn’t matter what the immediate source of a BNB record is, as the rights for that record remain with the BL. Similarly, the 038/a trumps any 035, since it’s an explicit statement concerning the identity of the licensor.

One complication here is that there is no access to the 038 in the Voyager tables, so identifying records containing that field is a rather long-winded process. For any initial data analysis we might want to ignore it and rely instead on the other identifiers (simply because it saves time and avoids the need for any scripting).

Without getting bogged down in the niceties of legal language, we should be able to assert the following:

  1. All records with a BNB number in the 015/a belong to the BL.
  2. All records with an 038 belong to the licensor named in subfield a (but we might skip this stage initially; see above).
  3. Of the remainder, each 035/a identifies an entity with some interest in the record. We should concern ourselves only with those identified above in the section on “Sources of data” (ignoring, therefore, entities such as the RLUK member who contributed a record to the RLUK database, from which it was then downloaded).
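The rule of thumb and the assertions above can be sketched in code. The following Python is a minimal illustration only, not the Perl script mentioned elsewhere in this guide. The dictionary-based record layout, the function names and the two-entry KNOWN_SOURCES mapping are all assumptions made for the example, and the "GB" test is the crude BNB check discussed in this guide.

```python
from typing import Dict, List

# Illustrative subset only: MARC organization codes mapped to suppliers.
# A real run would need every agency in the "Sources of data" section.
KNOWN_SOURCES: Dict[str, str] = {
    "OCoLC": "OCLC",
    "CStRLIN": "Research Libraries Information Network",
}


def looks_like_bnb(number: str) -> bool:
    """Crude BNB test: the 015/a value starts with "GB"."""
    return number.strip().startswith("GB")


def code_of(system_control_number: str) -> str:
    """Return the parenthesised code prefix of an 035/a value, or ""."""
    value = system_control_number.strip()
    if value.startswith("(") and ")" in value:
        return value[1:value.index(")")]
    return ""


def interested_parties(record: Dict[str, List[str]]) -> List[str]:
    """Apply the three stages in order; `record` maps e.g. "035a" to values."""
    # Stage 1: a BNB number in 015/a trumps everything else.
    if any(looks_like_bnb(v) for v in record.get("015a", [])):
        return ["British Library"]
    # Stage 2: an explicit licensor code in 038/a trumps any 035.
    licensors = sorted({v.strip() for v in record.get("038a", []) if v.strip()})
    if licensors:
        return licensors
    # Stage 3: keep only 035/a codes that belong to known data sources.
    codes = {code_of(v) for v in record.get("035a", [])}
    return sorted(KNOWN_SOURCES[c] for c in codes if c in KNOWN_SOURCES)
```

Note that stage 2 returns the licensor code as it appears in the 038/a, whereas stages 1 and 3 return supplier names. Skipping stage 2, as suggested for an initial analysis, would simply mean leaving the "038a" key out of each record.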

Various approaches could be adopted – what follows is just one (very simple) route, designed to help in the generation of ballpark statistics.

Initially, identify the BNB records by selecting those whose 015/a begins with “GB”. (NB: this is a little crude, in that there are a handful, and no more, of non-BNB records whose number starts with “GB”.)

And then select on the 035/a, initially for the RLUK identifier only; the selection will need extending to cover the other data sources.

Those in the latter set that are also in the former can be disregarded (this could be JOINed into the query, of course, or both made into MAKE TABLE queries and the filtering done as a follow-on stage).
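The filtering step described above amounts to a set difference on record identifiers. A minimal Python sketch, assuming the two selections have been exported as (bib record id, subfield value) pairs: the function names and the "(SRC)" prefix in the usage example are placeholders, since the actual identifier prefix for any given source has to be taken from the data.

```python
from typing import Iterable, Set, Tuple

Row = Tuple[int, str]  # (bib record id, subfield value)


def ids_matching(rows: Iterable[Row], prefix: str) -> Set[int]:
    """Collect the ids of records that have a value starting with `prefix`."""
    return {bib_id for bib_id, value in rows if value.startswith(prefix)}


def source_records_excluding_bnb(
    rows_015a: Iterable[Row],
    rows_035a: Iterable[Row],
    source_prefix: str,
) -> Set[int]:
    """Records carrying the source's 035/a prefix, minus the BNB records."""
    bnb_ids = ids_matching(rows_015a, "GB")  # crude BNB test on 015/a
    source_ids = ids_matching(rows_035a, source_prefix)
    return source_ids - bnb_ids


# Usage: record 1 is in both sets, so only record 3 remains for "(SRC)".
rows_015a = [(1, "GB9500001"), (2, "B1234")]
rows_035a = [(1, "(SRC)100"), (3, "(SRC)101"), (4, "(OTHER)5")]
remaining = source_records_excluding_bnb(rows_015a, rows_035a, "(SRC)")
```

Repeating the call with a different prefix for each data source would give one set of record ids per supplier, each already cleared of BNB records.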

At a later stage this could doubtless be scripted, not only to include consideration of the 038, but also to reduce the number of stages involved and to end up with more exact results.

Update - a proposed workflow for a script based on this methodology is contained below. A sample Perl script is available at data.lib.cam.ac.uk

These figures are based on the database as it was early on 15 Feb 2011 and are intended only to give an idea of the numbers of records which might fall into various categories. They cover only unsuppressed (publicly-visible) data.

Total unsuppressed bib records:  4179700
Instances of 010/a:              1100917
Instances of 015/a:               918754
Instances of 035/a:              3154928
Instances of 038/a:                64391

Analysis of 038/a:
CStRLIN     1224
DLC           13
OCoLC      12711
Uk         26215
UK-BiTAL     148
UK-BRIII      34
UK-LoTGL      26
UkLCURL    24019
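
As a quick consistency check on the 038/a breakdown, the per-code counts can be summed and compared with the reported number of 038/a instances. A minimal Python sketch using only the figures given in this section; the variable names are illustrative.

```python
# Per-code counts from the "Analysis of 038/a" list.
counts_038a = {
    "CStRLIN": 1224,
    "DLC": 13,
    "OCoLC": 12711,
    "Uk": 26215,
    "UK-BiTAL": 148,
    "UK-BRIII": 34,
    "UK-LoTGL": 26,
    "UkLCURL": 24019,
}

reported_instances = 64391  # "Instances of 038/a" in the summary figures

listed_total = sum(counts_038a.values())
unaccounted = reported_instances - listed_total

# Share of the reported 038/a instances attributable to each licensor code.
shares = {code: n / reported_instances for code, n in counts_038a.items()}

print(f"listed total: {listed_total}, unaccounted: {unaccounted}")
```

The listed codes add up to 64390, one fewer than the 64391 instances reported. The figures do not say where that single instance falls.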

Analysis of 035/a (source, basis of supply, count)

Agency for the Legal Deposit Libraries
Al-Muthanna Libr, Baghdad contract 193
American Academy in Rome 3rd party 179
American Philosophical Soc 3rd party 457
Aux Amateurs de Livres contract 3404
Backstage Library Works contract 30630
Bibl municipale de Lyon 3rd party 127
Bibl nationale (France) 3rd party 106
Books from Mexico contract 1624
Brigham Young Univ 3rd party 1557
British Library contract or 3rd party or Z39.50 109128
Brown Univ 3rd party 714
C (generally) - no idea what this is
Canadian Centre for Architecture 3rd party 249
Casalini Libri contract 7010
CAT1/2 are not organization identifiers
Columbia Univ 3rd party 6642
Cornell Univ 3rd party 5927
Coutts contract 2209
Coutts Nijhoff contract 507
Dartmouth College 3rd party 122
Deutsche Nationalbibliothek 3rd party 284
DK Agencies contract 2434
Duke Univ 3rd party 2399
Durham Univ Libr 3rd party 1376
Emory Univ 3rd party 1732
Erasmus 3rd party 1701
ESTC - British Library Z39.50 or 3rd party 1074
ESTC - UC Riverside mostly RLUK - investigating contract 48786
European Register of Microform and Digital Masters 3rd party 778
Florida State Univ 3rd party 375
Garcia Cambeiro (also AgBaFGC) contract 2597
Garcia Cambeiro (obsolete version of AR-BaFGC) contract 776
Georgetown Univ 3rd party 236
J Paul Getty Museum 3rd party 653
Harvard Univ, with various suffixes for individual parts (e.g. Law) - or these could be record id prefixes 3rd party 26985
Huntington Library 3rd party 1323
Iberbook - obsolete code 3rd party 146
Imperial College 3rd party 118
Imperial College (invalid code) 3rd party 202
International Institute of Social History, Amsterdam (corrupted form of code) 3rd party 287
International Institute of Social History, Amsterdam (obsolete code) 3rd party 647
invalid code - possibly Nat Libr of Australia Z39.50 or 3rd party 411
invalid code - clearly Coutts contract 149
invalid code - supplied on ACLS e-book records (Digital Libraries Initiative?) e-books only 2651
invalid code - presumably for Harvard's Hollis system 3rd party 232
invalid code - possibly associated with Univ of Michigan 3rd party 197
invalid code - possibly associated with Brigham Young Univ 3rd party 190
invalid code - possibly used for New York Univ 3rd party 409
invalid code - clearly intended for OCLC contract 852
invalid code - clearly intended for Research Libraries Group (now absorbed into OCLC) contract 348279
invalid code - Russian, but beyond that?? 3rd party 292
invalid code - Russian, but beyond that?? 3rd party 653
invalid code - Bristol, but beyond that?? 3rd party 398
invalid code - perhaps old code for Nat Libr of Scotland 3rd party 499
ISTC - British Library contract 774
Istituto centrale per il catalogo unico… 3rd party 181
Iturriaga 3rd party 110
JRULM 3rd party 583
KCL 3rd party 1202
Keio Univ Library 3rd party 112
Ksiegarnia Wysylkowa Lexicon, Warsaw 3rd party 102
Leila Books contract 342
Libros Andinos (Bolivia) 3rd party 199
Libros Andinos (Bolivia) - variant of BO-CbLA 3rd party 13
Library and Archives Canada Z39.50 or 3rd party 2242
Library of Congress Z39.50 or 3rd party 134450
LSE 3rd party 2442
Luis A Retta contract 941
MARC Link (now Backstage) contract 45567
Metropolitan Museum of Art 3rd party 353
Nat Art Libr 3rd party 473
Nat Libr of New Zealand Z39.50 or 3rd party 268
Nat Libr of Scotland Z39.50 or 3rd party 21107
Nat Libr of Wales Z39.50 or 3rd party 4223
National Library of Medicine 3rd party 3294
New York Botanical Garden 3rd party 967
New York Public Library 3rd party 220
New York Univ 3rd party 2731
Nielsen Bookdata 3rd party - but if derived from ALDL then rights may need to be checked 196
Nijhoff (obsolete code) 3rd party 205
OCLC contract 658646
Oionos 3rd party 112
Otto Harrassowitz contract 26306
Oxford Univ, Bodleian Z39.50 or 3rd party 95096
Oxford Univ Press contract 124
Pierpont Morgan Libr 3rd party 156
Presumably LAC's AMICUS database, but code not valid Z39.50 or 3rd party 1338
Presumably RLUK (formerly CURL) contract 140
Princeton Univ 3rd party 3874
Puvill 3rd party 603
Puvill (invalid version)? 3rd party 78
Puvill (obsolete code) 3rd party 1758
Research Libraries Information Network (now incorp with OCLC) contract 299300
Retro-link Associates (now Backstage, via MARC Link) contract 27288
RLUK - now obsolete contract 962148
Rutgers Univ 3rd party 462
Saint Andrews Univ, Library 3rd party - but may also be supplied by MEMSO under contract 410
Schweizerische Nationalbibliothek 3rd party 1962
Serials Solutions (ProQuest) contract 52627
Smithsonian 3rd party 569
SOAS 3rd party 700
Stanford Univ 3rd party 6901
Sulaiman's Bookshop contract 695
SUNY Binghamton 3rd party 492
Swarthmore College 3rd party 290
Syracuse Univ 3rd party 2276
Temple Univ 3rd party 233
Touzot contract 2997
Touzot contract 2180
Trinity College Dublin 3rd party 9157
UBS Publishers' Distributors, New Delhi 3rd party 183
UC Berkeley Z39.50 or 3rd party 832
UC Berkeley Z39.50 or 3rd party 624
UC Berkeley Z39.50 or 3rd party 566
UC Santa Barbara Z39.50 or 3rd party 409
UCL 3rd party 725
Univ of Birmingham 3rd party 611
Univ of Bristol 3rd party 251
Univ of Chicago 3rd party 7540
Univ of Edinburgh 3rd party 956
Univ of Florida 3rd party 771
Univ of Glasgow 3rd party 1509
Univ of Iowa 3rd party 506
Univ of Leeds 3rd party 857
Univ of Liverpool 3rd party 197
Univ of London 3rd party 1648
Univ of Michigan 3rd party 2841
Univ of Minnesota 3rd party 106
Univ of Minnesota 3rd party 6997
Univ of Newcastle upon Tyne 3rd party 172
Univ of Nottingham 3rd party 667
Univ of Pennsylvania 3rd party 1964
Univ of Rochester 3rd party 574
Univ of Sheffield 3rd party 534
Univ of Southampton 3rd party 305
Univ of Southern California Z39.50 or 3rd party 328
Univ of Tennessee 3rd party 99
Univ of Warwick 3rd party 553
UTLAS 3rd party 2715
Vientos Tropicales contract 157
W.H. Everett 3rd party 271
Washington Library Network (merged into OCLC) contract 1372
Wellcome 3rd party 178
Yale Univ, with various suffixes for individual parts (e.g. Law) 3rd party 11079

Hugh Taylor - 28th July 2011