For some time now the OKFN Working Group on Open Bibliographic Data has been working on Principles on Open Bibliographic Data. While first attempts were mainly directed towards libraries and other public institutions, we decided to broaden the principles' scope by amalgamating them with Peter Murray-Rust’s draft publisher guidelines. The results can be seen below. We invite everyone to review these principles, discuss the text and suggest improvements.
Principles on Open Bibliographic Data
Producers of bibliographic data such as libraries, publishers, or social reference management communities have an important role in supporting the advance of humanity’s knowledge. For society to reap the full benefits from bibliographic endeavours, it is imperative that bibliographic data be made openly available for free use and re-use by anyone for any purpose.
In its narrowest sense the term ‘bibliographic data’ refers to data describing bibliographic resources (articles, monographs, electronic texts etc.) to fulfill two goals:
- Identifying the described resource, i.e. pointing to a unique resource in the set of all bibliographic resources.
- Addressing the described resource, i.e. indicating how/where to find the described resource.
Traditionally one description served both purposes at once by delivering information about:
- author(s) (possibly including addresses and other contact details) and editor(s),
- publication year, month and place,
- title and identification of enclosing work (e.g. a journal),
- page information,
- format of work.
In the web environment the address can be a URL and the identification a URI (URN, DOI etc.). Identifiers thus fall under this narrow concept of ‘bibliographic data’.
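As a rough illustration of this narrow concept, a description could be modelled as a single record that carries both the identifying and the addressing information. The sketch below is not a proposed standard: the class name `BibliographicRecord`, its field names, and the sample DOI and URL are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BibliographicRecord:
    """Hypothetical record for bibliographic data in the narrow sense."""
    title: str
    authors: List[str] = field(default_factory=list)
    editors: List[str] = field(default_factory=list)
    year: Optional[int] = None
    month: Optional[int] = None
    place: Optional[str] = None
    enclosing_work: Optional[str] = None  # e.g. the journal title
    pages: Optional[str] = None
    work_format: Optional[str] = None
    identifier: Optional[str] = None  # URI such as a DOI or URN (identification)
    address: Optional[str] = None  # URL (addressing)

    def can_identify(self) -> bool:
        # True when the record carries an identifier for the resource.
        return bool(self.identifier)

    def can_address(self) -> bool:
        # True when the record says where the resource can be found.
        return bool(self.address)


# Sample values are invented for illustration only.
record = BibliographicRecord(
    title="An Example Article",
    authors=["A. Author"],
    year=2010,
    enclosing_work="Journal of Examples",
    pages="1-10",
    identifier="doi:10.1234/example",
    address="https://example.org/articles/1",
)
```

Keeping the identifier and the address as separate fields mirrors the two goals named above: a record may identify a resource without saying where to find it, and vice versa.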
Furthermore, there are several other kinds of information about a bibliographic resource which, in this document, fall under the concept of bibliographic data. This data might be produced by libraries as well as by publishers, online communities of book lovers, or social reference management systems:
- identifiers (ISBN, LCCN, OCLC number etc.)
- rights associated with work
- sponsorship (e.g. funding)
- exemplar data (number of holdings, call number)
- metametadata (administrative metadata such as last-modified dates, probably often created automatically)
- relevant links to Wikipedia, Google Books, Amazon etc.
- cover images (self-scanned or from Amazon)
- tables of contents
- links to digitizations of tables of contents, registers, bibliographies etc.
Libraries also produce authority files such as:
- name authority files,
- subject authority files.
We assert that the information associated with an individual work is in the public domain. It follows that an individual bibliographic entry derived from the work itself is free of restrictive rights, and the same holds for individual authority records. Rights might exist only on aggregations of bibliographic and authority data.
Formally, we recommend adopting and acting on the following principles:
- Where bibliographic data or collections of bibliographic data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual bibliographic entries/elements, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license.
When publishing data make an explicit and robust license statement.
Many widely recognized licenses are not intended for, and are not appropriate for, metadata or collections of metadata. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CC0), GFDL, GPL, BSD, etc. are NOT appropriate for data and their use is STRONGLY discouraged.
Use a recognized waiver or license that is appropriate for metadata.
The use of licenses which limit commercial re-use or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets and prevent commercial activities that could be used to support data preservation.
If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
Furthermore, it is STRONGLY recommended that bibliographic data, especially where publicly funded, be explicitly placed in the public domain via the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver. This is in keeping with the public funding of most library institutions and the general ethos of sharing and re-use within the library community.
We strongly recommend explicitly placing bibliographic data in the Public Domain via PDDL or CC0.
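One way a data publisher might make such a statement checkable is to record it in a small structured form alongside the data. The following is only a sketch under assumptions: the `LicenseStatement` class, its fields, and the short codes "PDDL" and "CC0" used as keys are hypothetical conventions chosen for this example, not part of any existing tool or standard.

```python
from dataclasses import dataclass

# The two waivers recommended in the principles above.
RECOMMENDED_WAIVERS = {
    "PDDL": "Public Domain Dedication and Licence",
    "CC0": "Creative Commons Zero Waiver",
}


@dataclass(frozen=True)
class LicenseStatement:
    """Hypothetical explicit statement attached to a published data collection."""
    code: str
    covers_entries: bool      # individual bibliographic entries/elements
    covers_collection: bool   # the whole data collection
    covers_subsets: bool      # subsets of the collection

    def is_recommended(self) -> bool:
        # True only for the public domain waivers named in the principles.
        return self.code in RECOMMENDED_WAIVERS

    def is_complete(self) -> bool:
        # The statement should address entries, the collection, and subsets.
        return self.covers_entries and self.covers_collection and self.covers_subsets


statement = LicenseStatement("CC0", True, True, True)
restricted = LicenseStatement("CC-BY-NC", True, True, False)
```

Here `statement` satisfies both checks, while `restricted` fails them: it uses a non-commercial licence and does not say how subsets may be re-used.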
While we appreciate that certain types of bibliographic metadata do require some extra work in their creation, we strongly assert that making these open has major benefits not only to the community as a whole but also to the creator (author, publisher, library, etc.). Benefits include enhanced discoverability, which widens the potential usage of a work, and “save-the-time-of-the-reader”. These types include:
- abstracts (whether generated by author, publisher, library or machine)
- keywords, subject headings and classification notations (whether generated by author, publisher, library or machine)
- reviews (either human or machine-generated)
As a fifth principle we strongly urge that creators of bibliographic metadata explicitly either dedicate this to the public domain or use an open licence.