For some time now the OKFN Working Group on Open Bibliographic Data has been working on Principles on Open Bibliographic Data. While the first attempts were mainly directed at libraries and other public institutions, we decided to broaden the principles’ scope by amalgamating them with Peter Murray-Rust’s draft publisher guidelines. The results can be seen below. We invite everyone to review these principles, discuss the text and suggest improvements.
Principles on Open Bibliographic Data
Producers of bibliographic data such as libraries, publishers, or social reference management communities have an important role in supporting the advance of humanity’s knowledge. For society to reap the full benefits from bibliographic endeavours, it is imperative that bibliographic data be made openly available for free use and re-use by anyone for any purpose.
Bibliographic Data
In its narrowest sense the term ‘bibliographic data’ refers to data describing bibliographic resources (articles, monographs, electronic texts etc.) to fulfill two goals:
- Identifying the described resource, i.e. pointing to a unique resource in the set of all bibliographic resources.
- Addressing the described resource, i.e. indicating how/where to find the described resource.
Traditionally one description served both purposes at once by delivering information about:
- author(s) (possibly including addresses and other contact details) and editor(s),
- title,
- publisher,
- publication year, month and place,
- title and identification of enclosing work (e.g. a journal),
- page information,
- format of work.
In the web environment the address can be a URL and the identification a URI (URN, DOI etc.). Identifiers thus fall under this narrow concept of ‘bibliographic data’.
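The narrow concept described above can be sketched as a simple record type. This is only an illustration, not a proposed schema: the class name, the field names and the sample values (including the DOI and the URL) are made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BibliographicRecord:
    """Hypothetical record for the narrow sense of 'bibliographic data'."""

    authors: List[str]
    title: str
    publisher: Optional[str] = None
    year: Optional[int] = None
    container_title: Optional[str] = None  # enclosing work, e.g. a journal
    pages: Optional[str] = None
    format: Optional[str] = None
    identifier: Optional[str] = None  # a URI such as a DOI or URN
    address: Optional[str] = None  # a URL where the resource can be found

    def identifies(self) -> bool:
        """True if the record carries an explicit identifier."""
        return self.identifier is not None

    def addresses(self) -> bool:
        """True if the record says where to find the resource."""
        return self.address is not None


# Sample values are invented for illustration only.
record = BibliographicRecord(
    authors=["A. Author"],
    title="An Example Article",
    container_title="Journal of Examples",
    year=2010,
    pages="1-10",
    identifier="doi:10.1234/example",
    address="https://example.org/articles/1",
)
```

Keeping `identifier` and `address` as separate fields mirrors the two goals named above: in the web environment one description no longer has to serve both purposes at once.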
Furthermore, there are several other kinds of information about a bibliographic resource which in this document fall under the concept of bibliographic data. This data might be produced by libraries as well as by publishers, online communities of book lovers, or social reference management systems:
- identifiers (ISBN, LCCN, OCLC number etc.),
- rights associated with the work,
- sponsorship (e.g. funding),
- tags,
- exemplar data (number of holdings, call number),
- metametadata, i.e. administrative metadata (last modified etc.), probably often created automatically,
- relevant links to Wikipedia, Google Books, Amazon etc.,
- cover images (self-scanned or from Amazon),
- table of contents,
- links to digitizations of tables of contents, registers, bibliographies etc.
Libraries also produce authority files such as:
- name authority files,
- subject authority files,
- classifications.
We assert that the information associated with an individual work is in the public domain. It follows that an individual bibliographic entry derived from the work itself is free of restrictive rights, as is an individual authority record. Rights might exist only on aggregations of bibliographic and authority data.
Formally, we recommend adopting and acting on the following principles:
- Where bibliographic data or collections of bibliographic data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual bibliographic entries/elements, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license.
When publishing data make an explicit and robust license statement.
- Many widely recognized licenses are not intended for, and are not appropriate for, metadata or collections of metadata. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CC0), GFDL, GPL, BSD, etc. are NOT appropriate for data and their use is STRONGLY discouraged.
Use a recognized waiver or license that is appropriate for metadata.
- The use of licenses which limit commercial re-use or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets and prevent commercial activities that could be used to support data preservation.
If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
- Furthermore, it is STRONGLY recommended that bibliographic data, especially where publicly funded, be explicitly placed in the public domain via the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver. This is in keeping with the public funding of most library institutions and the general ethos of sharing and re-use within the library community.
We strongly recommend explicitly placing bibliographic data in the Public Domain via PDDL or CC0.
- While we appreciate that certain types of bibliographic metadata do require some extra work in their creation, we strongly assert that making these open has major benefits not only to the community as a whole but also to the creator (author, publisher, library, etc.). Benefits include enhanced discoverability, which widens the potential usage of a work, and “save-the-time-of-the-reader”. These types include:
  - abstracts (whether generated by author, publisher, library or machine),
  - keywords, subject headings and classification notations (whether generated by author, publisher, library or machine),
  - reviews (either human or machine-generated).
As a fifth principle we strongly urge that creators of bibliographic metadata explicitly either dedicate this to the public domain or use an open licence.
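As a rough illustration of what an explicit license statement could look like when a collection is published in machine-readable form, the sketch below wraps a list of records together with a CC0 statement. The function name, the dictionary keys and the sample record are assumptions made for this example; the principles themselves do not prescribe any format.

```python
import json

# Canonical address of the Creative Commons Zero 1.0 dedication.
CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"


def with_license_statement(records, license_url=CC0_URL,
                           license_name="CC0 1.0 Universal"):
    """Return a dataset dict that carries an explicit license statement.

    The layout ("license" / "records") is hypothetical, chosen only to
    show the statement travelling with the data it applies to.
    """
    return {
        "license": {"name": license_name, "url": license_url},
        "records": list(records),
    }


# The sample record is invented for illustration only.
dataset = with_license_statement([{"title": "An Example Article", "year": 2010}])
print(json.dumps(dataset, indent=2))
```

Placing the statement at the level of the whole collection covers the aggregation as well as the individual entries and subsets drawn from it.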
It truly would be helpful if, in addition to all the bibliographic data, you could include the biographic data for the author–at least the Date of Birth and Date of Death (if any). This would permit an evaluation of whether the book is in the Public Domain in Canada.
Similar data on any other significant contributors (editor, illustrator or graphic artist, translator if applicable…) would also be helpful.
Simon Tarlink
Site Administrator
Distributed Proofreaders Canada
(part of the world-wide Project Gutenberg volunteer network)
[Full disclosure: Distributed Proofreaders Canada converts scans of Public Domain books into complete and accurate e-books (not merely scansets)]
So, for clarification of the second principle, is any licence appropriate for data, also appropriate for metadata? The phrasing of that section doesn’t seem 100% clear to me and I’d be nervous about referring others to it in that form.
Thanks for the feedback.
@Simon Biographic data is already implicitly covered by including the name authority files of libraries, which contain this information. I don’t know whether publishers usually have such biographic information, e.g. in article metadata. If they do, we should explicitly include biographic data; if they don’t, I think it is enough to include authority files as we already do.
@Laura Yes, any license appropriate for data is also appropriate for metadata. This should be made clearer in the final version.
Good principles, I think.
Here in Finland we cannot at the moment publish authority files/databases of the authors, because privacy laws protect the authors against that. But there’s a whole lot of other stuff we could publish (working on it as we speak, but things happen slowly)!