Minutes: 16th Virtual Meeting of the OKFN Openbiblio Group

Date: December 6th, 2011, 16:00 GMT

Channel: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Jim Pitman
  • Karen Coyle

Agenda

Report from Germany

#swib11

Adrian reported from the third Semantic Web in Libraries (SWIB) conference, which took place in Hamburg this year. See the program (Day 1, Day 2) with abstracts of the talks and presentation slides (video recordings to come).

Talking about the culturegraph project, we got into a discussion about matching/deduplication algorithms. It was generally agreed that the problem in this area is a great deal of duplicated effort, because best practices and algorithms are not shared widely, neither by public institutions nor, of course, by vendors.
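To make the discussion a little more concrete: one very simple family of deduplication techniques builds a normalized "match key" from a few fields and groups records that share it. The Python sketch below illustrates only that generic idea; it is not the culturegraph algorithm. The record layout (plain dicts with `title`, `author` in "Surname, Forename" order, and `year`), the five-word title cut-off and the sample records are all assumptions made for illustration.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def match_key(record: dict, title_words: int = 5) -> str:
    """Crude match key: first title words | first author token | year.

    Assumes authors are given as "Surname, Forename", so the first
    normalized token is taken to be the surname.
    """
    title = normalize(record.get("title", ""))
    title_part = " ".join(title.split()[:title_words])
    author_tokens = normalize(record.get("author", "")).split()
    surname = author_tokens[0] if author_tokens else ""
    return f"{title_part}|{surname}|{record.get('year', '')}"


def group_duplicates(records: list) -> dict:
    """Group records by match key; keep only keys shared by 2+ records."""
    groups: dict = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Hypothetical sample records (not taken from any real catalogue).
records = [
    {"title": "Faust: Eine Tragödie", "author": "Goethe, Johann Wolfgang von", "year": 1808},
    {"title": "Faust : eine Tragodie", "author": "Goethe, J. W.", "year": 1808},
    {"title": "Die Leiden des jungen Werthers", "author": "Goethe, Johann Wolfgang von", "year": 1774},
]

for key, dupes in group_duplicates(records).items():
    print(key, len(dupes))
```

Production systems usually go well beyond exact keys (fuzzy string comparison, identifiers such as ISBNs, weighted scores), and these design decisions are exactly the kind of know-how that would benefit from being shared.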

We discussed which would be the best forum for documenting best practices and/or collecting a list of links to relevant projects: the code4lib wiki? An OKF site? The open wiki of the W3C’s Library Linked Data group?

ACTION: As soon as there is serious work on deduplication at culturegraph.org, Adrian will start a conversation about sharing algorithms and best practices and initiate a forum for doing it.

BVB/KOBV data release

Adrian reported on the release of 23 million records from the German library networks BVB and KOBV (see this post for more information, and http://epsiplatform.eu/content/bavaria-opens-data-portal on the opening of Bavaria’s data portal).

Karen showed interest in documentation of the MARC-to-RDF conversion, but so far no such documentation seems to exist. Jim would like to know how he can get all the data relevant to him (concerning mathematics/statistics) out of the MARC/XML or the Linked Open Data. That is a question he generally asks himself whenever he hears about a release of open bibliographic data.
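One possible answer to Jim’s question for the MARC/XML dump is to walk through the records and keep those whose classification falls into the subject range of interest. The Python sketch below assumes that records carry a Dewey Decimal Classification number in field 082 $a (DDC 510 to 519 covers mathematics, including 519.5 statistics). Many German union catalogue records may be classified differently, so the field and the prefixes are assumptions to adapt; the control numbers in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}


def ddc_numbers(record: ET.Element) -> list:
    """All DDC numbers found in field 082 $a of one MARCXML record."""
    path = "marc:datafield[@tag='082']/marc:subfield[@code='a']"
    return [sf.text.strip() for sf in record.findall(path, MARC_NS) if sf.text]


def select_records(xml_text: str, prefixes=("51",)) -> list:
    """Return the 001 control numbers of records whose DDC starts with a prefix.

    Default prefix "51" covers DDC 510-519 (mathematics, incl. 519.5 statistics).
    """
    root = ET.fromstring(xml_text)
    selected = []
    for rec in root.iter(f"{{{MARC_URI}}}record"):
        if any(n.startswith(tuple(prefixes)) for n in ddc_numbers(rec)):
            cf = rec.find("marc:controlfield[@tag='001']", MARC_NS)
            selected.append(cf.text if cf is not None else None)
    return selected


# Hypothetical two-record sample collection.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">BV000000001</controlfield>
    <datafield tag="082" ind1="0" ind2="4"><subfield code="a">519.5</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">BV000000002</controlfield>
    <datafield tag="082" ind1="0" ind2="4"><subfield code="a">943.3</subfield></datafield>
  </record>
</collection>"""

print(select_records(SAMPLE))
```

For a full dump of millions of records, `ET.iterparse` with `elem.clear()` after each record would keep memory bounded; `ET.fromstring` is used here only to keep the sketch short.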

Report on BKN/BibSoup/BibServer/BibJSON progress (Jim)

  • BibServer now imports and exports according to the documented BibJSON standard.
  • Having read Rufus’ answers regarding the questions about thedatahub.org as a data repository (see last meeting’s minutes), Jim is fine with using the Data Hub to upload his open bibliographic datasets.

  • ACTION: Provide exemplar records and working instances for further discussion of entities in BibJSON.
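As a possible starting point for such exemplar records, the Python sketch below builds a minimal BibJSON-style record and wraps it in a collection. The field names (`type`, `title`, `author` as a list of objects with `name`, `year`, `journal`, `identifier` with `type`/`id`, and a collection with `metadata` and `records`) reflect our reading of the BibJSON documentation and should be checked against it; every value, including the DOI, is hypothetical.

```python
import json


def make_record(title, authors, year, record_type="article", journal=None, doi=None):
    """Minimal BibJSON-style record; field names assumed from the BibJSON docs."""
    record = {
        "type": record_type,
        "title": title,
        "author": [{"name": name} for name in authors],
        "year": str(year),
    }
    if journal:
        record["journal"] = {"name": journal}
    if doi:
        record["identifier"] = [{"type": "doi", "id": doi}]
    return record


def make_collection(label, records):
    """Wrap records in a collection with a small metadata block."""
    return {"metadata": {"label": label}, "records": records}


# Hypothetical exemplar values, for discussion only.
collection = make_collection(
    "Exemplar records",
    [make_record("An Example Article", ["Doe, Jane"], 2011,
                 journal="Journal of Examples", doi="10.1234/example.1")],
)
serialized = json.dumps(collection, indent=2, ensure_ascii=False)
print(serialized)
```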

LoC Bibliographic Framework

Finally, there was a short discussion about the Library of Congress’ initiative for a new bibliographic framework based on Linked Data standards. Both the US-centric and the library-centric approach were seen as problems, and participants articulated the need to broaden the approach and to separate activities between agents (LoC for cataloging, W3C for technologies, NISO etc.).

Posted in minutes, OKFN Openbiblio

German Library Networks BVB and KOBV release 23 Million Records

The two German library networks BVB (BibliotheksVerbund Bayern = Library Network Bavaria) and KOBV (Kooperativer Bibliotheksverbund Berlin-Brandenburg = Cooperative Library Network Berlin-Brandenburg) have recently released 23 million records from their joint union catalogue under a CC0 license. The opening of Bavaria’s open data portal was taken as the occasion for publishing this data.

MARC records and Linked Open Data

The data was released as MARC records (MARCXML), and an experimental Linked Open Data service was also launched at http://lod.b3kat.de/, with full dumps of the RDF data made available for download.

The data was exported on November 22nd, 2011, and updates can be harvested via an OAI-PMH interface from http://bvbr.bib-bvb.de:8991/aleph-cgi/oai/oai_opendata.pl?verb=ListRecords&set=OpenData&metadataPrefix=marc21.
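For those who want to keep a local copy in sync: OAI-PMH pages through ListRecords results with a resumptionToken. The first request carries `set` and `metadataPrefix`, every follow-up request carries only the token, and an empty or missing token marks the last page. The Python sketch below uses only the standard library and the endpoint and parameters quoted above; whether that endpoint is still online is not guaranteed, and error handling (OAI `error` responses, retries, throttling) is deliberately left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://bvbr.bib-bvb.de:8991/aleph-cgi/oai/oai_opendata.pl"
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base, token=None, set_spec="OpenData", prefix="marc21"):
    """First request: set + metadataPrefix; follow-ups: resumptionToken only."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "set": set_spec, "metadataPrefix": prefix}
    return base + "?" + urllib.parse.urlencode(params)


def parse_page(xml_bytes):
    """Return (OAI record elements, resumption token or None)."""
    root = ET.fromstring(xml_bytes)
    records = root.findall(f".//{OAI}record")
    token_el = root.find(f".//{OAI}resumptionToken")
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token or None


def harvest(base=BASE_URL):
    """Yield every OAI record, following resumption tokens until the last page."""
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base, token)) as resp:
            records, token = parse_page(resp.read())
        yield from records
        if token is None:
            break


# Usage (performs network requests):
# for rec in harvest():
#     ...  # store or process each record
```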

So far, no entries for this data exist on the Data Hub. Anyone?

Is openness becoming the new normal in Germany?

With this initiative, open data is now available from four German library networks, with BVB and KOBV being the first to release their entire union catalogue. The North Rhine-Westphalian Library Service Center (hbz) started opening up data in March 2010 (see here), and in autumn 2010 libraries from the southwestern library network (SWB) followed (see here).

Hopefully this development will go on, and libraries and related institutions will keep releasing open data, be it bibliographic data or other data produced by libraries.

Posted in Data, LOD-LAM

Animal Garden – open science issues

Peter and Tom Murray-Rust put together a presentation called Animal Garden, which we have now converted to a prezi for nice swooshy embedding-ness in web pages.

This prezi tells the story of some teddy bear scientists who try to share their lovely flowers with the world, only to find that their flowers get locked up behind a big wall… but there is hope! Can a certain open access turtle save them?

Posted in JISC OpenBib, licensing, OKFN Openbiblio

Recommendations on Releasing Library Data as Open Data

Last week, the German KIM-DINI working group (KIM = Competence Centre Interoperable Metadata) officially published recommendations for the release of open data by libraries and related institutions. The recommendations are intended to serve information facilities as a guide and reference text for releasing open data.

Besides descriptive metadata, which is already covered in other documents, the recommendations also address non-sensitive data produced by libraries and related institutions, e.g. statistical data or circulation data. Furthermore, the recommendations cover not only open licensing but also open access, open standards, documentation, sustainability and other aspects of open data.

The recommendations include nine principles for open library data. To be called ‘open’ at all, data must meet the three core requirements of open access, open standards and open licenses.

Furthermore, open data should be updated regularly and also be published as raw data. It should be described in a structured form and be accessible without registration. Precautions should be taken to ensure that open data is provided sustainably.

The recommendations follow existing principles and guidelines for open data in memory institutions or the public sector in general. The German original can be found here (shortlink: http://is.gd/openbibdata).

The text itself is published under a CC0 license. Its dissemination, re-publication and reuse are expressly encouraged.

I made a first attempt at an English translation of the recommendations, which is posted below. Please correct any mistakes and bad English on the etherpad at http://okfnpad.org/dini-kim-recommendations. Everybody is free to

Recommendations on Opening up Library Data

v.1.0 published on October 31st, 2011

1. Preamble

Libraries and other information facilities work with data on a daily basis, in various ways and for different purposes. They act as producers, providers, users and aggregators of data. To reap the full benefits of data produced by public institutions, it is necessary to publish it openly on the internet.

2. Subject

In information facilities, various forms of data are produced that could be the subject of an open data release. It is important to emphasize that an open data release can only be carried out on the condition that

  1. the respective data is not personal or otherwise sensitive data, and
  2. the respective institution holds the database rights or, where applicable, the copyright in the data.

Library data includes both bibliographic data in accordance with the “Principles on Open Bibliographic Data” and other data that is created by libraries and related institutions.

The administration of library services also generates further data that, insofar as it isn’t personal or otherwise sensitive, can also be released as open data. This includes item data, acquisition data, anonymized circulation data and statistical data.

3. Principles

The DINI-KIM working group recommends that library institutions in the German-speaking world and beyond release library data as open data. In doing so, the following principles must be strictly adhered to:

  • Open access: the data as a whole must be available on the web openly and free of charge.
  • Open standards: the data must be available in an openly documented and non-proprietary format.
  • Open licenses: the data (as individual data elements and as a collection) must be published under an open license according to the Open Definition. To guarantee the best possible legal interoperability of the data, the DINI-KIM working group recommends the use of a public domain waiver like the CC0 Public Domain Dedication or the Public Domain Dedication and License (PDDL).

Furthermore, we recommend considering the following principles:

  • Documentation: A structured description of the data should be published. Ideally, the data should be registered in a central registry (like the Data Hub).
  • Raw data: Wherever possible, the data should be made available in the form in which it accrues in the libraries’ information cycle. Any further filtering or processing is left to those who make use of the data.
  • Timeliness: The data should be published within a reasonable time after its creation. What is reasonable may vary depending on the kind of data.
  • Structured: The data should be published in a structured format which allows easy processing.
  • Non-discriminating: Accessing the data should be possible for everyone, the only acceptable hurdle being access to the internet. That is, no registration should be required.
  • Sustainability: The provision of open data should be accompanied by a sustainability concept which ensures long-term archiving and access to older versions of the data.

Compliance with all principles will rarely be achieved from the beginning. However, the first three principles are necessary conditions for speaking of “open library data” in the first place. It is strongly recommended to start by publishing raw data, which may not be available in an openly documented format and/or may not be structured or regularly updated. In the medium term, work should be done towards complying with all principles.
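The distinction between necessary and recommended principles can be expressed as a small check. In this hypothetical Python sketch, a dataset description is a dict of booleans keyed by made-up principle names (they are not part of the recommendations); the result says whether the data counts as open library data at all (all three core principles met) and which of the nine principles still need work.

```python
# Hypothetical keys for the three core and six recommended principles.
CORE = ("open_access", "open_standards", "open_license")
RECOMMENDED = ("documentation", "raw_data", "timeliness",
               "structured", "non_discriminating", "sustainability")


def assess(dataset: dict) -> tuple:
    """Return (is_open_library_data, principles not yet met).

    The three core principles are necessary conditions; the six
    recommended ones only feed into the to-do list.
    """
    is_open = all(dataset.get(p, False) for p in CORE)
    todo = [p for p in CORE + RECOMMENDED if not dataset.get(p, False)]
    return is_open, todo


# Hypothetical example: openly licensed raw dump, not yet registered or archived.
example = {"open_access": True, "open_standards": True, "open_license": True,
           "raw_data": True, "structured": True, "non_discriminating": True}
print(assess(example))
```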

4. Related Material

Created within the “Lizenzen” (Licenses) group of the DINI-KIM working group.

Contributors: Patrick Danowski, Kai Eckert, Christian Hauschke, Adrian Pohl and others

The text of these recommendations is published under the Creative Commons license CC0 (this also holds for the English translation). Thus, it is in the public domain, that is, it belongs to everyone and may be used for any purpose without restrictions. When reusing the text, please name the source.

Posted in licensing, OKFN Openbiblio

Minutes: 15th Virtual Meeting of the OKFN Openbiblio Group

Date: November 8th, 2011, 16:00 GMT

Channel: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Sam Leon (OKF Community Coordinator)
  • Jim Pitman
  • Thad Guidry (Freebase – Google Refine)

Agenda

Sam’s Introduction

  • Sam recently joined the OKF as community coordinator and introduced himself.
  • He is responsible for supporting the work in the areas of open bibliographic data and open heritage.
  • He might help with editing material and posting things to the openbiblio.net blog if people don’t have the time.
  • He might also otherwise support internal and external communication.

CKAN-as-a-data-repository questions

Jim brought up some questions about CKAN’s new role as a data repository:

  1. How big a dataset can be uploaded to CKAN?
  2. When and what data should be uploaded to CKAN? When should a dataset that is stored elsewhere just be registered? Are there some recommendations/reflections about this anywhere?
  3. Is there a post about the CKAN change to a repository?

Adrian posted these questions to the CKAN-discuss list, here is Rufus Pollock’s response:

  1. “At the present time I think the size limit for upload of a single file is 500Mb but we could raise this pretty easily.”
  2. “We don’t have particular recommendations but I note that we have been hard at work on what we call “archiving” capability. If enabled this will automatically backup/cache a copy of a remote file to our local storage so that if the remote file disappears we can still provide it (this obviously needs to be somewhat configurable as there are some massive files around and we don’t want to cache e.g. simple html pages …)”
  3. “There’s a post about the release of this extension for CKAN. We actually deployed this at the same time on thedatahub for a test period but didn’t officially announce. It’s been in use since and completely stable (backend storage is actually google storage).”
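The configurability Rufus describes boils down to a policy decision per remote file. The Python sketch below is a hypothetical illustration, not the code or the settings of CKAN’s archiving extension: it skips HTML pages and files above a size limit (the 500 MB upload limit quoted above is reused only as an illustrative default) and treats an unknown size as “do not archive”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchivePolicy:
    # Illustrative default only: the 500 MB upload limit quoted above.
    max_bytes: int = 500 * 1024 * 1024
    # Media types never worth caching, e.g. simple HTML landing pages.
    skip_types: frozenset = frozenset({"text/html"})


def should_archive(content_type: str, content_length: Optional[int],
                   policy: Optional[ArchivePolicy] = None) -> bool:
    """Decide whether to cache a local copy of a remote resource."""
    policy = policy or ArchivePolicy()
    # Drop parameters such as "; charset=utf-8" from the Content-Type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in policy.skip_types:
        return False
    if content_length is None:
        return False  # unknown size: conservatively skip
    return content_length <= policy.max_bytes
```

Keeping the policy in a small immutable object makes it easy to configure per instance, which is the kind of flexibility the quoted answer asks for.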

Activities in Germany (Adrian)

Adrian reported on openbiblio activities in Germany.

Library of Congress announces new bibliographic framework

We briefly talked about the Library of Congress’ announcement regarding the ‘new bibliographic framework’ being based on Linked Data principles: http://www.loc.gov/marc/transition/news/framework-103111.html

ACTION: Adrian to write a post about the LoC announcement (including a photo).

Posted in minutes, OKFN Openbiblio

Interview on Open Bibliographic Data in German Open Data Blog

Yesterday, an interview was published in the Open Data blog of the German weekly newspaper “Die Zeit”. In this interview, I answer questions asked by Lorenz Matzat about open bibliographic data, the OKFN’s working group and openbiblio activities going on in Germany.

The interview was a nice opportunity to bring some information about open bibliographic data activities to people who hadn’t heard about them yet, and to call attention to the work that was and is being done by the OKFN Working Group on Open Bibliographic Data and, in Germany, by the hbz (where I work) and the KIM DINI working group, which has just published some recommendations for opening up library data (a blog post will follow).

Posted in OKFN Openbiblio, Talks

German Guide for Open Library Catalogue Data

The German legal scholar and lawyer Dr. Till Kreutzer has written a legal guide titled “Open Data – Releasing data from library catalogues” at the request of the North Rhine-Westphalian Library Service Center (hbz).

For some time now there has been a strong effort to open up data from library catalogues; see for instance these lists of library data sources. For many libraries, diverse and sometimes complex legal questions are an obstacle to publishing open data. The legal guide is meant to give some orientation. It is intended for employees of public and academic libraries, especially for people without a legal background.

Part 1 of the guide deals with legal questions that arise in the creation of catalogues: it explains whether individual parts of a catalogue can be protected by copyright and, if so, under which conditions. It then examines under which conditions data providers hold a sui generis database right in complete databases.

Part 2 of the guide addresses the conditions under which a library or related institution can publish a database as open data. Finally, it recommends licenses for opening up catalogue data.

The guide is available under a Creative Commons Attribution license, and the author and publisher encourage wider distribution of the text.

This text is a freely translated and slightly changed version of an announcement by the North Rhine-Westphalian Library Service Center (hbz). Disclaimer: The hbz is Adrian Pohl’s employer.

Posted in licensing

Finnish Turku City Library and the Vaski consortium release 1.8M MARC records as Open Data

Let's open up our metadata containers

I’m happy to announce that our Vaski consortium of public libraries, serving a total of 300,000 citizens in Turku and a dozen surrounding municipalities in western Finland, has recently published all of our 1.8 million bibliographic records in the open, as a big pile of data (see it on The Data Hub).

Each of the records describes a book, recording, movie, song or other publication in our library catalogue. Titles, authors, publishing details, library classifications, subject headings, identifiers and so on are systematically saved in MARC format, the international structured library metadata standard since the late 1960s.

Unless I’ve missed something, ours is the third large-scale Open Data publication from the libraries of Finland. The first one was the 670,000 bibliographic records of the HelMet consortium (see on The Data Hub), another consortium of public libraries around the capital Helsinki. This first publication was initiated and organized in 2010 by Kirjastot.fi Labs, a project looking for more agile, innovative library concepts. The second important Open Data publication was our national general thesaurus Yleinen suomalainen asiasanasto YSA, which is also available as a cool semantic ontology.

Joining this group of Open Data publications was natural for our Vaski consortium, because we are moving our data from one place to another anyway: we are in the middle of converting from our national FinMARC flavour to the international MARC21 flavour of MARC, swapping our library system from Axiell PallasPro to Axiell Aurora, and implementing a new, ambitious search and discovery interface for all the Finnish libraries, archives and museums (yes, it’s busy times here and we love the taste of a little danger). All this means we are extracting, injecting, converting, mangling, breaking, fixing, disassembling and reassembling all of our data. So we asked ourselves: why not publish all of our bibliographic data on the net while we are at it?

The process of going Open Data has been quite seamless for us. On my initiative, the core concept of Open Data was explained to the consortium’s board. As there were no objections or further questions, we contacted our vendor BTJ, who immediately supported the idea. From there on it was basically just a matter of some formalities with BTJ, consulting international colleagues regarding licensing, writing a little press release and organizing a few hundred megabytes of storage space on the internet, and of trying to make sure the Open Data move didn’t get buried under other, more practical things during the summertime.

For our data license we have chosen the liberal Creative Commons Zero license (CC0), because we want to put as few obstacles in the way of our data as possible. However, we have agreed on a 6-month embargo with BTJ, a company that does most of the cataloguing for the Finnish public libraries. We believe it is a good compromise to publish data that is slightly outdated rather than try to make the realm of immaterial property rights any more unclear than it already is.
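In practice, a 6-month embargo means computing a cut-off date from the export date and publishing only records catalogued on or before it. The Python sketch below assumes that each record carries a hypothetical `catalogued` date; in MARC this might be derived from field 008/00-05 (“date entered on file”), but which date the agreement actually refers to is an assumption. The record ids and the export date are made up. Month arithmetic clamps to the last day of shorter months.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Shift d back by whole calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def publishable(records, export_date: date, embargo_months: int = 6):
    """Keep only records catalogued on or before the embargo cut-off."""
    cutoff = months_before(export_date, embargo_months)
    return [r for r in records if r["catalogued"] <= cutoff]


# Hypothetical records, each with a made-up "catalogued" date.
records = [
    {"id": "rec-1", "catalogued": date(2011, 1, 15)},
    {"id": "rec-2", "catalogued": date(2011, 3, 31)},
    {"id": "rec-3", "catalogued": date(2011, 6, 1)},
]
export = date(2011, 8, 31)
print(months_before(export, 6))
print([r["id"] for r in publishable(records, export)])
```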

Traditional library metadata at Turku main library

We honestly cannot anticipate what our Open Data publication will lead to. Perhaps it will lead to absolutely nothing at all. I believe most organizations opening up their data face this uncertainty. However, what we do know for sure is that all of the catalogue records we have carefully crafted, acquired and collected are seriously underutilized if they are only used for one particular purpose: finding and locating items in the library collections.

For such a valuable asset as our bibliographic metadata, I feel this is not enough. By removing obstacles to accessing our raw data, we open up new possibilities for ourselves, for our colleagues (understood widely) and for anybody interested.

Mace Ojala, project designer
Turku City Library/Vaski-consortia; National Digital Library of Finland, Cycling for libraries, etc.
http://xmacex.wordpress.com, @xmacex, Facebook etc.

Posted in Data, guest post, licensing, OKFN Openbiblio

Did you hear that loud bang? That was CENL releasing their data under CC0

The Conference of European National Librarians (CENL) came up with great news last Wednesday: data from all European national libraries will be published under an open license! From the announcement:

Meeting at the Royal Library of Denmark, the Conference of European National Librarians (CENL), has voted overwhelmingly to support the open licensing of their data. CENL represents Europe’s 46 national libraries, and are responsible for the massive collection of publications that represent the accumulated knowledge of Europe.

What does that mean in practice?
It means that the datasets describing all the millions of books and texts ever published in Europe – the title, author, date, imprint, place of publication and so on, which exists in the vast library catalogues of Europe – will become increasingly accessible for anybody to re-use for whatever purpose they want.

The first outcome of the open licence agreement is that the metadata provided by national libraries to Europeana.eu, Europe’s digital library, museum and archive, via the CENL service The European Library, will have a Creative Commons Universal Public Domain Dedication, or CC0 licence. This metadata relates to millions of digitised texts and images coming into Europeana from initiatives that include Google’s mass digitisations of books in the national libraries of the Netherlands and Austria.

See also this post by Richard Wallis.

(Thanks to Mathias for the title of this post.)

Posted in Data, licensing, LOD-LAM

Open bibliographic data checklist

This guest post by Jindřich Mynarz was originally published here under a Creative Commons Attribution license.

I have decided to write a few points that might be of interest to those thinking about publishing open bibliographic data. The following is a fragment of an open bibliographic data checklist, or, how to release your library’s data into the public without a lawyer holding your hand.

I have been interested in open bibliographic data for a couple of years now, and I try to promote it at the National Technical Library, where we have so far released only one authority dataset, the Polythematic Structured Subject Heading System. The following points are based on my experiences with this topic. What should you pay attention to when opening your bibliographic data, then?

  • Make sure you are the sole owner of the data or make arrangements with other owners. For instance, things may get complicated if the data was created collaboratively via shared cataloguing. If you are not in complete control of the data, start by consulting the other proprietors that have a stake in the datasets.
  • Check whether the data you are about to release are bound by any contractual obligations. For example, you may publish a dataset under a Creative Commons licence, only to realize soon after that there are unresolved contracts with parties that helped fund the creation of that data years ago. Then you need to discuss this issue with the parties involved to determine whether making the data open is a problem.
  • Read your country’s legislation to find out what you are allowed to do with your data. For instance, in the Czech Republic it is not possible to put data into the public domain intentionally. The only way public domain content comes into being is by the natural order of things, i.e., the author dies, leaves no heir, and after quite some time the work enters the public domain.
  • See if the data are copyrightable. For instance, if the data do not fall within the scope of your country’s copyright law, they are not suitable for licensing under Creative Commons, since this set of licences draws its legal force from copyright law; it is an extension of copyright and builds on it. Facts are not copyrightable, and most bibliographic records are made of facts. However, some contain creative content, for example subject indexing or an abstract, and as such are appropriate for licensing based on copyright law. Your mileage may vary.
  • Consult the database act. Check whether your country has a specific law dealing with the use of databases that might add further requirements needing your attention. For example, in some legal regimes databases are protected on another level, as an aggregation of individual data elements.
  • Different licensing options may be applicable for the content and the structure of a dataset, for instance when there are additional terms required by database law. You can opt for dual licensing and use two different licences: one for the dataset’s content that is protected by copyright law (e.g., a Creative Commons licence), and one for the dataset’s structure, to which copyright protection may not apply (e.g., the Public Domain Dedication and License).
  • Choose a proper licence. A proper open licence is a licence that conforms with the Open Definition (and will not get you sued), so pick one of the OKD-compliant licenses. A good source of solid information about licences for open data is Open Data Commons.
  • BONUS: Tell your friends. Create a record in the Data Hub (formerly CKAN) and add it to the bibliographic data group to let others know that your dataset exists.
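Registering a dataset can also be scripted. The Python sketch below assumes a CKAN instance that offers the version 3 action API (`POST /api/3/action/package_create`, with an API key in the `Authorization` header); the Data Hub of 2011 ran an older API version, so check the documentation of the instance you use. The base URL, API key, dataset name, group name and license id (`cc-zero`) are hypothetical placeholders, and the function only builds the request without sending it.

```python
import json
import urllib.request


def build_package_create(base_url, api_key, name, title, resource_url,
                         resource_format="MARCXML", license_id="cc-zero",
                         group="bibliographic"):
    """Build (but do not send) a CKAN action API package_create request."""
    payload = {
        "name": name,  # URL-safe dataset identifier
        "title": title,
        "license_id": license_id,  # assumed CKAN id for CC0
        "groups": [{"name": group}],  # hypothetical group name
        "resources": [{"url": resource_url, "format": resource_format}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


# All values below are hypothetical placeholders.
req = build_package_create(
    "https://ckan.example.org/", "MY-API-KEY", "example-library-catalogue",
    "Example Library Catalogue", "https://example.org/dump/catalogue.mrc.xml",
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```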

Even if it may seem there are lots of things you need to check before releasing open bibliographic data, it is actually easy. It is a performative speech act: you only need to declare your data open to make it open.

Disclaimer: If you are unsure about any of the steps above, consult a lawyer. Note that the usual disclaimers apply to this post, i.e., IANAL.

Posted in guest post, licensing