Open Bibliography and Open Bibliographic Data » guest posts
http://openbiblio.net (Open Bibliographic Data Working Group of the Open Knowledge Foundation)

Europeana and Linked Open Data
http://openbiblio.net/2012/03/19/europeana-and-linked-open-data/ (Mon, 19 Mar 2012)

Europeana has recently released a new version of its Linked Data pilot, data.europeana.eu. We now publish data for 2.4 million objects under an open metadata licence: CC0, the Creative Commons Public Domain Dedication. This post elaborates on an earlier one by Naomi.

Europeana's interest in Linked Open Data

Europeana aims to provide the widest possible access to European cultural heritage, published digitally by hundreds of museums, libraries and archives. This includes empowering other actors to build services that contribute to such access. Making data openly available to the public and private sectors alike is thus central to Europeana's business strategy. We are also trying to provide a better service by making available richer data than what cultural institutions very often publish: data in which millions of texts, images, videos and sounds are linked to other relevant resources, such as persons, places and concepts.

Europeana has therefore been interested in Linked Data for a while, as a technology that facilitates these objectives. We entirely subscribe to the views expressed in the W3C Library Linked Data report, which shows the benefits (but also acknowledges the challenges) of Linked Data for the cultural sector.

Europeana’s first toe in the Linked Data water

Last year, we released a first Linked Data pilot at data.europeana.eu. It was an exciting moment and our first opportunity to work with Linked Data.

We were able to deploy our prototype relatively easily, and the whole experience was extremely valuable from a technical perspective. In particular, this was the first large-scale implementation of Europeana's new approach to metadata, the Europeana Data Model (EDM). This model enables the representation of much richer data than the format currently used by Europeana in its production service. First, our pilot could use EDM's ability to represent several perspectives on a cultural object. We have used it to distinguish the original metadata our providers send us from the data that we add ourselves. Among the Europeana data there are enrichments that are created automatically and are not checked by professional data curators; for trust purposes, it is important that data consumers can see the difference.
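
To make this provenance mechanism concrete, here is a minimal sketch (Python with rdflib; all URIs are illustrative placeholders, not real Europeana identifiers) of how EDM attaches the provider's statements and Europeana's own enrichments to separate ore:Proxy resources for the same object:

```python
# Minimal sketch of EDM's proxy mechanism; every URI here is an illustrative
# placeholder, not a real Europeana identifier.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, DCTERMS, RDF

EDM = Namespace("http://www.europeana.eu/schemas/edm/")
ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
cho = URIRef("http://example.org/item/123")  # the cultural heritage object
provider_proxy = URIRef("http://example.org/proxy/provider/123")
europeana_proxy = URIRef("http://example.org/proxy/europeana/123")

g.add((cho, RDF.type, EDM.ProvidedCHO))

# The provider's original statements hang off the provider's proxy...
g.add((provider_proxy, RDF.type, ORE.Proxy))
g.add((provider_proxy, ORE.proxyFor, cho))
g.add((provider_proxy, DC.title, Literal("Carte de la Gaule")))

# ...while automatically created enrichments (e.g. a Geonames link) hang off
# Europeana's own proxy, so data consumers can tell the two sources apart.
g.add((europeana_proxy, RDF.type, ORE.Proxy))
g.add((europeana_proxy, ORE.proxyFor, cho))
g.add((europeana_proxy, DCTERMS.spatial, URIRef("http://sws.geonames.org/3017382/")))

print(g.serialize(format="turtle"))
```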

We could also better highlight part of Europeana's added value as a central point for accessing digitized cultural material, in direct connection with the above-mentioned enrichment. Europeana employs semantic extraction tools that connect its objects with large multilingual reference resources available as Linked Data, in particular Geonames and GEMET. This new metadata allows us to deliver a better search service, especially in a European context. With the Linked Data pilot we can explicitly point at these reference resources, in the same environment they are published in. We hope this will help the entire community to better recognize the importance of these sources, and to continue providing authority resources in interoperable Linked Data formats, for example using the SKOS vocabulary.
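
As a rough illustration of what pointing at these resources means in practice, a linked-data URI can be dereferenced with plain HTTP content negotiation; the item path below is a hypothetical placeholder rather than a real identifier:

```python
# Dereference a linked-data URI, asking for RDF/XML via content negotiation.
# The item path is a hypothetical placeholder; substitute a real URI.
import urllib.request

uri = "http://data.europeana.eu/item/00000/example"
req = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8")[:500])  # start of the RDF description
```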

If you are interested in more lessons learnt from a technical perspective, we published them in a technical paper at last year's Dublin Core conference. Among the less positive aspects: data.europeana.eu is still not part of the production system behind the main europeana.eu portal, so it does not come with the guarantee of service we would like to offer for the linked data server, although the provision of data dumps is not affected by this.

Making progress on Open Data

Another downside is that data.europeana.eu publishes data for only a subset of the objects our main portal provides access to. We started with 3.5 million objects out of a total of 20 million, selected after a call for volunteers to which only a few providers responded. Additionally, we could not release our metadata under fully open terms, which was clearly an obstacle to its re-use.

After several months we have therefore released a second version of data.europeana.eu. Though still a pilot, it now contains fully open metadata (CC0).

The new version covers an even smaller subset of our collections: as of February 2012, data.europeana.eu contains metadata on 2.4 million objects. But this must be considered in context. The qualitative step of fully open publication is crucial to us, and over the past year we have run an active campaign to convince our community to open up its metadata, allowing everyone to make it work harder for the benefit of end users. The metadata currently served at data.europeana.eu comes from the data providers who reacted early and positively to our efforts. We trust we will be able to make metadata available for many more objects in the coming year.

In fact, we hope this Linked Open Data pilot can carry part of our Open Data advocacy message. We believe such technology can enable third parties to develop innovative applications and services, stimulating end users' interest in digitized heritage; this would of course help to convince more partners to contribute metadata openly in the future. Alongside the new pilot we have released an animation that conveys exactly this message; you can view it here.

For additional information about access to and technical details of the dataset, see data.europeana.eu and our entry on the Data Hub.

Installing BibServer from the repo on Mac OSX
http://openbiblio.net/2012/03/13/installing-bibserver-on-mac-os/ (Tue, 13 Mar 2012)

The following guest post is by Edmund Chamberlain, who works at Cambridge University Library.

As part of my work on the Open Bibliography project, I wanted to test how easy it would be for an average Systems Librarian such as myself to get BibServer up and running.

Turns out, it was pretty simple, for a development environment at least. The latest install docs can be found at readthedocs.org and contain pointers to all the required packages and dependencies; see here for install instructions.

### Python and dependencies

I started almost from scratch with a new MacBook Air running OSX Lion. The first thing I needed was the latest binaries for Python, the language BibServer and most OKFN projects are coded in. Python is installed on OSX by default, but for good measure I installed Xcode 4 for free from the Apple App Store. Advice on getting Python onto your favoured *nix OS, or even Windows, can be found on the main Python site.

According to the BibServer docs, a few additional dependencies were required, specifically pip (one of several Python package-management options) and virtualenv (a means to create multiple separate Python environments). Some great instructions on doing this can be found here.

Finally, I needed the Git version control software. Instructions for getting Git onto OSX are available, along with a dedicated installer. If you are not familiar with Git, here is a great introduction.

### ElasticSearch

Next up, I needed to install the indexing service underpinning BibServer, ElasticSearch. Having spent days in the past grappling with various indexing solutions and document/graph databases, this was the part I was most hesitant about. Turns out, it really was as simple as the instructions stated.

1) Download the latest version into an appropriate place and extract the files.

2) Start ElasticSearch with:

sudo bin/elasticsearch

3) ElasticSearch is built with, and runs on top of, Java. If you don't have Java installed, OSX Lion will prompt you to download and install the latest version.

4) The install instructions give some tips on setting it up as a service.

### BibServer

With Git and virtualenv installed, BibServer can be pulled and set up relatively quickly.

1) Create and start a virtual environment where {myenv} is the filepath of the environment:

virtualenv {myenv}

. {myenv}/bin/activate

2) Using GIT, clone the BibServer source code into that environment:

mkdir {myenv}/src

git clone https://github.com/okfn/bibserver {myenv}/src/bibserver

3) From the repository root, run a development install using pip (sudo is not needed inside the virtualenv):

cd {myenv}/src/bibserver

pip install -e .

### Running it!

1) Ensure ElasticSearch is running.

2) Start Bibserver up:

python {myenv}/src/bibserver/bibserver/web.py

3) Point your favoured web browser at:

http://localhost:5000

4) Upload a sample CSV file.

BibServer can easily be run as a background process using screen or some other suitable tool.
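
Under the hood, BibServer works with BibJSON records. As a rough sketch of the shape of data involved, here is a conversion of hypothetical CSV rows into BibJSON-style dicts; the field names follow my reading of the BibJSON convention and should be treated as approximate rather than a spec:

```python
# Rough sketch: turn CSV rows into BibJSON-style records. Column names and
# BibJSON field choices are approximations for illustration, not a spec.
import csv
import json

def csv_to_bibjson(path):
    records = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            records.append({
                "title": row.get("title", ""),
                "author": [{"name": name.strip()}
                           for name in row.get("authors", "").split(";")
                           if name.strip()],
                "year": row.get("year", ""),
                "type": row.get("type", "article"),
            })
    return records

print(json.dumps(csv_to_bibjson("sample.csv"), indent=2))
```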

Linked Data at the Biblioteca Nacional de España
http://openbiblio.net/2012/02/02/linked-data-at-the-biblioteca-nacional-de-espana/ (Thu, 02 Feb 2012)

The following guest post is from the National Library of Spain and the Ontology Engineering Group at the Universidad Politécnica de Madrid (UPM).

Datos.bne.es is an initiative of the Biblioteca Nacional de España (BNE) whose aim is to enrich the Semantic Web with library data.

This initiative is part of the project "Linked Data at the BNE", run by the BNE in cooperation with the Ontology Engineering Group (OEG) at the Universidad Politécnica de Madrid (UPM). The first meeting took place in September 2010, and the collaboration agreement was signed in October 2010. A first set of data was transformed and linked in April 2011, and a more significant set followed in December 2011.

The initiative was presented in the auditorium of the BNE on 14 December 2011 by Asunción Gómez-Pérez, Professor at the UPM, and Daniel Vila-Suero, Project Lead (OEG-UPM), together with Ricardo Santos, Chief of Authorities, and Ana Manchado Mangas, Chief of Bibliographic Projects, both from the BNE. The audience also enjoyed the invaluable participation of Gordon Dunsire, Chair of the IFLA Namespace Group.

The concept of Linked Data was first introduced by Tim Berners-Lee in the context of the Semantic Web; it refers to a method of publishing and linking structured data on the Web. The project "Linked Data at the BNE" thus involves the transformation of the BNE bibliographic and authority catalogues into RDF, as well as their publication and linking by means of IFLA-backed ontologies and vocabularies, with the aim of making the data available in the so-called Linked Open Data cloud. The project focuses on connecting the published data to other datasets in the cloud, such as VIAF (the Virtual International Authority File) and DBpedia.
With this initiative, the BNE takes up the challenge of publishing bibliographic and authority data in RDF, following the Linked Data principles and under the open CC0 licence (Creative Commons Public Domain Dedication). Spain thereby joins the initiatives that national libraries of countries such as the United Kingdom and Germany have recently launched.

Vocabularies and models

IFLA-backed ontologies and models, widely agreed upon by the library community, have been used to represent the resources in RDF. Datos.bne.es is one of the first international initiatives to thoroughly embrace the models developed by IFLA: the FR family, comprising FRBR (Functional Requirements for Bibliographic Records), FRAD (Functional Requirements for Authority Data) and FRSAD (Functional Requirements for Subject Authority Data), together with ISBD (International Standard Bibliographic Description).

FRBR has been used as a reference model and as a data model because it provides a comprehensive and organized description of the bibliographic universe, allowing the gathering of useful data and supporting navigation. Entities, relationships and properties have been expressed in RDF using the vocabularies published by IFLA; thus the FR ontologies have been used to describe Persons, Corporate Bodies, Works and Expressions, and ISBD properties have been used for Manifestations. All these vocabularies are now available at the Open Metadata Registry (OMR), with "published" status. Additionally, in cooperation with IFLA, labels have been translated into Spanish.
MARC 21 bibliographic and authority files have been examined and mapped to the classes and properties at the OMR. The following mappings were carried out (a simplified code sketch follows the list):

  • A mapping to determine, given a field tag and a certain subfield combination, to which FRBR entity it is related (Person, Corporate Body, Work, Expression). This mapping was applied to authority files.
  • A mapping to establish relationships between entities.
  • A mapping to determine, given a field/subfield combination, to which property it can be mapped. Authority files were mapped to FR vocabularies, whereas bibliographic files were mapped to ISBD vocabulary. A number of properties from other vocabularies were also used.
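
To make the first kind of mapping concrete, here is a deliberately simplified sketch using pymarc: route MARC 21 authority heading fields to FRBR entity types and read the heading label from subfield $a. The tag-to-entity table is a tiny illustrative subset, not the project's actual mapping rules, and the file name is hypothetical.

```python
# Simplified, illustrative field-to-entity mapping for MARC 21 authority
# records; the real BNE/OEG mapping tables are far more extensive.
from pymarc import MARCReader

FIELD_TO_ENTITY = {
    "100": "Person",         # personal name heading
    "110": "CorporateBody",  # corporate name heading
    "130": "Work",           # uniform title heading
}

with open("authorities.mrc", "rb") as fh:
    for record in MARCReader(fh):
        for tag, entity in FIELD_TO_ENTITY.items():
            for field in record.get_fields(tag):
                labels = field.get_subfields("a")
                print(entity, "->", labels[0] if labels else "(no $a)")
```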

The aforementioned mappings will soon be made available to the library community; with them, the BNE hopes to contribute to the discussion of mapping MARC records to RDF, and other libraries willing to transform their MARC records into RDF will be able to reuse them.

Almost 7 million records transformed under an open license

Approximately 2.4 million bibliographic records have been transformed into RDF, covering modern and ancient monographs, sound recordings and musical scores. In addition, 4 million authority records for persons, corporate names, uniform titles and subjects have been transformed. All of them belong to the bibliographic and authority catalogues of the BNE, stored in MARC 21 format. For the data transformation, the MARiMbA (MARc mappIngs and rdf generAtor) tool was developed and used. MARiMbA is a tool for librarians whose goal is to support the entire process of generating RDF from MARC 21 records. It allows the use of any vocabulary (in this case ISBD and the FR family) and simplifies the process of assigning correspondences between RDFS/OWL vocabularies and MARC 21. As a result of this process, about 58 million triples have been generated in Spanish. These triples are high-quality data with an important cultural value that substantially increases the presence of the Spanish language in the data cloud.

Once the data had been described with the IFLA models and the bibliographic and authority catalogues had been generated in RDF, the next step was to connect these data with other RDF datasets included in the Linking Open Data initiative. The data of the BNE are now linked with data from other international sources through VIAF, the Virtual International Authority File.

The licence applied to the data is CC0 (Creative Commons Public Domain Dedication), a completely open licence aimed at promoting data reuse. With this project, the BNE adheres to the Spanish public sector's commitment to openness and data reuse, as established in Royal Decree 1495/2011 of 24 October (Real Decreto 1495/2011, de 24 de octubre) on reusing public sector information, and also acknowledges the proposals of CENL (the Conference of European National Librarians).

Future steps

In the short term, the next steps include:

  • Migration of a larger set of catalogue records.
  • Improvement of the quality and granularity of both the transformed entities and the relationships between them.
  • Establishment of new links to other interesting datasets.
  • Development of a human-friendly visualization tool.
  • SKOSification of subject headings.

Team

From BNE: Ana Manchado, Mar Hernández Agustí, Fernando Monzón, Pilar Tejero López, Ana Manero García, Marina Jiménez Piano, Ricardo Santos Muñoz and Elena Escolano.
From UPM: Asunción Gómez-Pérez, Elena Montiel-Ponsoda, Boris Villazón-Terrazas and Daniel Vila-Suero.

Sweden Ends Negotiations with OCLC
http://openbiblio.net/2012/01/12/sweden-ends-negotiations-with-oclc/ (Thu, 12 Jan 2012)

The following guest post is by Maria Kadesjö, who works at the Libris department at the National Library of Sweden.

The National Library of Sweden has ended negotiations with OCLC on participation in WorldCat, as the parties could not come to an agreement. The negotiations go back to 2006, and the key obstacle for the National Library has been the record use policy: some time into the negotiations, OCLC presented certain conditions for how records taken from WorldCat for cataloguing were to be used in Libris.

A No to WorldCat Rights and Responsibilities

The issue has its basis in the "WorldCat Rights and Responsibilities for the OCLC Cooperative", whose conditions every OCLC member has to accept and whose aim is to support the ongoing, long-term viability and utility of WorldCat and its services. The National Library cannot accept the conditions as they stand today, since WorldCat is not the only arena in which the National Library wants and needs to be active. Accepting the conditions would mean that we would forever have to relate to OCLC's policy.

Libris, an open database

Libris is the Swedish union catalogue (with some 170 member libraries, primarily academic) and is built quite similarly to WorldCat, but on a smaller scale. Member libraries catalogue their records in Libris, and the records are then exported to their local library systems. The Libris cooperative is built on voluntary participation: any Libris library should be able to take all of its bibliographic records out of Libris, whenever it wants, and use them in another library system. The National Library makes no claim to the records and does not control how the libraries choose to use their bibliographic records.

The National Library has decided to release the national bibliography and authority data as open data. The reason is to acknowledge the importance of open data, and of libraries' control over their data, for the long-term sustainability of, and competition among, the services needed by libraries and their users.

In the Agreement on Participation in the Libris joint catalogue, signed by the National Library and the cataloguing libraries, Point 3.3 specifies that "the content in Libris is owned by the National Library and is freely accessible in accordance with the precepts and methods reported by the National Library, both for Participating Libraries and for external partners." This paragraph is needed so that the National Library can sign agreements for Libris with other partners (OCLC, for example), but also so that the National Library can abstain from claims of ownership of bibliographic records taken from Libris. Libris can therefore be an open database, both for the libraries that use it for cataloguing and for others.

Not consistent with Libris principles

The consequences of signing with OCLC would be that the National Library would have to supervise how records originating in WorldCat were used, and that a library that took a bibliographic record from WorldCat for cataloguing in Libris and exported it to its own system would have to accept OCLC's terms of use.

A library that wished to leave Libris would not necessarily be able to do so, since it is not self-evident that its bibliographic records could be integrated into other systems. This would be an infringement of the voluntary participation that characterizes Libris. In practice, the National Library has no mandate to restrict the freedom of action of Libris libraries in this way, since it has no means of influencing how the Libris libraries themselves choose to use their catalogues.

DBLP releases its 1.8 million bibliographic records as open data
http://openbiblio.net/2011/12/09/dblp-releases-its-1-8-million-bibliographic-records-as-open-data/ (Fri, 09 Dec 2011)

The following guest post is by Marcel R. Ackermann, who works at Schloss Dagstuhl – Leibniz Center for Informatics on expanding the DBLP computer science bibliography.

Computer Science literature

Right from the early days of DBLP, the decision was made to make its whole data set publicly available. Yet it was only at the age of 18 that DBLP adopted an open-data license.

The DBLP computer science bibliography provides access to the metadata of over 1.8 million publications, written by over 1 million authors and published in several thousand journals and conference proceedings series. It is a helpful tool in the daily work of researchers and computer science enthusiasts around the world. Although DBLP started with a focus on database systems and logic programming (hence the acronym), it has grown to cover all disciplines of computer science.

The success of DBLP wasn't planned. In 1993, Michael Ley from the University of Trier, Germany, started a simple webserver to play around with this so-called "world wide web" everybody was so excited about in those days. He chose to set up some webpages listing the tables of contents of recent conference proceedings and journal issues, and other pages listing the articles of individual authors, with hyperlinks back and forth between these pages. People from the computer science community found this quite useful, so he just kept adding papers. Funds were raised to hire helpers, new technologies were implemented, and the data set grew over the years.

The approach of DBLP has always been a pragmatic one, so it wasn't until the recent evolution of DBLP into a joint project of the University of Trier and Schloss Dagstuhl – Leibniz Center for Informatics that the idea of finding a licensing model came to our minds. In this process, we found the source material and the commentaries provided by the Open Knowledge Foundation quite helpful. We quickly concluded that either the PDDL or the ODC-By license would be the right choice for us. In the end, we chose ODC-By since, as researchers ourselves, it is our understanding that external sources should be referenced. Although from a pragmatic point of view nothing has changed at all for DBLP (permissions to use, copy, redistribute and modify had been generally granted before), we hope that this will help to clarify the legal status of the DBLP data set.


For additional information about access to and technical details of the dataset see the corresponding entry on the Data Hub.
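
For those who want to dig into the records themselves, the dataset is customarily distributed as one big XML file (dblp.xml, accompanied by a dblp.dtd that defines its character entities; the sketch below assumes both files sit in the working directory). A hedged sketch of streaming through it with lxml:

```python
# Hedged sketch: stream through the DBLP XML dump without loading it all into
# memory. Element names follow the well-known dump layout (article,
# inproceedings, ... with author/title/year children); load_dtd=True lets the
# parser resolve the character entities defined in dblp.dtd.
from lxml import etree

PUB_TYPES = {"article", "inproceedings", "book", "incollection"}

for _event, elem in etree.iterparse("dblp.xml", events=("end",), load_dtd=True):
    if elem.tag in PUB_TYPES:
        title = elem.findtext("title") or ""
        year = elem.findtext("year") or ""
        authors = "; ".join(a.text or "" for a in elem.findall("author"))
        print(f"{year} | {title} | {authors}")
        elem.clear()  # keep memory bounded; the dump holds millions of records
```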


Credits: Photo licensed CC-BY-SA by Flickr user Unhindered by Talent.

Finnish Turku City Library and the Vaski consortia now Open Data with 1.8M MARC-records
http://openbiblio.net/2011/10/13/finnish-turku-city-library-and-the-vaski-consortia-now-open-data-with-1-8m-marc-records/ (Thu, 13 Oct 2011)

Let's open up our metadata containers

I'm happy to announce that our Vaski consortium of public libraries, serving a total of 300,000 citizens in Turku and a dozen surrounding municipalities in western Finland, has recently published all of our 1.8 million bibliographic records in the open, as a big pile of data (see the entry on the Data Hub).

Each of the records describes a book, recording, movie, song or other publication in our library catalogue: titles, authors, publishing details, library classifications, subject headings, identifiers and so on, systematically saved in MARC format, the international structured library metadata standard since the late 1960s.
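
Because these are standard MARC records, anyone who downloads the dump can poke at it with off-the-shelf tooling. A minimal sketch with pymarc, assuming a binary MARC transmission file with a hypothetical name (adjust accordingly if the dump is distributed as MARCXML):

```python
# Print the title proper (field 245, subfield $a) of each record in the dump;
# the file name is a hypothetical placeholder.
from pymarc import MARCReader

with open("vaski-records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        fields = record.get_fields("245")
        if fields:
            subfields = fields[0].get_subfields("a")
            print(subfields[0] if subfields else "(no $a)")
```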

Unless I've missed something, ours is the third large-scale open data publication from the libraries of Finland. The first was the 670,000 bibliographic records of the HelMet consortium (see the entry on the Data Hub), another consortium of public libraries around the capital, Helsinki. That publication was organized and initiated in 2010 by Kirjastot.fi Labs, a project seeking more agile, innovative library concepts. The second important open data publication was our national general thesaurus, Yleinen suomalainen asiasanasto (YSA), which is also available as a cool semantic ontology.

Joining this group of open data publications was natural for our Vaski consortium, because we are moving our data from one place to another anyway: we are in the middle of converting from our national FinMARC flavour to the international MARC21 flavour of MARC, swapping our library system from Axiell PallasPro to Axiell Aurora, and implementing a new, ambitious search and discovery interface for all the Finnish libraries, archives and museums (yes, it's busy times here, and we love the taste of a little danger). All this means we are extracting, injecting, converting, mangling, breaking, fixing, disassembling and reassembling all of our data. So we asked ourselves: why not publish all of our bibliographic data on the net while we are at it?

The process of going open data has been quite seamless for us. On my initiative, the core concept of open data was explained to the consortium's board. As there were no objections or further questions, we contacted our vendor BTJ, who immediately supported the idea. From there on it was basically just a matter of some formalities with BTJ, consulting international colleagues regarding licensing, writing a little press release, organizing a few hundred megabytes of storage space on the internet, and trying to make sure the open data move didn't get buried under other, more practical things during the summertime.

For our data license we have chosen the liberal Creative Commons Zero license (CC0), because we want as few obstructions to our data as possible. However, we have agreed on a six-month embargo with BTJ, a company which does most of the cataloguing for the Finnish public libraries. We believe it is a good compromise to publish data that is slightly outdated rather than make the realm of immaterial property rights any more unclear than it already is.

Traditional library metadata at Turku main library

We seriously cannot anticipate what our open data publication will lead to. Perhaps it will lead to absolutely nothing at all; I believe most organizations opening up their data face this uncertainty. What we do know for sure is that all the catalogue records we have carefully crafted, acquired and collected are seriously underutilized if they are only used for one particular purpose: finding and locating items in the library collections.

For such a valuable asset as our bibliographic metadata, I feel this is not enough. By removing obstacles to accessing our raw data, we open up new possibilities for ourselves, for our colleagues (understood widely), and for anybody interested.

Mace Ojala, project designer
Turku City Library/Vaski-consortia; National Digital Library of Finland, Cycling for libraries, etc.
http://xmacex.wordpress.com, @xmacex, Facebook etc.

Open bibliographic data checklist
http://openbiblio.net/2011/09/25/open-bibliographic-data-checklist/ (Sun, 25 Sep 2011)

This guest post by Jindřich Mynarz was originally published here under a Creative Commons Attribution license.

I have decided to write a few points that might be of interest to those thinking about publishing open bibliographic data. The following is a fragment of an open bibliographic data checklist: how to release your library's data to the public without a lawyer holding your hand.

I have been interested in open bibliographic data for a couple of years now, and I try to promote it at the National Technical Library, where we have so far released only one authority dataset, the Polythematic Structured Subject Heading System. The following points are based on my experiences with this topic. What should you pay attention to when opening your bibliographic data?

  • Make sure you are the sole owner of the data, or make arrangements with the other owners. Things may get complicated, for instance, where the data was created collaboratively via shared cataloguing. If you are not in complete control of the data, start by consulting the other proprietors that have a stake in the datasets.
  • Check that the data you are about to release is not bound by contractual obligations. For example, you may publish a dataset under a Creative Commons licence, only to realize that there are unresolved contracts with parties that helped fund the creation of that data years ago. You then need to discuss the issue with the involved parties to establish whether making the data open is a problem.
  • Read your country's legislation to learn what you are able to do with your data. For instance, in the Czech Republic it is not possible to put data into the public domain intentionally; public domain content comes into being only through the natural order of things, i.e., the author dies, leaves no heir, and after quite some time the work enters the public domain.
  • See if the data is copyrightable. If the data does not fall within the scope of your country's copyright law, it is not suitable for licensing under Creative Commons, since this set of licences draws its legal force from copyright law; it is an extension of copyright and builds on it. Facts are not copyrightable, and most bibliographic records are made of facts. However, some contain creative content, for example subject indexing or an abstract, and as such are appropriate for licensing based on copyright law. Your mileage may vary.
  • Consult the database act. Check whether your country has a specific law dealing with the use of databases that might add more requirements needing your attention. For example, in some legal regimes databases are protected on another level, as aggregations of individual data elements.
  • Different licencing options may be applicable to the content and the structure of a dataset, for instance when there are additional terms required by database law. You can opt for dual licensing and use two different licences: one for the dataset's content that is protected by copyright law (e.g., a Creative Commons licence), and one for the dataset's structure, to which copyright protection may not apply (e.g., the Public Domain Dedication and License).
  • Choose a proper licence. A proper open licence is a licence that conforms with the Open Definition (and will not get you sued), so pick one of the OKD-compliant licences. A good source of solid information about licences for open data is Open Data Commons.
  • BONUS: Tell your friends. Create a record in the Data Hub (formerly CKAN) and add it to the bibliographic data group to let others know that your dataset exists (a hedged API sketch follows this list).
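
A hedged sketch of that bonus step, assuming the Data Hub exposes CKAN's v3 action API; the endpoint, field values and API key are illustrative placeholders, so check the site's API documentation before relying on any of them:

```python
# Hedged sketch: registering a dataset on a CKAN instance such as the Data Hub.
# Endpoint, payload fields and the API key are illustrative placeholders.
import json
import urllib.request

payload = {
    "name": "my-library-bibliographic-data",
    "title": "My Library: Open Bibliographic Data",
    "notes": "Bibliographic records released under an open licence.",
    "license_id": "odc-pddl",
}
req = urllib.request.Request(
    "http://datahub.io/api/3/action/package_create",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "YOUR-API-KEY"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["success"])  # True if the dataset record was created
```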

Even if it may seem there are lots of things to check before releasing open bibliographic data, it is actually easy. It is a performative speech act: you only need to declare your data open to make it open.

Disclaimer: if you are unsure about any of the steps above, consult a lawyer. Note that the usual disclaimers apply to this post, i.e., IANAL.

NTNU University Library – a Linked Open Data Hub
http://openbiblio.net/2011/09/08/ntnu/ (Thu, 08 Sep 2011)

The following guest post is by Rurik Thomas Greenall, who works on linked data and resource access in the section for information resources at NTNU University Library, Trondheim. His interests include programming, particularly for the semantic web, and knowledge representation; he speaks and writes on these topics.

NTNU University Library has been a publisher of open data – data released under a liberal licence – since 2009, when we first started working with RDF and linked data. Our aims were initially achieved in project-based form, but linked open data has since been formalized within the structure of the library organization, which makes NTNU University Library one of the few libraries to have positions dedicated to the production of linked data.

The move away from the traditional

The initial ideas we had about linked data stem from a project, UBiT2010, which dealt with modernizing library services. The working group consisted of a cross-section of the library staff who were interested in technology. Some of the major findings in the project were:

  • re-mix, re-use, re-invent, re-package, re-work
  • users know their own best
  • trust and openness
  • library IT is nothing, there is only IT

These ideas do not seem particularly inspiring any more; however, they were quite cutting-edge in 2007–2008, and they led us down a path toward distributed computing and distributed interfaces, as well as away from monolithic systems whose so-called expert interfaces had frightened away our users in the first place.

At NTNU University Library, the data group was somewhat radicalized toward the openness movement, largely because of a protectionist agenda we perceived among our commercial partners. There seemed to us to be an intrinsic assumption of keeping data closed in the business models of many of the companies we worked with, something we also saw working against them: the suppliers that were more open with their data provided better service and saw higher use than those that were less open. Further, openness seems to us to be in the spirit of the web, and its best feature; libraries have traditionally been important players in regard to openness, so we happily continue this tradition.

Thus, as a state-funded institution within a social democracy, we were duty-bound to adopt a line that would allow the principles of openness to be applied to their fullest extent. After consulting with, among others, Talis, we found that the only way forward was public domain and zero licences. We have therefore adopted a policy of applying the ODC-PDDL to all of our data (we reserve some rights related to non-data content, typically content documents, i.e. documents from the traditional literature holdings of the library where we own the copyright or the copyright has entered the public domain; these are provided under CC-BY-NC licences). Note that the applicability of licensing such as the Norwegian licence for open data (Norsk lisens for offentlige data) in the international arenas where we operate is uncertain, and we resist the conservative impulses that require attribution for data; in fact, we actively encourage others to adopt the ODC-PDDL in combination with HTTP URIs, as the URI acts as provenance and a token of trust, something intrinsically more important than rights attribution when it comes to open data.

Another outcome of this first project was that we needed far more access to our data than we had previously had. This led us on a journey of discovery through APIs (such as Ex Libris' X-Server, Z39.50 and SRU), none of which really provided what we needed: real-time access to our data in a way that made sense in a web-services context.

The first linked open data baby steps

Moving away from the traditional library IT scene, in June 2009, we were invited to a seminar on linked data hosted by the National Library of Sweden, the Norwegian Library Laboratory (Bibliotekslaboratoriet) and Talis. This meeting was critical, because it brought together two core ideas: the use of standard technologies to deliver library data and the idea of openness.

Once the pieces had fallen into place, we started working toward our goal of producing linked open data from our data sets. The typical barriers were in place: getting hold of the data, establishing who actually owns it, and getting someone to sign the papers that would allow us to release it. In the end, these pieces fell into place because the library management were brave enough to just do the job, signing the licences and letting us take the step of publishing data openly.

Our first project was TEKORD, a controlled vocabulary that provides Norwegian terms for technical subjects and Universal Decimal Classification (UDC) numbers. The data set is open RDF, but it is not currently linked to anything because of the lack of data sets containing UDC numbers. The data set was, however, immediately snapped up by the UDC Consortium and used as the basis for the Norwegian representation of the top-level terms for UDC, which was rather encouraging.

Rådata nå!

In 2010, NTNU University Library and BIBSYS (the state-owned Norwegian national library management system for academic and research libraries) received funding for a project called Rådata nå! (Norwegian for "raw data now"), which published a data set containing 1.5 million person name authorities as linked open data, aligning this data with, among others, DBpedia and VIAF. The project produced 10 million triples and was the first large data set to be released as linked open data in Norway (under the ODC-PDDL). Name authorities were chosen because they can be seen as fixed points of reference on the web and can be used to identify individuals on the web of data who are rarely represented otherwise (the data set includes many master's degree theses, and so includes the names of many prominent and not so prominent people).
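
The core of such an alignment is compact. Here is a hedged illustration (Python with rdflib) of asserting that a local name authority denotes the same individual as entries in VIAF and DBpedia; every URI below is a hypothetical placeholder, not a real Rådata nå! identifier:

```python
# Hedged illustration of the kind of alignment links Rådata nå! produced;
# every URI below is a hypothetical placeholder.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()
local_authority = URIRef("http://example.org/authority/person/12345")
g.add((local_authority, OWL.sameAs, URIRef("http://viaf.org/viaf/00000000")))
g.add((local_authority, OWL.sameAs, URIRef("http://dbpedia.org/resource/Example_Person")))
print(g.serialize(format="nt"))
```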

While this data set was a huge milestone for us, it was far from the only one we have released; much of our strategy has been to create sustainable systems that we use ourselves, creating an available source for our business data while enriching it with other data. In this way we are open and strategically aware in equal measure.

(Actually) using linked open data

The choice of linked open data brings together means of representation and enrichment that were not possible with other technologies, and while the learning curve has not been easy for us, it has paid off in many respects. Our use of open data has created the opportunity to build the kinds of system that we feel will inspire our users, that answer the questions users have, and that help them find more than they were looking for. This extends beyond the concept of monolithic systems, where our aim was to get users into our systems; in fact, it is entirely about getting the data out there, ensuring that it is available without reference to a given web page. Openness is the key, and this way of thinking has improved our work immensely.

By using other people's data, we gained insight into how to integrate our domain (traditional library data: bibliographic data, controlled vocabularies, etc.) with the domains that we describe and serve. Typical examples include using terms from geographical resources in addition to the gazetteers typically used in bibliographic description, and linking to resources of scientific information. This adds value in the form of alignment with resources that provide additional information, which can be used when building services on top of the data.

Another case for our use of linked open data is the multilingual situation in academia in Norway, where Norwegian and English exist side by side, as is common in other cultures. Additionally, Norwegian is joined by other languages from Norway (Sámi and Kvensk), and the written standard for Norwegian is represented in two varieties (Bokmål and Nynorsk). This relatively complex situation is solved easily by co-ordination with data sets that represent the same information in the different languages.

Since these early efforts, we have moved ever more toward workflows centred on linked data, providing representations of and access to scholarly journals, research databases, researcher information and academic conference authorities, as well as controlled vocabularies such as Norwegian MeSH and the Norwegian scientific disciplines. These all feature in our approach to using and releasing our data, and we happily share and align our work with what other people are doing.

Today’s situation

The current projects being undertaken are the presentation of the library's special collections and statistical modelling of usage. The library's special collections (manuscripts and rare books) are catalogued and presented in a workflow based purely on linked open data: the documents are catalogued directly as RDF, and the systems use linked open data as their only data resource. This project integrates all aspects of the current semantic web, including reasoning and in-HTML standards like RDFa. The benefits of this approach have been enormous, allowing agile, rapid development of interfaces and data to meet changing needs, as well as a high level of visibility in the search engines we have targeted. We are currently running workshops with other holders of unique historical materials to see how we can achieve a co-ordinated web of data for these materials.

It is evident for us that using linked open data not only provides the access we need to our business data, it is also a way of enriching it with the data provided by others. It is also clear that in an age where “interconnected” is the norm, any other approach not only limits your success, it probably excludes it.

About NTNU University Library

NTNU University Library is the library of the Norwegian University of Science and Technology (Norges Teknisk-Naturvitenskapelige universitet – NTNU), based in Trondheim. It provides services to the university for subjects including the sciences and technology as well as humanities and social sciences. As a centre for development within technical library and information science, the staff at the library participate in both national and international collaborative projects and help set the scene within current trends for their fields of specialization. The library is host to the biannual international conference emtacl: emerging technologies in academic libraries, a showcase for leading trends within technology within and for academic libraries.

Ex Libris, Alma and Open Data
http://openbiblio.net/2011/08/11/ex-libris-alma-and-open-data/ (Thu, 11 Aug 2011)

This blog post was written by Carl Grant, chief librarian at Ex Libris and past president of Ex Libris North America, in answer to questions that Adrian Pohl, coordinator of the OKFN Working Group on Open Bibliographic Data, posed at the beginning of July in response to Ex Libris' announcement of an "Expert Advisory Group for Open Data". It is cross-posted on the OKFN blog and openbiblio.net.

The Ex Libris announcement in June 2011 that we were forming an "Expert Advisory Group for Open Data" has generated much discussion and an equal number of questions. Many of the questions bring to light the ever-present tensions and dynamics between the various sectors and advocates of open data and open systems. The announcement also raises ongoing questions about how the goals of openness can reasonably and properly be achieved, and in what timeframe, particularly when it involves companies, products and data structures that have roots in proprietary environments.

For those who are not part of the Ex Libris community, allow me to define some of the Ex Libris terminology involved in this discussion:

  • Alma. The Ex Libris next generation, cloud-based, library management service package that supports the entire suite of library operations—selection, acquisition, metadata management, digitization, and fulfillment—for the full spectrum of library materials.
  • Community Zone: A major component of Alma that includes the Community Catalog (bibliographic records) and the Central Knowledgebase and Global Authorities Catalog. This zone is where customers may contribute content to the Community Catalog and in so doing, agree to allow users to share, edit, copy and redistribute the contributed records that are linked to their inventory.
  • Community Catalog: The bibliographic record portion of the Community Zone.
  • Community Zone Advisory Group: A group of independent experts and Alma early adopters advising Ex Libris on policies regarding the Community Catalog.

Taking into consideration the many emails and conversations we've had around the topic, the following set of questions seems to capture the shared interest:

They are good questions, so let's work our way through the list.

Q: What is this working group actually about? About open licensing, open formats, open vocabularies, open source software or all of them?

A: The Community Zone Advisory Group is tasked with creating high-level guidelines to govern the contribution, maintenance, use and extraction of bibliographic metadata placed in the Community Catalog of Alma. As such, this group is also advising us on suggested licenses to use. Given that each Alma customer will have a local, private catalog that need not be shared, we’ve taken the position that we want to promote the most open policies possible on data that libraries contribute to the Community Catalog. Much of the discussion centers around what approach would be best for libraries and lead to the clearest terms of use for Alma users.

We've told the group that we'll leave it to them to determine whether the group should continue to exist once this original charge is completed. We fully expect to have similar groups, if not this same one, advise us on other data that we plan to place in the Community Zone in the future.

Q: Does the working group only cover openness inside Ex Libris’ future metadata management system, e.g. openness between paying members in a closed ecosystem, or will it address openness of the service within a web-wide context?

A: This is a more complicated question, because it is really two interlaced questions. First, is the data open? Second, is the system open?

We are most assuredly making the bibliographic metadata open and the answer to the next question provides more detail on how we’re approaching this.

As for the systems holding the data, we plan to be open, but here we clearly must wait until the new system is up and running well for our paying customers before we open the Community Catalog to others. Even then, we will want to closely monitor the impact on bandwidth, compute cycles and data to determine any costs associated with this use, and how best to proceed in making the Community Catalog open to larger communities in a web-wide context.

The goal is to work with our customers and the community to set achievable goals that move us down that path while factoring practicality into ideology. However, in the first deployment, our primary constituents will be institutions who have adopted Alma.

Q: Will Ex Libris push towards real open data for the Alma Community Zone? That would mean (1) using open web standards like HTTP and HTML, and RDF as a data model; (2) conforming to the Principles on Open Bibliographic Data through open licensing; and (3) providing APIs and the possibility to download data dumps.

A: Specifically:

  1. Open standards and data formats are core to our design of Alma. They serve as our mechanisms of data exchange and the basis of our data structures when appropriate. Not all standards will be the basis of our internal functionality, however. For example, we’re building the infrastructure for RDF as a data exchange mechanism, but it does not fundamentally underpin our data structure, just as MARC21 binary format is not the root of our bibliographic record structure. When appropriate and possible, we are implementing these standards for data exchange based on libraries’ needs.

  2. We are currently examining the open licenses to determine whether we can utilize them, given the other data we had planned for the Community Zone. Currently, our Alma agreements include language that largely replicates the key terms of the PDDL (Public Domain Dedication and License) for customer-contributed records in the Community Catalog.

  3. We will be providing APIs and plan to support downloads, but again, as we move forward, we plan to do this in a phased approach so that we can monitor the infrastructure and human-resource demands associated with adoption. As noted above, our first priority will be providing service to our existing Alma users. This means that in the first release we are focusing our attention on providing Alma institutions with full access to their own data (in the form of APIs and data dumps).

In the final analysis, we feel our approach is really quite supportive of librarianship and quite open. We’re balancing the competing needs of our stakeholders, but it is important to note that Ex Libris is not artificially implementing restrictions that limit open data. Once libraries join the Alma community, there are no limits on their ability to manage their own projects or collaboration. Where we have the resources, we’re helping promote this open approach. We’ll be allocating our resources to provide best-in-class service to our customers and, at the same time, in a closely monitored and managed approach, continuing to expand access to larger communities.

We think all of this combined is a powerful statement about how proprietary and open approaches can beneficially co-exist and thus will help to move libraries forward in substantive ways.
