Swedish National Bibliography as Open Data

Posted on September 21, 2011 by Adrian Pohl

The blog of LIBRIS, Sweden’s national library system, announced today that the Swedish National Bibliography and the accompanying authority data have been published under a CC0 license.

“We are now pleased to announce the general availability of these records (see below for details). We see the investment in Open Data as a strategic one and one that is needed to ensure long term sustainability and competition when it comes to the services needed by libraries and their users as well as the right to control over their collections. The license chosen is CC0 which waives any rights the National Library have over the National Bibliography and the authority data.

There are two ways to access the data, as Atom feeds with references to the records and using (a somewhat rudimentary implementation of) the OAI-PMH protocol.”
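As a rough illustration of the OAI-PMH route mentioned in the announcement, the sketch below builds a `ListRecords` request URL and extracts a resumption token from a response. The verb and argument names come from the OAI-PMH 2.0 specification; the base URL and the `marcxml` metadata prefix are placeholders and assumptions, not values confirmed by the LIBRIS announcement, and a "somewhat rudimentary" implementation may not support every feature.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, no other argument besides the verb may be sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token from a response, or None if harvesting is done."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

A harvester would request the first URL, call `next_token` on the response, and repeat with the token until it returns `None`.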

In 2008 LIBRIS pioneered making the records in the Swedish union catalogue available as Linked Data. Until now, however, the data had not been openly licensed and there was no easy way to get hold of larger parts of it; both have changed as of today. According to the announcement, the long-term goal for the Swedish Union Catalogue is “to release the whole database under an open license, though this will undoubtedly take some time.”

Posted in Data, LOD-LAM | Tagged national library

LOD at Bibliothèque nationale de France

Posted on September 21, 2011 by Adrian Pohl

Romain Wenz of the Bibliothèque nationale de France (BnF) informed me via email about this pleasant development: the BnF’s Linked Data service is now a Linked Open Data service!

The BnF LOD service is in the first instance limited to classical authors and currently comprises approximately 1,600 authors and nearly 4,000 works described in 1.4 million RDF triples. See also the accompanying entry on the Data hub. More information about data.bnf.fr can be found in this use case for the W3C Linked Library Data Incubator Group.

On http://data.bnf.fr/semanticweb one can find a link to a full dump of the RDF data. The corresponding license text reads:

“Reuse of the data exposed in RDF format is free and free of charge, subject to compliance with the legislation in force and to keeping the source statement “Bibliothèque nationale de France” with the data. Users may adapt or modify the data, provided that they clearly inform third parties and do not distort its meaning.” [translated from the French]

This looks like an attribution license to me (my French is not the best!), but it is not made clear – neither on the webpage nor in the LICENCE.txt accompanying the dump – how the requirement of attribution should be met in practice. (The BnF might learn in this respect from New Zealand National Library’s approach.)

It also says that data in formats other than RDF is still licensed under a non-commercial license:

“Reuse of the data exposed in another format is subject to the following conditions:

non-commercial reuse of this data is free and free of charge, in compliance with the legislation in force and in particular with keeping the source statement “Bibliothèque nationale de France” with the data. Users may adapt or modify the data, provided that they clearly inform third parties and do not distort its meaning.
commercial reuse of this content is subject to a fee and requires a licence. Commercial reuse means acquiring the data in order to develop a product or service intended to be made available to third parties for a fee, or free of charge but with a commercial purpose. Click here to access the price list and the licence.” [translated from the French]

Posted in Data, LOD-LAM, Semantic Web | Tagged national library

Minutes: 14th Virtual Meeting of the OKFN Openbiblio Group

Posted on September 14, 2011 by Adrian Pohl

Date: September 13, 2011, 15:00 GMT

Channel: The meeting was held via Skype and Etherpad

Participants

Adrian Pohl
Peter Murray-Rust
Jim Pitman
Karen Coyle

Agenda

Brief reports from Jim

Progress on BibServer/BibSoup

http://bibsoup.net/
We have working uploads of BibTeX data with conversion to BibJSON, updating of the Elasticsearch index, and faceted search and browse. Please browse around and register issues via the feedback form. You may need to get a login for that.

Invitation from Microsoft Academic Search

Email of Aug 9 from Greg Tannenbaum, who is contracting for MAS: “we have been discussing the Open Bibliography project and it struck us that it would be useful to explore how we might get Microsoft Academic Search designated as compliant with your standards. Have you and the team thought about this type of seal of approval? If so, can we set up a time to discuss how we might qualify. We see Open Bibliography as a valuable undertaking and would like to both promote it and comply with it if possible”.
Jim has a call scheduled on this with Lee Dirks for 1:00 PM PDT for Tuesday, 9/20.
Agreed on strategy to engage MAS in openbiblio. Try to get them to provide export with an open license.

Google Scholar Citations

Jim points out Google Scholar Citations
http://googlescholar.blogspot.com/2011/07/google-scholar-citations.html
Example author pages:
- http://scholar.google.com/citations?user=4Z5WABAAAAAJ
- http://scholar.google.com/citations?user=cH0pbIwAAAAJ&hl=en
Strategy: Focus on getting openbiblio data from MAS first and then ask Google.

The European Digital Mathematics Library (EuDML)

Jim points out the European Digital Mathematics Library. What do they understand by “open access”?
Jim has asked them to provide open metadata.

Centre for Open Science (CeON, Centrum Otwartej Nauki in Polish)

CeON is a newly created subunit of the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM, http://www.icm.edu.pl/), which is in turn an autonomous research department of the University of Warsaw, a public university funded by the Polish Ministry of Science and Higher Education. CeON is in its inception and its webpage is still under construction. It should be visible to the outside world in about two months.
CeON has four priorities of equal importance:
1. promotion of open science,
2. providing access to scientific literature and databases for the scientific community
3. software development for digital repositories and
4. research in the area of document analysis, text mining, bibliometrics.

Name spaces for bibsoup

Need help with name spaces
From http://info-uri.info/ which is now deprecated: “The Linked Data idiom is currently ascendant, and accommodates both resource resolution and identification, which is different than the simple “info” premise of URI identification alone. This approach to resource identity is likely to conform more closely to evolving practice”.
We need guidance on managing name spaces for use in BibJSON for BibServer/BibSoup development. For ISBN there is a urn:isbn namespace (RFC: http://tools.ietf.org/html/rfc3187).
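To make the urn:isbn idea concrete, here is a minimal sketch of turning a legacy ISBN string into a URN in that namespace. The function name is hypothetical, stripping hyphens is a normalization choice made here rather than a requirement, and accepting 13-digit ISBNs is a convenience assumption that goes beyond the 10-digit form RFC 3187 was written for. No check-digit validation is attempted.

```python
import re


def isbn_to_urn(isbn):
    """Return a urn:isbn identifier for an ISBN-10 or ISBN-13 string.

    Hyphens and whitespace are removed (an illustrative normalization,
    not something the RFC mandates). Only the shape of the identifier
    is checked, not its check digit.
    """
    compact = re.sub(r"[\s-]", "", isbn).upper()
    if not re.fullmatch(r"\d{9}[\dX]|\d{13}", compact):
        raise ValueError(f"not an ISBN-10 or ISBN-13: {isbn!r}")
    return "urn:isbn:" + compact
```

The same pattern (normalize, validate the shape, prefix with a namespace) could be applied to other legacy identifiers once a namespace for them has been agreed on.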

Berkman Digital Registry Proposal

See this mail to the list
Are they re-inventing Public Domain Works?

ACTION: Jim, Karen and Adrian will come together to collect solutions for finding/minting URIs for legacy identifiers.

Bibliographic Metadata Guide

See http://lists.okfn.org/pipermail/openbiblio-dev/2011-September/000376.htm
Adrian wanted to get some input on this.
Unfortunately, Primavera wasn’t there to talk about these activities.
Karen: The initiative needs a more structured approach.

ACTION: Adrian & Karen will make proposals for improvement on the openbiblio-dev list.

Authors Guild suing HathiTrust

Quick report by Karen, see http://kcoyle.blogspot.com/2011/09/authors-guild-sues-hathitrust.html
The suit twice mentions bibliographic metadata in ways that seem to include it in “unauthorized use”.
Sections from the suit:
sec. 43:
“The digital copy comprises a set of scanned image files, files containing the text of the work extracted through optical character recognition (“OCR”) technology, and data associated with the work indicating bibliographic and other information.”

sec. 59
“Third, HathiTrust provides a variety of tools to allow its users to access content in the HDL. For example, all users may search and identify bibliographic information (title, author, subject, ISBN, publisher, and year of publication) for the works contained in the HDL.”
The bibliographic data concerned is derived from library catalogs.
This case is an opportunity to remind people of our agenda.

ACTION: wait a bit (for responses by others) and then post on the OKFN blog

Open article data from ZB MED

see http://openbiblio.net/2011/09/06/zb-med-publishes-open-article-data-in-medicine-and-life-sciences/
What’s going on with the Medline data? It would be good to bring these together.
Peter: Next week they will hear about their JISC proposal and will then know whether work with the PubMed data will be continued.

Bibliographic-archival database by Wikimedia

See http://lists.okfn.org/pipermail/open-bibliography/2011-September/001133.html
Rufus contacted people from Wikimedia. There are two things going on at Wikimedia Germany which are relevant here:
1. The bibliographic database project, whose current goal is to hold a workshop on the topic and work out how best to proceed. There won’t be significant work done on it in the short term.
2. The WikiData project: http://meta.wikimedia.org/wiki/New_Wikidata. WikiData is to provide a central database for all kinds of structured data, for use in other wikis.
We’ll stay in touch with the people from Wikimedia regarding this.

Posted in minutes

Open national bibliography data by New Zealand National Library

Posted on September 12, 2011 by Adrian Pohl

A tweet by Owen Stephens pointed me to the New Zealand National Library’s service, which provides the national bibliography as MARC/MARCXML dumps (350,000 records) licensed under a Creative Commons Attribution license. Great!

Apparently this service has been around for a while, but I had not heard of it before. As it wasn’t registered on CKAN/the Data hub, I created an entry and added it to the Bibliographic Data group.

Using attribution licenses for data

This publication is an interesting case as it uses an attribution license for bibliographic data. Until now, most open bibliographic datasets have been published under a public domain license. So, the question pops up: “Under what conditions may I use a CC-BY licensed dataset?”

The readme.txt accompanying the download file (118 MB!) gives some clarity:

“You do not need to make any attribution to National Library of New Zealand Te Puna Matauranga o Aotearoa if you are copying, distributing or adapting only a small number of individual bibliographic records from the overall Dataset.

If you publish, distribute or otherwise disseminate this work to the public without adapting it, the following attribution to National Library of New Zealand Te Puna Matauranga o Aotearoa should be used:

“Source: National Library of New Zealand Te Puna Matauranga o Aotearoa and licensed by the Department of Internal Affairs for re-use under the Creative Commons Attribution 3.0 New Zealand Licence (http://creativecommons.org/licenses/by/3.0/nz/).”

If you adapt this work in any way or include it in a wider collection, and publish, distribute or otherwise disseminate that adaptation or collection to the public, the following style of attribution to National Library of New Zealand Te Puna Matauranga o Aotearoa should be used:

“This work uses data sourced from National Library of New Zealand Te Puna Matauranga o Aotearoa’s Publications New Zealand Metadata Dataset which is licensed by the Department of Internal Affairs for re-use under the Creative Commons Attribution 3.0 New Zealand licence (http://creativecommons.org/licenses/by/3.0/nz/).””

In my opinion, these license requirements set a good precedent for licensing bibliographic data with an attribution license, although it is not clear what still passes for “a small number of individual records”. I think it is important, and the only legally consistent approach, that datasets with an attribution or share-alike license need to be attributed only at the database level and not at the record level. Others who intend to use an attribution license should use similar wording.

This might be of interest for other approaches of using an attribution license, e.g. at OCLC or E-LIS.

In related news, there’ll be a LODLAM-NZ event on December 1st in Wellington, see http://lod-lam.net/summit/2011/09/08/lodlam-nz/. Converting this dataset to LOD might be a topic…

Update: Tim McNamara has already provided an RDF version of the bibliographic data and reported on his motivations and challenges, see this post.

Posted in Data, licensing | Tagged national library

NTNU University Library – a Linked Open Data Hub

Posted on September 8, 2011 by Adrian Pohl

The following guest post is by Rurik Thomas Greenall who works at the section for information resources at NTNU University Library, Trondheim in the area of linked data and resource access. His interests include programming, particularly for the semantic web, and knowledge representation; he speaks and writes on these topics.

NTNU University Library has been a publisher of open data – data released under a liberal licence – since 2009, when we first started working with RDF and linked data. Our aims were initially achieved in project-based form, but linked open data has since been formalized within the structure of the library organization, which makes NTNU University Library one of the few libraries to have positions dedicated to the production of linked data.

The move away from the traditional

The initial ideas we had about linked data stem from a project, UBiT2010, which dealt with modernizing library services. The working group consisted of a cross-section of the library staff who were interested in technology. Some of the major findings in the project were:

re-mix, re-use, re-invent, re-package, re-work
users know their own best
trust and openness
library IT is nothing, there is only IT

These ideas do not seem particularly inspiring any more; however, they were quite cutting edge in 2007–2008, and led us down a path toward distributed computing and distributed interfaces, as well as a move away from monolithic systems that provided so-called expert interfaces that had frightened away our users in the first place.

At NTNU University Library, the data group was somewhat radicalized toward the openness movement; this was largely the case because of a protectionist agenda we perceived within our commercial partners. There seemed to us to be an intrinsic concept of keeping data closed in the business models of many of the companies we worked with, a fact that we also saw worked against them because those suppliers that were somewhat more open with their data provided better service and showed higher use than those that were less open. Further, openness seems to us to be in the spirit of the web, and its best feature; libraries have also traditionally been important players in regard to openness, so we happily continue this tradition.

Thus, we as a state-funded institution within a social democracy were duty bound to adopt a line that would allow the principles of openness to be applied to their fullest extent. After consulting with among others Talis, we found out that the only way forward was public domain and zero licences. Thus, we have adopted a policy of using the ODC-PDDL on all of our data (we reserve some rights related to non-data content typically providing content documents – i.e. documents that are the traditional literature holdings of the library, where we own the copyright/the copyright has come into the public domain – under CC-BY-NC-licences). Note that the applicability of licensing such as the Norwegian licence for open data (Norsk lisens for offentlige data) in international arenas where we operate is uncertain, and it is certainly the case that we resist the conservative impulses that require attribution for data; in fact, we actively encourage others to adopt ODC-PDDL in combination with the use of HTTP URIs as the URI acts as a provenance and token of trust, something intrinsically more important than rights attribution when it comes to open data.

Another outcome from this first project was that we needed a lot more access to our data than we had had previously. This fact led us on a journey of discovery related to APIs (such as Ex Libris’ X-Server, Z39.50 and SRU), none of which really provided us with what we needed, which was real-time access to our data in a way that made sense in a web-services context.

The first linked open data baby steps

Moving away from the traditional library IT scene, in June 2009, we were invited to a seminar on linked data hosted by the National Library of Sweden, the Norwegian Library Laboratory (Bibliotekslaboratoriet) and Talis. This meeting was critical, because it brought together two core ideas: the use of standard technologies to deliver library data and the idea of openness.

Once the pieces had fallen into place, we started working toward our goals of producing linked open data from our data sets. The typical barriers were in place: getting hold of the data, establishing who actually owns the data, and getting someone to sign the papers that would allow us to release the data. In the end, these barriers were overcome because the library management were brave enough to just do the job, signing the licences and letting us take the step of publishing data openly.

Our first project was TEKORD, a controlled vocabulary that provides Norwegian terms for technical subjects and Universal Decimal Classification (UDC) numbers. The data set is open RDF, but it is not currently linked to anything because of the lack of data sets containing UDC numbers. The data set was, however, immediately snapped up by the UDC Consortium and used as the basis for the Norwegian representation of the top-level terms for UDC, which was rather encouraging.

Rådata nå!

In 2010, NTNU University Library and BIBSYS (the state-owned, Norwegian national library management system for academic and research libraries) received funding for a project called Rådata nå! (Norwegian, raw data now) which published a data set containing 1.5 million person name authorities as linked open data, aligning this data with among others DBpedia and VIAF. The project produced 10 million triples and was the first large data set to be released as linked open data in Norway (under the ODC-PDDL). The reason that name authorities were chosen was that they can be seen as fixed points of reference on the web, and can be used to identify individuals on the web of data who are rarely represented otherwise (the data set includes many master’s degree theses, and so includes the names of many prominent and not so prominent people).
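One common way to express this kind of alignment in RDF is an `owl:sameAs` statement linking a local authority URI to the matching resource in an external data set. The sketch below emits such statements as N-Triples lines. Whether Rådata nå! used `owl:sameAs` or another linking property is not stated in this post, so treat the choice of property as an assumption; the function names and all example URIs are hypothetical.

```python
# The owl:sameAs property URI from the OWL vocabulary.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(local_uri, external_uri):
    """Return one N-Triples line stating that two URIs identify the same thing."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{external_uri}> ."


def alignment_triples(local_uri, external_uris):
    """Return one owl:sameAs line per external URI for a single local authority."""
    return [same_as_triple(local_uri, uri) for uri in external_uris]
```

For example, `alignment_triples("http://example.org/authority/x1", ["http://viaf.org/viaf/0000"])` yields a single line linking a placeholder local URI to a placeholder VIAF URI.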

While this data set was a huge milestone for us, it was far from the only data set we have released; much of our strategy has been to create sustainable systems that we use ourselves, creating an available source for our business data at the same time as enriching it with other data; in this way we are both open and strategically aware in equal measure.

(Actually) using linked open data

The choice of linked open data brings together a means of representation and enrichment that was not possible using other technologies, and while the learning curve for us has not been easy, it has been rewarded in many respects. Our use of open data has created the opportunity to create the kinds of system that we feel our users will be inspired by, that answer the questions users have and help them find more than they were looking for. This extends beyond the concept of monolithic systems where our aim was to get users into our systems; in fact it is entirely about getting the data out there, ensuring that it is available without reference to a given web page. Openness is the key, and this way of thinking has improved our work immensely.

By using other people’s data, we gain insight into how to integrate our domain (that of traditional library data: bibliographic data, controlled vocabularies, etc.) with those domains that we describe and serve. Typical examples of this include using geographical terms from geographical resources in addition to the gazetteers that were typically used in bibliographic description, and linking to resources of scientific information. This adds value in the form of alignment with resources that provide additional information that can be used when building services on top of the data.

Another case for our use of linked open data is the multilingual situation in academia in Norway, where Norwegian and English exist side by side in a way that is common in other cultures. Additionally, Norwegian is joined by other languages from Norway (Sámi and Kvensk) as well as the written standard for Norwegian being represented in two varieties (Bokmål and Nynorsk). This relatively complex situation is solved easily by co-ordination with data sets that represent the same information in the different languages.

Since these early efforts, we have moved ever more towards workflows centred around linked data; providing representations of and access to scholarly journals, research databases, researcher information, academic conference authorities as well as controlled vocabularies such as Norwegian MeSH and Norwegian scientific disciplines. These all feature in our approach to using and releasing our data, and we happily share and align our work with what other people are doing.

Today’s situation

The projects currently being undertaken are the presentation of the library’s special collections and statistical modelling of usage. The library’s special collections (manuscripts and rare books) are catalogued and presented in a workflow that is based purely on linked open data (because the documents are catalogued directly as RDF and the systems use linked open data as their only data resources). This project integrates all of the aspects of the current semantic web, including reasoning and in-HTML standards like RDFa. The benefits of this approach have been enormous, allowing agile, rapid development of interface and data to meet changing needs, as well as a high level of visibility in the search engines we have targeted. We are currently running workshops to see how we can work together with other holders of unique historical materials to achieve a co-ordinated web of data for these materials.

It is evident for us that using linked open data not only provides the access we need to our business data, it is also a way of enriching it with the data provided by others. It is also clear that in an age where “interconnected” is the norm, any other approach not only limits your success, it probably excludes it.

About NTNU University Library

NTNU University Library is the library of the Norwegian University of Science and Technology (Norges Teknisk-Naturvitenskapelige universitet – NTNU), based in Trondheim. It provides services to the university for subjects including the sciences and technology as well as humanities and social sciences. As a centre for development within technical library and information science, the staff at the library participate in both national and international collaborative projects and help set the scene within current trends for their fields of specialization. The library is host to the biannual international conference emtacl: emerging technologies in academic libraries, a showcase for leading trends within technology within and for academic libraries.

Posted in guest post, LOD-LAM, OKFN Openbiblio

ZB MED publishes open article data in medicine and life sciences

Posted on September 6, 2011 by Adrian Pohl

We are happy to announce the recent publication of article data in medicine and life sciences by the German National Library of Medicine (ZB MED). Two datasets have been published in August and registered at CKAN/the Data Hub:

CC MED (Current Contents Medicine): data about 650,000 journal articles gathered since 2000 from 650 German or German-speaking journals in medical and health-related fields (see the CKAN/Data Hub entry and the general information page about the data). 90% of this data isn’t part of the MedLine dataset.
CC Green (Current Contents Nutrition. Environment. Agriculture.): data about 9,000 journal articles gathered since January 2011 from 200 German or German-speaking journals in applied life sciences (see CKAN/Data Hub entry and here).

The ZB MED produces this data by scanning the journals and extracting the information from the OCR’ed text, in part manually. Until now it has only been used in the discovery service MEDPILOT. Currently, the open data is provided in the export format of the SISIS library system. Documentation of the format will follow. It shouldn’t be too hard, though, to understand the relevant fields. Anybody up for converting this to Linked Open Data?

Journal descriptions in lobid.org

Regarding the journal data itself (the least granular description level), since the recent update there is a URI with an RDF description as Linked Open Data in lobid.org for each referenced journal. Just take the identifier of the German “Zeitschriftendatenbank” (ZDB) (in field “2599.001”) and append it to “http://lobid.org/resource/ZDB” to get the lobid.org description. This will help people who are interested in using the data in a LOD environment. It would be especially useful to merge this with the MedLine data…

Example:

ZDB-ID in first record of CC MED example data is ‘200772-1’
The corresponding lobid.org-URI based on the ZDB-ID is http://lobid.org/resource/ZDB200772-1.
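The URI construction described above is plain string concatenation, so it can be sketched in a few lines. The prefix and the example ZDB-ID are taken from this post; the function name is hypothetical, and reading field “2599.001” out of the SISIS export is left out because the format documentation is not yet available.

```python
# URI prefix given in the post; the ZDB-ID is appended directly to it.
LOBID_ZDB_PREFIX = "http://lobid.org/resource/ZDB"


def lobid_uri_for_zdb_id(zdb_id):
    """Return the lobid.org URI for a journal, given its ZDB identifier."""
    zdb_id = zdb_id.strip()
    if not zdb_id:
        raise ValueError("empty ZDB-ID")
    return LOBID_ZDB_PREFIX + zdb_id
```

Calling `lobid_uri_for_zdb_id("200772-1")` reproduces the example URI given above.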

Posted in Data, OKFN Openbiblio

(Slightly) wider exposure of the costs of academic publishing

Posted on August 30, 2011 by Mark MacGillivray

An article in the Guardian yesterday:

http://www.guardian.co.uk/commentisfree/2011/aug/29/academic-publishers-murdoch-socialist

brings up the profit margins of academic publishers. I found one line particularly interesting:

This is a tax on education, a stifling of the public mind. It appears to contravene the universal declaration of human rights, which says that “everyone has the right freely to … share in scientific advancement and its benefits”.

The article also mentions that the research itself is often paid for by the public. So, assuming we do not wish to continue to be stifled, what would we lose by publishing our findings in an alternative medium – openly on the internet, for example?

Perhaps a measure of quality, a stamp of approval. Peer review, for example, is an important and not insignificant task; however, there is no reason why service providers could not provide peer review services separately from publishing, if necessary. (A mechanical turk of peer review is not beyond the imagination, either…)

Whilst it may be true that publishers provide services we do need, and which we could pay for, it is not true that publishing and disseminating research is itself one of those services, nor that typical current providers or typical current services are the only options available to achieve our goals.

There is only one way to change this; the consumers of the product – that is, us – must stop consuming it; we must find an alternative resource to sustain our needs. At the point where the cost of acquiring our critical resource becomes prohibitively expensive, we must adapt.

Publishers are free to adapt with us, and to offer alternative resources for our consideration. But they are not free to dictate the grounds of access to our resource, unless we continue to give them that freedom.

Posted in News

openbiblio, bibliographica, bibsoup, bibserver – what’s what?

Posted on August 23, 2011 by Mark MacGillivray

We have (or had) various projects going on in relation to the OKF open bibliographic working group recently, and it seems like it might be a good idea to clarify what names mean what things mean where…

So here is a walkthrough of the various projects and how they relate:

openbiblio.net

This site! The purpose of this site is to be the hub of the OKF Open bibliographic working group. Many of the posts here are related to the working group meetings, which are held on the first Tuesday of every month, as organised by the great Adrian Pohl. We also have blog posts up here from other members of the working group (such as myself), from additional contributors, and from some projects, particularly the recently finished JISC Open Bibliography project.

JISC Open Bibliography

The JISC Open Bibliography project ran from July 2010 to July 2011, with the aim of advocating for open bibliography, getting open bibliographic datasets, and making tools and services to show the value of open bibliographic datasets. Out of this came a number of good developments, all of which are detailed in the final project post, and of those the key ones are as below.

The Open Biblio Principles

Our open bibliographic principles are listed on this site, and have received quite a number of endorsements: http://openbiblio.net/principles

Further development of the Open Biblio software

The Open Biblio software was originally created as part of a project at the University of Edinburgh, by Rufus Pollock and Will Waites. It provides an RDF catalogue and web apps for bibliographic records. It was used to run bibliographica.

Bibliographica

Bibliographica is an instance of the Open Biblio software that has been loaded up with the British Library British National Bibliography. This serves as an example. However we found during work with this that it was difficult to build front end services that used it – there was something of a disparity between the RDF world and the key-value front end world. Not necessarily a technical issue, but perhaps a cultural one. We also found later that the resource required to run a triple store at the scale we were looking at when we received the Medline dataset (20 million records) was a bit beyond what we had available. This roughly coincided with the end of the JISC Open Bibliography project, at which point we began working with Jim Pitman and his BibServer software.

BibServer

BibServer is software originally written by Jim Pitman at University of California Berkeley, to manage small scale bibliographic collections for individuals or departments. We are working to make this software more widely available for this purpose, and in the process we are developing a couple of useful things.

BibJSON

BibJSON is just JSON with some agreement on what we expect particular keys to mean. We would like to write parsers from various other formats into BibJSON, to make it easier for people to share bibliographic records and collections.
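To give a feel for what "JSON with some agreement on keys" could look like, here is a small sketch that maps the fields of an already-parsed BibTeX entry onto a BibJSON-style dictionary. The key names (`type`, `id`, `author` as a list of objects with a `name`) are illustrative assumptions rather than the agreed BibJSON conventions, the function name is hypothetical, and the sample record is made up.

```python
import json


def bibtex_fields_to_bibjson(entry_type, cite_key, fields):
    """Map parsed BibTeX fields to a BibJSON-style dict (illustrative keys only)."""
    record = {"type": entry_type, "id": cite_key}
    for key, value in fields.items():
        if key == "author":
            # BibTeX separates multiple authors with the word "and".
            record["author"] = [{"name": name.strip()} for name in value.split(" and ")]
        else:
            record[key] = value
    return record


# A made-up entry, used only to show the shape of the output.
example = bibtex_fields_to_bibjson(
    "article",
    "example2011",
    {"title": "An Example Article", "author": "Ada Example and Bob Sample", "year": "2011"},
)
print(json.dumps(example, indent=2))
```

Because the result is an ordinary dictionary, it can be serialized, indexed, or merged into a larger collection with standard JSON tooling.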

BibSoup

Given lots of records in BibJSON, and some tools such as BibServer to make use of them, we could build a huge collection of bibliographic records in a simple format that people can easily share. This would be the BibSoup. We have an instance of the BibServer software up as an Alpha service at http://bibsoup.net, and are now focussing development of BibServer as the platform.

http://bibserver.okfn.org

Details of the BibServer / BibJSON / BibSoup work are going up at http://bibserver.okfn.org. This will serve as the source for information about the code and the standard. We are hoping to build up further engagement around this work via the Bibliographic Knowledge Network.

Bibliographic Knowledge Network

The BibKN originally developed out of some work Jim was involved in: http://www.bibkn.org; and we believe the OKF can take this forward as a community into which people / groups committed to advocating for and providing open bibliography can join. The OKF open biblio working group would be such a group, along with others beyond the OKF scope. Further efforts at community engagement are under way too, via some other projects.

Public Domain Works

The Public Domain Works project presents a way to engage with people more generally than via the perhaps dull approach of bibliographic records, by concentrating on the works themselves. The content of these works is the art; it is what people are really interested in. But knowing that a work is in the public domain may be a very useful thing, and such information could be a facet of an open bibliographic record. This would enable artists, for example, to find material upon which they could build to produce new works. So, ideally, we will be able to store bibliographic records in a BibServer, have tools in place to identify works that are in the public domain, record that fact in the bibliographic record, and then present a collection of public domain works to the community.
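
As a rough sketch of that pipeline, the Python below annotates records with a public domain flag and then filters out the public domain collection. It is only an illustration under loud assumptions: real public domain determination depends on jurisdiction and on much more than one date, the "life plus 70 years" rule is used here purely as a stand-in, and the keys "author_death_year" and "public_domain" as well as the function name are hypothetical.

```python
def mark_public_domain(record, current_year, term_years=70):
    """Return a copy of the record with a 'public_domain' flag added.

    Assumption for this sketch: a work enters the public domain once more
    than `term_years` calendar years have passed since the author's death.
    If the death year is unknown, the flag is None (undetermined).
    """
    annotated = dict(record)
    death_year = record.get("author_death_year")
    if death_year is None:
        annotated["public_domain"] = None
    else:
        annotated["public_domain"] = (current_year - death_year) > term_years
    return annotated


# Made-up records, for illustration only.
records = [
    {"title": "Work A", "author_death_year": 1930},
    {"title": "Work B", "author_death_year": 1950},
    {"title": "Work C"},  # no death year recorded
]

annotated = [mark_public_domain(r, current_year=2011) for r in records]

# The collection we would present to the community.
public_domain_works = [r for r in annotated if r["public_domain"]]
print([r["title"] for r in public_domain_works])
```

Keeping "undetermined" separate from "not in the public domain" matters here, since a missing date should prompt further checking rather than exclude a work for good.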

Posted in BibServer, OKFN Openbiblio | Leave a comment

Ex Libris, Alma and Open Data

Posted on August 11, 2011 by Adrian Pohl

This blog post is written by Carl Grant, chief librarian at Ex Libris and past president of Ex Libris North America, in answer to some questions that Adrian Pohl, coordinator of the OKFN Working Group on Open Bibliographic Data, posed at the beginning of July in response to Ex Libris’ announcement of an “Expert Advisory Group for Open Data”. It is cross-posted on the OKFN blog and openbiblio.net.

The Ex Libris announcement in June 2011 that we were forming an “Expert Advisory Group for Open Data” has generated much discussion and an equal number of questions. Many of the questions bring to light the ever-present tensions and dynamics that exist between the various sectors and advocates of open data and systems. The announcement also raises ongoing questions about how the goals of openness can be reasonably and properly achieved, and in what timeframe, particularly when it involves companies, products and data structures that have roots in proprietary environments.

For those who are not part of the Ex Libris community, allow me to define some of the Ex Libris terminology involved in this discussion:

Alma: The Ex Libris next-generation, cloud-based library management service package that supports the entire suite of library operations (selection, acquisition, metadata management, digitization, and fulfillment) for the full spectrum of library materials.
Community Zone: A major component of Alma that includes the Community Catalog (bibliographic records) and the Central Knowledgebase and Global Authorities Catalog. This zone is where customers may contribute content to the Community Catalog and in so doing, agree to allow users to share, edit, copy and redistribute the contributed records that are linked to their inventory.
Community Catalog: The bibliographic record portion of the Community Zone.
Community Zone Advisory Group: A group of independent experts and Alma early adopters advising Ex Libris on policies regarding the Community Catalog.

Taking into consideration the many emails and conversations we’ve had around the topic, this original set of questions seems to have shared interest:

These are good questions so let’s work our way through this list.

Q: What is this working group actually about? About open licensing, open formats, open vocabularies, open source software or all of them?

A: The Community Zone Advisory Group is tasked with creating high-level guidelines to govern the contribution, maintenance, use and extraction of bibliographic metadata placed in the Community Catalog of Alma. As such, this group is also advising us on suggested licenses to use. Given that each Alma customer will have a local, private catalog that need not be shared, we’ve taken the position that we want to promote the most open policies possible on data that libraries contribute to the Community Catalog. Much of the discussion centers around what approach would be best for libraries and lead to the clearest terms of use for Alma users.

We’ve told the group that we’ll leave it to them to determine whether the group will need to exist beyond the time this original charge is completed. We fully expect that we will have similar groups, if not this same one, advise us on other data that we plan to place in the Community Zone in the future.

Q: Does the working group only cover openness inside Ex Libris’ future metadata management system, e.g. openness between paying members in a closed ecosystem, or will it address openness of the service within a web-wide context?

A: This is a more complicated question because it is really two interlaced questions. First, it asks whether the data is open. Second, it asks whether the system is open.

We are most assuredly making the bibliographic metadata open and the answer to the next question provides more detail on how we’re approaching this.

As for the systems holding the data, we are planning on being open, but this is a place where we clearly must wait until this new system is up and running well for our paying customers before we open the Community Catalog up to others. Even then, we’ll want to closely monitor the impact on bandwidth, compute cycles and data to determine any costs associated with this use and how best to proceed and participate in making the Community Catalog open to larger communities in a web-wide context.

The goal is to work with our customers and the community to set achievable goals that move us down that path while factoring practicality into ideology. However, in the first deployment, our primary constituents will be institutions who have adopted Alma.

Q: Will Ex Libris push towards real open data for the Alma community zone? That would mean 1.) Using open (web) standards like HTTP, HTML and RDF as data model etc. 2.) Conforming to the Principles on Open Bibliographic Data by open licensing, 3.) providing APIs and the possibility to download data dumps.

A: Specifically:

1.) Open standards and data formats are core to our design of Alma. They serve as our mechanisms of data exchange and the basis of our data structures when appropriate. Not all standards will be the basis of our internal functionality, however. For example, we’re building the infrastructure for RDF as a data exchange mechanism, but it does not fundamentally underpin our data structure, just as MARC21 binary format is not the root of our bibliographic record structure. When appropriate and possible, we are implementing these standards for data exchange based on libraries’ needs.
2.) We are currently examining the open licenses to determine if we can utilize them given the other data we’d planned for the Community Zone. Currently, our Alma agreements include language that largely replicates the key terms of the Open Data Commons Public Domain Dedication and License (PDDL) for customer-contributed records in the Community Catalog.
3.) We will be providing APIs and plan to support downloads, but again, as we move forward, we plan to do this in a phased approach so that we can monitor infrastructure and human resource demands associated with the adoption. As noted above, our first priority will be providing service to our existing Alma users. This means that in the first release, providing Alma institutions with full access to their own data (in the form of APIs and data dumps) is where we’re focusing our attention.

In the final analysis, we feel our approach is really quite supportive of librarianship and quite open. We’re balancing the competing needs of our stakeholders, but it is important to note that Ex Libris is not artificially implementing restrictions that limit open data. Once libraries join the Alma community, there are no limits on their ability to manage their own projects or collaboration. Where we have the resources, we’re helping promote this open approach. We’ll be allocating our resources to provide best-in-class service to our customers and, at the same time, in a closely monitored and managed approach, continuing to expand access to larger communities.

We think all of this combined is a powerful statement about how proprietary and open approaches can beneficially co-exist and thus will help to move libraries forward in substantive ways.

Posted in guest post, OKFN Openbiblio, vendors | 1 Comment

After JISC Open Bibliography

Posted on July 18, 2011 by Mark MacGillivray

The JISC Open Bibliography project has come to an end, but of course the Open Bibliography community remains. We will no longer have regular JISC Open Bib meetings, but I would hope that those involved will continue to engage with the monthly meetings organised by Adrian.

We can continue to build on the output of the JISC Open Bib project by working with the datasets we have available. The British Library are continuing to work on their British National Bibliography dataset, and Ben is continuing to improve his WebGL globe visualisation (video) of the Medline dataset.

I have also taken the Medline dataset and made an attempt at indexing all of it, and I am working on placing a UI on top of this index that enables people to search the data. This is very much a prototype at the moment, but is available here for people using Firefox (and with no guarantee of stability!). The UI will also form part of the BibServer tool, more details of which are available at http://bibserver.okfn.org

In addition to this, we have the Open Biblio software that runs Bibliographica.org, and it is of course open source and available for others to use. If you are interested in running your own instance, please let us know.

Posted in BibServer, News, OKFN Openbiblio | Leave a comment