Open Bibliography and Open Bibliographic Data

Open Bibliography at the start of 2012

Posted on February 6, 2012 by Naomi Lillie

Adrian’s post about the German National Library prompted me to note down a few other exciting developments over the last month or so. Christmas and the holiday season may be perceived as a time for winding down, but not for these people!

The OCLC released Faceted Application of Subject Terminology (FAST) as Linked Data under an open licence. Their press release gives more information (please note, at present the full database is limited to API access, but OCLC will be making FAST available as a dump under the ODC-BY license in the near future).
NISO is working on a standard for the coding of citations in ebooks, full details here.
Our own Principles on Open Bibliographic Data have been translated into Hungarian (by Dudás Anikó) and Polish (by Karol Langner) so we hope this is the beginning of improved communication in both Hungary and Poland and with native speakers all over the world.
OKFN has been working on various ways of improving the overall field of Open Data, including the Open Metadata Handbook, which explains the metadata standards used by different institutions and is designed to help people with little or no technical background harvest and process bibliographic metadata from a variety of sources.
And of course, Peter Murray-Rust has been busy on his blog, exploring issues of US bill HR3699, how to present your list using BibSoup and generally promoting the world of open data.
Finally – watch this space for an announcement about BibSoup, launching in beta later this week!

Many thanks to the terrific group of people who keep us updated with this information. If you would like to be among the first to hear about developments like these, get involved with the Open Bibliographic Working Group and let us know what you’re up to.

Posted in Data, JISC OpenBib, News, OKFN Openbiblio | Tagged jiscopenbib2, wp2, wp3, wp4, wp5, wp6, wp7, wp8

Linked Data at the Biblioteca Nacional de España

Posted on February 2, 2012 by Adrian Pohl

The following guest post is from the National Library of Spain and the Ontology Engineering Group (Technical University of Madrid (UPM)).

Datos.bne.es is an initiative of the Biblioteca Nacional de España (BNE) whose aim is to enrich the Semantic Web with library data.

This initiative is part of the project “Linked Data at the BNE”, supported by the BNE in cooperation with the Ontology Engineering Group (OEG) at the Universidad Politécnica de Madrid (UPM). The first meeting took place in September 2010, and the collaboration agreement was signed in October 2010. A first set of data was transformed and linked in April 2011, followed by a considerably larger set in December 2011.

The initiative was presented in the auditorium of the BNE on 14th December 2011 by Asunción Gómez-Pérez, Professor at the UPM, and Daniel Vila-Suero, Project Lead (OEG-UPM), and by Ricardo Santos, Chief of Authorities, and Ana Manchado Mangas, Chief of Bibliographic Projects, both from the BNE. Gordon Dunsire, Chair of the IFLA Namespace Group, also took part in the event.

The concept of Linked Data was first introduced by Tim Berners-Lee in the context of the Semantic Web. It refers to the method of publishing and linking structured data on the Web. Hence, the project “Linked Data at the BNE” involves the transformation of BNE bibliographic and authority catalogues into RDF as well as their publication and linkage by means of IFLA-backed ontologies and vocabularies, with the aim of making data available in the so-called cloud of “Linked Open Data”. This project focuses on connecting the published data to other data sets in the cloud, such as VIAF (Virtual International Authority File) or DBpedia.
With this initiative, the BNE takes on the challenge of publishing bibliographic and authority data in RDF, following the Linked Data Principles and under the CC0 (Creative Commons Public Domain Dedication) open license. Spain thereby joins the initiatives that national libraries in countries such as the United Kingdom and Germany have recently launched.

Vocabularies and models

IFLA-backed ontologies and models, widely agreed upon by the library community, have been used to represent the resources in RDF. Datos.bne.es is one of the first international initiatives to thoroughly embrace the models developed by IFLA: the FR family of models, namely FRBR (Functional Requirements for Bibliographic Records), FRAD (Functional Requirements for Authority Data) and FRSAD (Functional Requirements for Subject Authority Data), as well as ISBD (International Standard Bibliographic Description).

FRBR has been used both as a reference model and as a data model because it provides a comprehensive and organized description of the bibliographic universe, which makes it easier to gather useful data and to navigate between entities. Entities, relationships and properties have been written in RDF using the RDF vocabularies published by IFLA: the FR ontologies have been used to describe Persons, Corporate Bodies, Works and Expressions, and ISBD properties to describe Manifestations. All these vocabularies are now available at the Open Metadata Registry (OMR) with the status “published”. Additionally, in cooperation with IFLA, labels have been translated into Spanish.
MARC21 bibliographic and authority files have been tested and mapped to the classes and properties at OMR. The following mappings were carried out:

A mapping to determine, given a field tag and a certain subfield combination, to which FRBR entity it is related (Person, Corporate Body, Work, Expression). This mapping was applied to authority files.
A mapping to establish relationships between entities.
A mapping to determine, given a field/subfield combination, to which property it can be mapped. Authority files were mapped to FR vocabularies, whereas bibliographic files were mapped to ISBD vocabulary. A number of properties from other vocabularies were also used.
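The first and third mappings can be pictured as lookup tables keyed on MARC tags and subfield codes. The following is only a minimal sketch of that idea: the rules, the property names and the function names are hypothetical placeholders chosen for illustration, not the actual BNE mappings.

```python
from typing import Iterable, Optional

# Hypothetical entity rules: (MARC tag, subfields that must be present, entity).
# More specific rules come first, so a name/title heading is not
# classified as a plain Person.
ENTITY_RULES = [
    ("100", frozenset({"a", "t", "l"}), "Expression"),
    ("100", frozenset({"a", "t"}), "Work"),
    ("100", frozenset({"a"}), "Person"),
    ("110", frozenset({"a"}), "CorporateBody"),
    ("130", frozenset({"a", "l"}), "Expression"),
    ("130", frozenset({"a"}), "Work"),
]

# Hypothetical property rules: (MARC tag, subfield code) -> property name.
PROPERTY_RULES = {
    ("100", "a"): "hasName",
    ("100", "d"): "hasDates",
    ("245", "a"): "hasTitleProper",
}


def classify_entity(tag: str, subfields: Iterable[str]) -> Optional[str]:
    """Return the entity for a field tag and subfield combination, if any."""
    present = set(subfields)
    for rule_tag, required, entity in ENTITY_RULES:
        if rule_tag == tag and required <= present:
            return entity
    return None


def map_property(tag: str, code: str) -> Optional[str]:
    """Return the property a field/subfield pair maps to, if any."""
    return PROPERTY_RULES.get((tag, code))
```

In this sketch, a heading in field 100 with subfields a and d is treated as a Person, while the same field with an added title subfield t is treated as a Work. Combinations without a rule return None and would be left for manual review.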

The aforementioned mappings will soon be made available to the library community. In this way the BNE would like to contribute to the discussion on mapping MARC records to RDF, and other libraries willing to transform their MARC records into RDF will be able to reuse the mappings.

Almost 7 million records transformed under an open license

Approximately 2.4 million bibliographic records have been transformed into RDF, covering modern and old monographs, sound recordings and musical scores. In addition, 4 million authority records for persons, corporate names, uniform titles and subjects have been transformed. All of them come from the bibliographic and authority catalogues of the BNE, which are stored in MARC 21 format. For the transformation, the MARiMbA (MARc mappIngs and rdf generAtor) tool was developed and used. MARiMbA is a tool for librarians whose goal is to support the entire process of generating RDF from MARC 21 records. It allows the use of any vocabulary (in this case ISBD and the FR family) and simplifies the process of assigning correspondences between RDFS/OWL vocabularies and MARC 21. As a result of this process, about 58 million triples have been generated in Spanish. These triples are high-quality data of considerable cultural value and substantially increase the presence of the Spanish language in the data cloud.
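To give a rough idea of what generating triples from a mapped record involves, here is a minimal sketch that is not based on MARiMbA itself. The example.org namespace, the property names and the function names are placeholders assumed for illustration; a real transformation would use the IFLA vocabularies and a full MARC parser.

```python
EX = "http://example.org/"  # placeholder namespace, not the BNE's


def escape_literal(value: str) -> str:
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record_id, fields, mapping, lang="es"):
    """Turn mapped MARC subfields into N-Triples lines.

    fields  -- list of ((tag, subfield code), value) pairs
    mapping -- dict from (tag, subfield code) to a property name
    """
    subject = f"<{EX}resource/{record_id}>"
    lines = []
    for key, value in fields:
        prop = mapping.get(key)
        if prop is None:
            continue  # unmapped subfields are skipped in this sketch
        literal = escape_literal(value)
        lines.append(f'{subject} <{EX}vocab/{prop}> "{literal}"@{lang} .')
    return lines
```

Each mapped subfield yields one triple with a Spanish-language literal, which is how a few million records can grow into tens of millions of triples.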

Once the data had been described with IFLA models and the bibliographic and authority catalogues had been generated in RDF, the next step was to connect these data with other RDF knowledge bases included in the Linking Open Data initiative. The data of the BNE are now linked with data from other international data sources through VIAF, the Virtual International Authority File.

The licence applied to the data is CC0 (Creative Commons Public Domain Dedication), a completely open licence aimed at promoting data reuse. With this project, the BNE adheres to the Spanish public sector’s commitment to openness and data reuse, as established in Royal Decree 1495/2011 of 24 October (Real Decreto 1495/2011, de 24 de octubre) on reusing information in the public sector, and also acknowledges the proposals of the CENL (Conference of European National Librarians).

Future steps

In the short term, the next steps include:

Migration of a larger set of catalogue records.
Improvement of the quality and granularity of both the transformed entities and the relationships between them.
Establishment of new links to other interesting datasets.
Development of a human-friendly visualization tool.
SKOSification of subject headings.

Team

From BNE: Ana Manchado, Mar Hernández Agustí, Fernando Monzón, Pilar Tejero López, Ana Manero García, Marina Jiménez Piano, Ricardo Santos Muñoz and Elena Escolano.
From UPM: Asunción Gómez-Pérez, Elena Montiel-Ponsoda, Boris Villazón-Terrazas and Daniel Vila-Suero.

Posted in Data, guest post, LOD-LAM, Semantic Web | Tagged national library | 2 Comments

German National Library goes LOD & publishes National Bibliography

Posted on January 26, 2012 by Adrian Pohl

Good news from Germany. The German National Library

has changed its licensing regime for Linked Data to CC0, which makes the data open according to the Open Definition, and
has begun to publish the German national bibliography as Linked Open Data.

For background see the email (German) announcing this step. There it says (my translation):

“In 2010 the German National Library (DNB) started publishing authority data as Linked Data. The existing Linked Data service of the DNB is now extended with title data. In this context the licence for Linked Data is changed to ‘Creative Commons Zero’.

So far, the majority of the DNB title data has been converted, including periodicals and series – the music data and the holdings of the German Exiles Archive are still missing. From now on, the RDF/XML representation of a title record is available in the DNB portal via a link. This is expressly an experimental service which will be extended and refined continually. More detailed information about modelling questions and the general approach can be found in the updated documentation.”

The English documentation (PDF) hasn’t been updated yet and only describes the GND authority data. On the wiki page about the LOD service it says: “Examples and further information about FTP-downloads will come soon.” An entry on the Data Hub has already been made for the data.

Posted in Data | Tagged national library | 3 Comments

BibServer screencast and user perspective

Posted on January 25, 2012 by Naomi Lillie

BibServer software allows people (you, me, the person in the office down the road) to hold and share collections of searchable data. Be it the list of books you have to read for your course this semester, the publications you produced in your research, the database of all staff at your organisation or your neatly categorised weekly shop (‘aisle 7: toothpaste, but only if BOGOF’), BibServer allows you to view, search, share and maintain this information.

If, like me, you are not of a technical bent, do not despair – Mark has created a straightforward video guide on how to use it and how it’s useful:

Setting up a Bibserver and Faceted Browsing (Mark MacGillivray) from Bibsoup Project on Vimeo.

Mark Williamson, a post-doctoral researcher at Cambridge University, was introduced to BibServer and we filmed him talking about using it a (very) short while later:

Ingesting a personal collection of references into Bibserver (Mark Williamson) from Bibsoup Project on Vimeo.

Thanks to Peter for his camera and video-production skills, and of course his blog.

Posted in BibServer, JISC OpenBib | Tagged jiscopenbib2, wp3, wp4, wp5, wp8 | 2 Comments

Sprint videos

Posted on January 24, 2012 by Naomi Lillie

Last week’s sprint produced more than just parsers, game-plans and blog posts (Day 1, Day 2 and Day 3): it also allowed Peter and Naomi to stretch their directorial wings and produce some video blogs to record what we were doing as we went along. With minimal journalistic credentials (ok, none), we relied on the natural animation of the participants to sell the story… See how you think they did:

Interview with Mark MacGillivray, Openbiblio 2 project from Bibsoup Project on Vimeo.

Peter Murray-Rust Co-I Openbiblio project talks from Bibsoup Project on Vimeo.

Bibsoup: Interview with Etienne Posthumus (developer) from Bibsoup Project on Vimeo.

Interview with Ed Chamberlain, Openbiblio2 project from Bibsoup Project on Vimeo.

Thanks to Peter for the videos as presented on his brilliant blog!

Posted in BibServer, event, JISC OpenBib | Tagged jiscopenbib2, wp3, wp4 | 2 Comments

Thursday 19th January – Open Biblio Sprint: Day 3

Posted on January 19, 2012 by Naomi Lillie

Today we were joined by additional members of the OKFN team from various parts of the world – Ira, Sam and Primavera. Then the fun began…

Sam and Mark discussed the interface between Open Biblio and the TEXTUS project, looking at text and image processing. Project Gutenberg was suggested as one possible avenue, exploring how scanned archives could be processed in order to provide searchable text. It was agreed that openphilosophy.org would be the best central point of reference for this data, as an instance of TEXTUS with BibServer support in the background.
Mark and Ira discussed how to present CKAN / BibServer at events such as Dev8D – there is cross-over between the two, and we took the opportunity to learn more about both projects. CKAN is a purpose-built data catalogue with flexible addons and a mature open source product. Both are part of the OKFN and, combined, are an easy way to publish and find data and references. It was agreed that working more closely together would benefit both projects as well as the wider community.
Peter and Mark discussed BibServer in terms of where we could offer CKAN and other services to academic / research groups, as a stack of tools that they would find beneficial to their work. There is a lot of talk just now about using DSpace, EPrints or some as yet uninvented system for storing research data – JISC is funding some projects, and we will be having a discussion about this at Dev8D.
Sam, Primavera and Etienne hacked some code and Etienne also continued his work on the parsers.
Peter, Mark and I discussed BKN – the Bibliographic Knowledge Network – which is Jim Pitman’s project and the first BibServer… Follow-up happening next week.
Peter and I interviewed Mark Williamson, a post-doctoral researcher at the Chemistry Department, about using BibSoup (which he’d only looked at for a few moments before we put him on the spot – thanks Mark!). Mark also gave us a demonstration of using BibSoup for the blog which is a good ‘how to’ for people who haven’t used it before. Peter’s excellent summary of BibSoup goes as follows: “BibSoup is a philosophy rather than a technology – ie having local control over bibliographic data. The idea is to get people to share data together and to sign-up to supporting it in 5 years’ time”. We will follow up soon with links to the videos we made.

The main aim for the past few days was to get all the people working on the project together, with the aim of Getting Stuff Done: do some coding, boot up some dataset demos, plan more demos and integrations, plan further community engagement and coding over the next six months, integrate BibServer with other projects, etc… Amongst all the lively discussions, I think it’s safe to say the aim was achieved!

Many thanks to all involved – if anything from this week strikes you as particularly interesting, please do get involved.

Posted in BibServer, event, JISC OpenBib | Tagged jiscopenbib2 | 4 Comments

Wednesday 18th January – Open Biblio Sprint: Day 2

Posted on January 19, 2012 by Naomi Lillie

BibServer took precedence this morning, with Etienne, Ed and Mark continuing to develop the BibServer parsers… By March we want people to be able to download and run their own instance of the Server, or to provide a service whereby we can do it for them. We discussed various use cases that could be used to explain how BibServer is valuable in data collection – for example: a departmental administrator / researcher is required to provide a list of publications of those within his / her department using Symplectic / Web of Science / Endnote, and is to upload this information to the department’s website. BibSoup is ideal for this scenario as it allows different formats to be entered or produced, and the resulting collection can be easily searched and embedded in other web pages.
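As a rough illustration of the use case above, the sketch below reads RIS-formatted references into simple BibJSON-style records and searches them. It is not the BibServer parser: the function names are invented for this example, only a handful of RIS tags are handled, and the record field names (type, author, title, year) are an assumption about what a minimal BibJSON-style record might contain.

```python
import re

# An RIS line is a two-character tag, two spaces, a hyphen and a value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of BibJSON-style dictionaries."""
    records, current = [], None
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":            # TY starts a new reference
            current = {"type": value, "author": []}
        elif current is None:      # ignore tags outside a reference
            continue
        elif tag == "ER":          # ER ends the current reference
            records.append(current)
            current = None
        elif tag in ("AU", "A1"):
            current["author"].append({"name": value})
        elif tag in ("TI", "T1"):
            current["title"] = value
        elif tag in ("PY", "Y1"):
            current["year"] = value[:4]
    return records


def search(records, term):
    """Return records whose title or author names contain the term."""
    term = term.lower()
    return [
        r for r in records
        if term in r.get("title", "").lower()
        or any(term in a["name"].lower() for a in r["author"])
    ]
```

Once exports from different tools are reduced to one record shape like this, the same search and display code can serve the whole collection, whichever format each record came from.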

In the time we had been exploring the benefits of the BibServer, some e-mails had come through to the List with examples of collections. Starting with these, we identified a total of six people / groups who would drive open resources (BibSoup / BibServer) and whose data could be used as demonstrations (interpreted as Reading or Publication List format):

Malaria – Tom Olijhoek (Medline)
Sancoma – Gilles (Medline)
Karol Langner – personal libraries of people
UCC-PMR
Physics.cam.ac
Jim Pitman

It was agreed that these would be parsed through BibServer and used as examples of the functionality and importance of BibServer. Some, such as Jim’s probability web, may also benefit from dedicated BibServer instances.

Peter and I then interviewed members of the team, to record what was going on and talk about the project in general; this became short video blogs which will be available shortly.

Posted in BibServer, event, JISC OpenBib | Tagged jiscopenbib2 | 3 Comments

Tuesday 17th January – Open Biblio Sprint: Day 1

Posted on January 17, 2012 by Naomi Lillie

Today Etienne, Ed, Rufus, Mark, Peter and I met up to start the first sprint of the new year. We began by clarifying the purpose of the sprint, today’s agenda and the project overall. We stated our aims as follows: we are not trying to re-do what is already available online, we are not getting into the detail of normalisation or disambiguation within a centralised database, and we are not intending to alter the academic culture overnight; however, we are going to improve the BibJSON facility for wider use, we are trying to determine how we can get more small groups and individuals involved, and we are identifying compelling, essential and simple reasons for people to support the project at this early stage before the ultimate global benefits can be realised. With this in mind, we got cracking.

Etienne, the newest member of the team, began coding pretty much straight away – he and Ed started working on MARC / RIS parsers. Peter and I started a huge list of FAQs for the website – Peter asking in-depth questions such as ‘do we want to create a single BibJSON collection for all the world’s metadata?’, me going for slightly less detail with ‘what is metadata?’ – and Mark assisted where he was needed. Peter hacked some datasets to go into the parsers and Mark got some coding done too. There was good progress made and we are set up well for tomorrow to crunch some issues.

One problem to be revisited relates to BibJSON and to having copies of the same record within different collections. If an object (the record) is held within multiple collections, there are separate copies of that object, which could cause problems – for example, if a record is copied into several collections and then a typo is found, it can be a mammoth task trying to track down all erroneous copies and correct them… This issue is likely to be solved by creating a Master / Slave relationship between copies. Also with regard to BibJSON, Etienne suggested providing a flat HTML version of collections (in addition to the javascript option), for easy use in departmental web pages.
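One way to read the master and copy idea is that every copy remembers which master record it came from, so that a correction can be pushed to all copies in a single step. The sketch below illustrates that reading only. The class, method and field names (including master_id) are hypothetical and are not part of BibJSON or BibServer.

```python
class RecordStore:
    """Keeps master records and the copies held in named collections."""

    def __init__(self):
        self.masters = {}      # record id -> master record (dict)
        self.collections = {}  # collection name -> list of copies

    def add_master(self, record_id, record):
        self.masters[record_id] = dict(record)

    def copy_into(self, collection, record_id):
        """Copy a master into a collection, remembering where it came from."""
        copy = dict(self.masters[record_id], master_id=record_id)
        self.collections.setdefault(collection, []).append(copy)
        return copy

    def correct(self, record_id, **changes):
        """Fix the master and every copy linked to it; return copies updated."""
        self.masters[record_id].update(changes)
        updated = 0
        for copies in self.collections.values():
            for copy in copies:
                if copy.get("master_id") == record_id:
                    copy.update(changes)
                    updated += 1
        return updated
```

With the link back to the master in place, fixing a typo becomes one call rather than a hunt through every collection. The trade-off is that local edits made to a copy would be overwritten unless the design also tracks them.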

All set for day 2…

Posted in BibServer, event, JISC OpenBib | Tagged jiscopenbib2 | 1 Comment

BibServer Code Sprint January 2012

Posted on January 16, 2012 by Rufus Pollock

The BibServer team will be having a code sprint this week.

When: Tuesday 17
Where: Cambridge, UK (and online)

More details to follow.

Posted in BibServer, JISC OpenBib, News | Tagged jiscopenbib2 | 1 Comment

Sweden Ends Negotiations with OCLC

Posted on January 12, 2012 by Adrian Pohl

The following guest post is by Maria Kadesjö who works at the Libris-department at the National Library of Sweden.

The National Library of Sweden has ended negotiations with OCLC on participation in WorldCat, as the parties could not come to an agreement. The negotiations go back to 2006, and the key obstacle for the National Library has been the record use policy. Some time into the negotiations, OCLC presented certain conditions for how records taken from WorldCat for cataloguing were to be used in Libris.

A No to WorldCat Rights and Responsibilities

The issue stems from the “WorldCat Rights and Responsibilities for the OCLC Cooperative”, under which an OCLC member has to accept certain conditions whose aim is to support the ongoing and long-term viability and utility of WorldCat (and its services). The National Library cannot accept the conditions as they are today, since WorldCat is not the only arena in which the National Library wants and needs to be active. Accepting the conditions would mean that we would forever have to relate to OCLC’s policy.

Libris an open database

Libris is the Swedish union catalogue (with some 170 libraries, primarily academic libraries) and is built much like WorldCat but on a smaller scale. Member libraries catalogue their records in Libris and the records are then exported to their local library systems. The Libris cooperative is built on voluntary participation. Any Libris library should be able to take all its bibliographic records out of Libris whenever it wants and use them in another library system. The National Library makes no claim to the records and does not control how the libraries choose to use their bibliographic records.

The National Library has taken the decision to release the national bibliography and authority data as open data. The reason for this is to acknowledge the importance of open data and the importance of libraries’ control of their data when it comes to the long term sustainability and competition of the services needed by libraries and their users.

In the Agreement on Participation in the Libris joint catalogue signed by the National Library and cataloguing libraries, Point 3.3 specifies that “the content in Libris is owned by the National Library and is freely accessible in accordance with the precepts and methods reported by the National Library, both for Participating Libraries and for external partners.” This paragraph is needed so that the National Library can sign agreements for Libris with other partners (OCLC, for example) but also so that the National Library can abstain from claims of ownership of bibliographic records taken from Libris. Libris can therefore be an open database, both for the libraries that use Libris for cataloguing and for others.

Not consistent with Libris principles

The consequence of signing with OCLC would be that the National Library would have to supervise how records originating in WorldCat were used. A library that took a bibliographic record from WorldCat for cataloguing in Libris and exported it to its own system would also have to accept OCLC’s terms of use.

A library that wished to leave Libris would not necessarily be able to do so, since it is not self-evident that the bibliographic records could be integrated into other systems. This would be an infringement of the voluntary participation that characterizes Libris. In practice, the National Library has no mandate to restrict the freedom of action of the Libris libraries in this way, since the National Library has no possibility of influencing how the Libris libraries themselves choose to use their catalogues.

Posted in guest post, vendors