Open Bibliography and Open Bibliographic Data » openbiblio

JSON-LD / BibJSON

Mark MacGillivray — Tue, 21 Feb 2012 18:00:02 +0000

There have been requests on our mailing list recently to consider the various options for supporting validation of BibJSON and for supporting namespacing. These two options require some further consideration.

Validation

Efforts so far around BibJSON have focussed on building a useful JSON representation of bibliographic metadata, with some typical key/value pairs that are common in or extended from bibtex. This started off simply, but we have seen increasing complexity to accommodate further functionality requests. There was some work on a JSON schema to validate against, but given the aim of being as flexible as possible, with very few required keys, validating a BibJSON document against it would have very little effect.

Validating a document as properly formatted JSON is, of course, a good idea; but there are plenty of ways to do this already – just try to parse it with any of the JSON libraries available for your programming language of choice.
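As a minimal sketch of that kind of check, assuming Python and its standard json module (the function name is_well_formed_json and the sample records are illustrative and not part of BibJSON):

```python
import json


def is_well_formed_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Illustrative records; the keys are bibtex-style keys chosen for the example.
good = '{"title": "Open Bibliography", "year": "2012"}'
bad = '{"title": "Open Bibliography", "year": }'

print(is_well_formed_json(good))
print(is_well_formed_json(bad))
```

This only tells you that the document is JSON; it says nothing about whether the keys mean what the recipient expects.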

But to reach the stage of actually supporting validation against a pre-defined schema, we must pre-define a schema – and that means becoming inflexible (or doing so little validation that it is essentially pointless).

An alternative to validation against a schema would be adoption of namespaces.

Namespaces

We do already have a namespace concept in BibJSON – it is just a key in the metadata, under which can be listed namespaces and a suitable prefix for them. However, this model is not widely known (because we made it up). To overcome this, we should adopt the JSON-LD method of using @context parameters. This way, it would be possible to specify the namespace in which your record keys are defined, and to share namespace information with other people / machines.
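To make the change concrete, here is a rough sketch in Python of moving a prefix-to-namespace mapping into a JSON-LD style @context. The shape of the old-style collection (a "namespaces" object under "metadata", plus a "records" list) is an assumption made for this example, as are the function name to_jsonld_context and the sample record; only the use of @context as a prefix-to-IRI mapping comes from JSON-LD.

```python
import json

# Hypothetical BibJSON-style collection. The exact shape of the old
# "namespaces" key is an assumption made for this example.
old_style = {
    "metadata": {
        "namespaces": {"dc": "http://purl.org/dc/terms/"},
    },
    "records": [
        {"title": "An example title", "dc:language": "en"},
    ],
}


def to_jsonld_context(collection):
    """Return a copy in which the namespace mapping is moved into "@context"."""
    metadata = dict(collection.get("metadata", {}))
    namespaces = metadata.pop("namespaces", {})
    converted = {"@context": dict(namespaces)}
    if metadata:
        # Keep any other metadata keys untouched.
        converted["metadata"] = metadata
    converted["records"] = [dict(r) for r in collection.get("records", [])]
    return converted


new_style = to_jsonld_context(old_style)
print(json.dumps(new_style, indent=2))
```

Plain keys such as title stay as ordinary JSON; a JSON-LD processor would only interpret the keys that the @context covers.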

What is the point

Using namespaces and having a schema only become sensible when there is a concerted effort to share data with others. For internal use, they could be valuable for consistency, but the code we write internally adheres by definition to our own level of consistency anyway.

Therefore, it is not a function of BibJSON to perform validation – BibJSON is just JSON. Rather, it is the function of a community to make agreements and to conform to those agreements as required.

Where such a function must be supported, it should be done via mechanisms already available and maintained for that purpose – there is no point attempting to maintain our own; it is not our key strength or goal.

Recommendation

Change the BibJSON use of namespaces to conform to the method specified in JSON-LD. Wherever consistency is required, the parties involved should agree to share data via JSON and within a particular @context.

The fundamental keys in BibJSON – the default context – should remain as they are, and should not require contextualisation.

If contextualisation of the fundamental keys of BibJSON is required, then those keys should be contextualised into a schema by whoever has such a requirement.

Ramifications

drop the “namespace” key in BibJSON
continue using BibJSON as normal, but:
reference JSON-LD for use of @context and other more complex LD functions as required
wherever validation is required, perform it based on the use of namespaced keys (beyond the scope of BibJSON)
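One possible reading of validation based on namespaced keys, sketched in Python: check that every prefixed key in a record uses a prefix declared in the agreed @context. The function name undeclared_prefixes, the sample context and the sample record are all hypothetical, and a real JSON-LD processor does considerably more than this.

```python
def undeclared_prefixes(record, context):
    """Return the sorted prefixes used in record keys but missing from context."""
    missing = set()
    for key in record:
        # Skip JSON-LD keywords, plain default-context keys and full IRIs.
        if key.startswith("@") or ":" not in key or "://" in key:
            continue
        prefix = key.split(":", 1)[0]
        if prefix not in context:
            missing.add(prefix)
    return sorted(missing)


# Hypothetical agreed context and a record that strays from it.
context = {"dc": "http://purl.org/dc/terms/"}
record = {
    "title": "An example title",
    "dc:language": "en",
    "foaf:maker": "A. Author",
}

print(undeclared_prefixes(record, context))
```

Here the check reports the foaf prefix, because the record uses it without the context declaring it. What counts as an acceptable context remains a community agreement, not something BibJSON itself enforces.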

Finnish Turku City Library and the Vaski consortia now Open Data with 1.8M MARC-records

Mace Ojala — Thu, 13 Oct 2011 18:50:34 +0000

Let's open up our metadata containers

I’m happy to announce that our Vaski consortium of public libraries, serving a total of 300 000 citizens in Turku and a dozen surrounding municipalities in western Finland, has recently published all of our 1.8 million bibliographical records in the open, as a big pile of data (see on The Data Hub).

Each of the records describes a book, recording, movie, song or other publication in our library catalogue. Titles, authors, publishing details, library classifications, subject headings, identifiers and so on are systematically saved in MARC format, the international, structured library metadata standard since the late 1960s.

Unless I’ve missed something, ours is the third large-scale Open Data publication from the libraries of Finland. The first one was the 670 000 bibliographical records of the HelMet consortium (see on The Data Hub), another consortium of public libraries around the capital Helsinki. This first publication was organized and initiated in 2010 by Kirjastot.fi Labs, a project seeking more agile, innovative library concepts. The second important Open Data publication was our national general thesaurus, Yleinen suomalainen asiasanasto (YSA), which is also available as a cool semantic ontology.

Joining this group of Open Data publications was natural for our Vaski consortium, because we are moving our data from one place to another anyway. We are in the middle of converting from our national FinMARC flavour to the international MARC21 flavour of MARC, swapping our library system from Axiell PallasPro to Axiell Aurora, and implementing a new, ambitious search and discovery interface for all the Finnish libraries, archives and museums (yes, it’s busy times here and we love the taste of a little danger). All this means we are extracting, injecting, converting, mangling, breaking, fixing, disassembling and reassembling all of our data. So, we asked ourselves, why not publish all of our bibliographical data on the net while we are at it?

The process of going Open Data has been quite seamless for us. On my initiative the core concept of Open Data was explained to the consortium’s board. As there were no objections or further questions, we contacted our vendor BTJ, who immediately supported the idea. From there on it was basically just a matter of some formalities with BTJ, consulting international colleagues regarding licensing, writing a little press release, and organizing a few hundred megabytes of storage space on the internet. And trying to make sure the Open Data move didn’t get buried under other, more practical things during the summertime.

For our data license we have chosen the liberal Creative Commons Zero license (CC0), because we try to put as few obstacles in the way of our data as possible. However, we have agreed on a 6-month embargo with BTJ, the company that does most of the cataloguing for the Finnish public libraries. We believe that it is a good compromise to publish data that is slightly outdated rather than make the realm of immaterial property rights any more unclear than it already is.

Traditional library metadata at Turku main library

We seriously cannot anticipate what our Open Data publication will lead to. Perhaps it will lead to absolutely nothing at all. I believe most organizations opening up their data face this uncertainty. However, what we do know for sure is that all of the catalogue records we have carefully crafted, acquired and collected are seriously underutilized if they are only used for one particular purpose: finding and locating items in the library collections.

For such a valuable asset as our bibliographical metadata, I feel this is not enough. By removing obstacles to accessing our raw data, we open up new possibilities for ourselves, for our colleagues (understood widely), and for anybody interested.

Mace Ojala, project designer
Turku City Library/Vaski-consortia; National Digital Library of Finland, Cycling for libraries, etc.
http://xmacex.wordpress.com, @xmacex, Facebook etc.

Openbiblio at #elag2011 and #lodlam

Adrian Pohl — Wed, 01 Jun 2011 15:18:41 +0000

I wrote a post over at the blog for the LOD-LAM (Linked Open Data in Libraries, Archives and Museums) summit. It’s mainly a summary of ELAG 2011 from an openbiblio viewpoint. See http://lod-lam.net/summit/2011/06/01/from-elag2011-to-lodlam for the post.

Also, the German Zukunftswerkstatt published an interview podcast regarding Open Bibliographic Data. Julia Bergman interviewed Patrick Danowski, Kai Eckert and me at BibCamp, the German barcamp for librarians and other hackers. Hopefully, a text version of this interview will also be published on the web soon.