Day 1 of the March Sprint

Agendas are funny things; you have an idea of what you want to do, you write a few bullet-points to focus it a little and you presume things will naturally lead on from one thing to the next… Well, not this week. Barely had we settled down to the intros when the agenda was out the window!

As well as the Usual Suspects (Mark, me) and the Collaborators (Sam, Laura) we welcomed Thomas Krichel, an expert in scholarly communication over from the States, and were joined by two additional OKFNers, Jilly Matthews and Will Waites, who are based in Edinburgh and popped by to catch up on the project’s recent developments.

To begin with, all of us set our minds to the collaborative opportunities with CKAN as Jilly explained the project and the difference between thedatahub.org and CKAN (the former is a publicly available instance of the technology of the latter, which drives this and other instances). Then we split up into groups:

  • Thomas and Mark explored connecting BibSoup data with AuthorClaim and refined some ideas for the future of BibJSON, with Will (who was involved with previous iterations of the Open Biblio project/s) and Jilly contributing to discussions around simple / complex JSON following on from Mark’s post;
  • Mahendra and I finalised the details of tomorrow’s Meet-up;
  • Laura and Sam ducked in to various conversations, suggesting improvements in technology and running events as key phrases caught their ears… Sam was looking at Open Biblio’s overlap with OpenGLAM and Laura was advising on tomorrow’s event, having arranged several Cambridge Meet-ups before.

The plan for tomorrow is for Mahendra, Laura and me to plan the June Hackathon and for Mark to get some good coding done with Etienne… but we’ll see how the agenda shifts!

Posted in event, JISC OpenBib, OKFN Openbiblio

Minutes: 19th Virtual Meeting of the OKFN Openbiblio Group

Date: 6th March 2011, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Sebastian Nordhoff
  • Jim Pitman

Agenda

Action points from the last meeting

  • Adrian has posted to the KIM-DINI-LLD list regarding the national bibliography and a conversion to BibJSON. There are no answers yet.
  • Until now Mark & Adrian haven’t looked at the data themselves and whether/how it can be converted to BibJSON.

ACTION: Adrian will personally ask people from the German National Library at BibCamp.

Langdoc dataset

Sebastian gave us some information about the bibliographic dataset for the world’s lesser known languages recently published at http://glottolog.livingsources.org/. (See also the thread on the mailing list.)

Data

  • The dataset contains 180k references with 120K records tagged by language.
  • Sebastian sees a possibility (and need?) for crowdsourcing to add missing data elements for 40K records.
  • The data can be downloaded under http://glottolog.livingsources.org/meta/downloads.
  • In addition to bibliographic data the website contains a “comprehensive catalogue of the world’s languages, language families and dialects (langoids)” which can be searched and browsed starting here.
  • Example bibliographic record.
  • Existing problems with the data include
    • several ad hoc BibTeX properties
    • 73 different data fields
    • Sometimes semantics of a property aren’t known, e.g. document_type={B}
    • These problems are due to heterogeneous data sources for this dataset
    • RDF for references still in need of improvement

Code

  • The project involves machine-learning code for automatically recognizing the language of a work
  • written in Python
  • Isn’t open yet
  • The code is created by Harald Hammarström.
  • NLTK, Natural Language Toolkit: http://www.nltk.org/

Licensing issues

  • Which parts of the data set cause problems? – “Africa part”, “Australia part”
  • What share of the data is problematic? – Approx. one quarter.

What’s Sebastian up to with the data set?

  • SN is happy to provide data and help other people start with it.
  • SN can probably convert existing data into BibJSON
  • SN will host a SPARQL endpoint
  • SN will not host a bibserver
  • SN will not develop bibserver

ACTION: Sebastian will provide a post for openbiblio.net when the data set is officially released (sometime in March).

Provenance

A short discussion about provenance – in regard to BibJSON – arose.

ACTION: Ask members of these groups to provide a short post about it on openbiblio.net.

Resources and tasks of openbiblio activities at OKF

  • Jim had some questions about governance and allocation of OKF resources to the working group’s activities
    • How to organize WGOBD to engage contributors to creation/maintenance of various listings relevant to OBD?
    • How to keep the enquiries to potential open data providers going?

ACTION: Write down core resources and tasks of the openbiblio group. (Adrian)

Posted in minutes, OKFN Openbiblio, Uncategorized

March Sprint and Meet-up

There will be a coding and planning sprint for the project team in Edinburgh on Monday 12th and Tuesday 13th March, with tying-up of loose ends on Wednesday 14th for those still around.

Following on from the productivity of January’s sprint, we aim to update project documentation and code, refine development, explore integration with other projects, and plan for the remaining three months, including demonstrations and user engagement.

We will be joined by representatives of other projects including Textus, the School of Open Data and DevCSI.

If anyone is interested in seeing what we’re up to, or talking open data / knowledge in general, come along on the Tuesday evening as we have arranged a Meet-up with others from OKFN and DevCSI, and all are welcome – more details here. This promises to be a great opportunity for some Edinburgh-based folk (and anyone willing to travel!) to get together to discuss ideas, projects and generally set the world to rights over a brew.

For more information contact naomi.lillie [@] okfn.org.

Twitter: #OpenDataEDB

Posted in event, JISC OpenBib, OKFN Openbiblio

Linked Open Data as explained by Europeana

Antoine Isaac recently sent an e-mail around the List to let us know that Europeana has published its first dataset, comprising 2.4 million objects, under CC0. Furthermore, the new Data Exchange Agreement, which data suppliers are required to sign in order to publish on Europeana (and already signed by national libraries, national museums and content providers for entire countries), comes into effect on 1 July 2012, after which all metadata in Europeana will be available as Open Data to the Public Domain!

This is brilliant news in itself, but what I found particularly enchanting was the animated video that Europeana created in support of this announcement:

Linked Open Data from europeana on Vimeo.

As mentioned before, I am not of a technical background, and sometimes terms and explanations are difficult for me to grasp; however – I understand this video! I think it’s a brilliant explanation of the detail of Linked Open Data (LOD): how metadata works together, why it’s important that it’s open, how the more open data is available the more can be done with it, etc. This provides great clarity on what it is we’re seeking to do, for those of us who can’t tell a gig from a meg. The Open Biblio readership is generally more savvy with the nitty-gritty workings of LOD than I – if not in fact doing this sort of stuff already – but how can you not love the simplicity of this video?! It’s engaging, interesting, mostly jargon-free and less than four minutes long… So, if you’re an advocate of LOD without really understanding the processes behind the philosophy, get yourself a cuppa, settle down to watch this, and be informed.

Thanks Europeana, keep up the good work!

Europeana’s press release provides more information about the above dataset release and video.

Posted in Data, LOD-LAM, News

JSON-LD / BibJSON

There have been requests on our mailing list recently to consider the various options for supporting validation of BibJSON and for supporting namespacing. These two options require some further consideration.

Validation

Efforts so far around BibJSON have focussed on building a useful JSON representation of bibliographic metadata, with some typical key/value pairs that are common in or extended from bibtex. This started off simply, but we have seen increasing complexity to accommodate further functionality requests. There was some work on a JSON schema for validation against, but given the aim of being as flexible as possible, and with very few required keys, the function of validation of a BibJSON document would have very little effect.

Validating a document as properly formatted JSON is, of course, a good idea; but there are plenty of ways to do this already – just try to parse it with any number of libraries for your programming language of choice.
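
As a minimal sketch of that point, the Python below checks only that a string parses as JSON, using the standard library's json module. The function name is_well_formed_json and the sample record are made up for illustration; nothing here is specific to BibJSON.

```python
import json


def is_well_formed_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Illustrative record only; the keys are not a required set.
good = '{"title": "An example title", "author": [{"name": "A. Writer"}]}'
bad = '{"title": "An example title", '  # truncated, so not valid JSON

print(is_well_formed_json(good))
print(is_well_formed_json(bad))
```

Note that this says nothing about which keys are present, which is exactly the gap a schema would have to fill.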

But to reach the stage of actually supporting validation against a pre-defined schema, we must pre-define a schema – and that means becoming inflexible (or doing so little validation as for it to be essentially pointless).

An alternative to validation against a schema would be adoption of namespaces.

Namespaces

We do already have a namespace concept in BibJSON – it is just a key in the metadata, under which can be listed namespaces and a suitable prefix for them. However, this model is not widely known (because we made it up). To overcome this, we should adopt the JSON-LD method of using @context parameters. This way, it would be possible to specify the namespace in which your record keys are defined, and to share namespace information with other people / machines.
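
To illustrate, a record using a JSON-LD style @context might look something like the sketch below, written as a Python dictionary. The dc prefix points at the Dublin Core terms namespace, but the choice of keys, the record content and the expand_key helper are assumptions made for this example. The helper is a simplified stand-in for prefix expansion, not the full JSON-LD expansion algorithm.

```python
import json

# Hypothetical record: one plain BibJSON-style key and one prefixed key.
record = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "title": "An example title",  # left in the default context
    "dc:issued": "2012",
}


def expand_key(key, context):
    """Expand 'prefix:term' into a full IRI when the prefix is in context."""
    prefix, sep, term = key.partition(":")
    if sep and prefix in context:
        return context[prefix] + term
    return key  # unprefixed or unknown prefix: leave untouched


expanded = {
    expand_key(k, record["@context"]): v
    for k, v in record.items()
    if k != "@context"
}
print(json.dumps(expanded, indent=2))
```

Anyone receiving the record can then see which vocabulary a prefixed key belongs to, while unprefixed keys keep working as before.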

What is the point

Using namespaces and having a schema only become sensible when there is a concerted effort to share data with others. For internal use, they could be valuable for consistency, but the code we write internally adheres by definition to our own level of consistency anyway.

Therefore, it is not a function of BibJSON to perform validation – BibJSON is just JSON. Rather, it is the function of a community to make agreements and to conform to those agreements as required.

Where such a function must be supported, it should be done via mechanisms already available and maintained for that purpose – there is no point attempting to maintain our own; it is not our key strength or goal.

Recommendation

Change the BibJSON use of namespaces to conform to the method specified in JSON-LD. Wherever consistency is required, agreement should be reached to share data via JSON and within a particular @context.

The fundamental basic keys in BibJSON – the default context – should remain as they are, and should not require contextualisation.

If contextualisation of the fundamental keys of BibJSON is required, then those keys should be contextualised into a schema by whomsoever has such a requirement.

Ramifications

  • drop the “namespace” key in BibJSON
  • continue using BibJSON as normal, but:
  • reference JSON-LD for use of @context and other more complex LD functions as required
  • wherever validation is required, perform it based on the use of namespaced keys (beyond scope of bibjson)

Posted in BibServer, JISC OpenBib, OKFN Openbiblio

BibSoup beta: released

BibSoup is here! And it’s going to revolutionise how you work with bibliographic metadata.

Peter has been blogging for a while about BibSoup (see here for the basics and here for how to use it) and we’ve mentioned it in passing on this blog (for example this sprint post and explanation of Bib- terms)… But now it is time for the ‘official’ launch. Hurrah!

So, how to get involved?

Setting up a Bibserver and Faceted Browsing (Mark MacGillivray) from Bibsoup Project on Vimeo.

We already have parsers available to get your data directly via either BibTeX or RIS (or from BibJSON…), which means you can get data in from most major bibliographic tools already; you can even use the parsers programmatically if you like, at http://bibsoup.net/parse (although that functionality is in the process of improvement). We are open to suggestions for further parsers, and would be happy to guide anyone through making one.
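
For anyone curious what a parser of this kind does, here is a toy sketch in Python that turns one simple BibTeX entry into a JSON-friendly dictionary. It is not the BibSoup parser and does not call http://bibsoup.net/parse. The function name, the regular expressions and the output keys "type" and "id" are assumptions for illustration, and it only copes with flat fields whose values sit in a single pair of braces.

```python
import json
import re

# Matches "@type{key, ...}" and captures the type, the key and the body.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
# Matches "name = {value}" where the value has no nested braces.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_entry(text):
    """Parse a single, simple BibTeX entry into a dictionary."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError("not a simple BibTeX entry")
    entry_type, key, body = match.groups()
    record = {"type": entry_type.lower(), "id": key}
    for name, value in FIELD_RE.findall(body):
        record[name.lower()] = value.strip()
    return record


sample = """@book{example2012,
  title = {An Example Title},
  author = {Writer, A.},
  year = {2012}
}"""

print(json.dumps(parse_entry(sample), indent=2))
```

Real BibTeX allows nested braces, quoted values and string macros, which is why a maintained parser is preferable to a sketch like this.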

(By the way, we are assuming you will have seen previous posts on this site and will therefore know what we’re talking about, but if not then please see this OKFN blog post for a fuller explanation of what BibSoup is for, why it’s great and what this overall project is all about).

So, what do you think? Let us know. There will be bugs, or areas we could improve, so please pass suggestions our way. Feature requests can be submitted via our issue tracker, and we batch those up into milestones to work towards the next release. Our current focus is on improving parser functionality and also on enabling editing.

We hope you like it and find all this useful… do add your collections so we can share them with the rest of the world, too. If you would like your own BibServer, go ahead and download the code, or contact us for help / support options.

Posted in BibServer, Data, JISC OpenBib, News

Communication processes – for the record!

This follows discussion that began at the meeting on 1st February, and reasserts existing processes.

Any proposal for discussion is published ahead of the meeting at which it is to be raised, with an email inviting everyone on the openbiblio-dev list to the call and linking to the etherpad (which contains or links to further details). This is to ensure materials are available in advance of calls.

Everyone is free to suggest direction, which is agreed by consensus. Technical lead is Mark, and Community lead is Naomi.

All discussion should be carried out openly, on the available mailing lists. Agreement before publication of blog posts or pages is not required – any early discussions off-list should be posted on the appropriate mailing list for discussion, then posted on the blog if they come to fruition. Strategic and management proposals should make it clear that they are for discussion until the team has covered the topic at the weekly catch-up (Wednesdays at 16.00 GMT).

Posted in JISC OpenBib, minutes

Comparing existing bib tools

Update to this post: turns out there was a page, just not one I was aware of – please see http://en.wikipedia.org/wiki/Comparison_of_reference_management_software. I have linked to this from http://wiki.okfn.org/Projects/jiscopenbib2. Isn’t it handy when people have already done the job for us…

Recently, a discussion on the Working Group List raised the subject of existing technologies that store and share reading / publication lists, and how BibSoup / BibServer compares to them.

Tom Morris said:

Perhaps it would be illustrative to compare and contrast with other existing widely known services and tools such as Zotero, Mendeley, CiteUlike, and the venerable emacs/Bibtex/LaTex. What is better, worse, or just different? Which sets of things are alternatives to each other and which complement each other? What are the things which make BibSoup/BibServer unique?

Of course, if this is already laid out in detail somewhere on a web page, just point me there.

There wasn’t, until now, so please refer to this wiki page http://wiki.okfn.org/Projects/jiscopenbib2/managementtools set up for this purpose and start comparing!

I have used Thad Guidry’s notes on Mendeley, as well as the first line of the Wikipedia entry, to populate that example. Please do edit and add to this page – we want to avoid a debate on which is better than which, so please keep your opinions in check, but hopefully this will be a good opportunity to get a sense of what is in use and how they compare with one another and BibSoup.

Posted in BibServer, Data, JISC OpenBib, OKFN Openbiblio

Open Bibliography at the start of 2012

Adrian’s post about the German National Library prompted me to note down a few other exciting developments over the last month or so. Christmas and the holiday season may be perceived as a time for winding down, but not for these people!

  • The OCLC released Faceted Application of Subject Terminology (FAST) as Linked Data under an open licence. Their press release gives more information (please note, at present the full database is limited to API access, but OCLC will be making FAST available as a dump under the ODC-BY license in the near future).
  • NISO is working on a standard for the coding of citations in ebooks, full details here.
  • Our own Principles on Open Bibliographic Data have been translated into Hungarian (by Dudás Anikó) and Polish (by Karol Langner) so we hope this is the beginning of improved communication in both Hungary and Poland and with native speakers all over the world.
  • OKFN has been working on various ways of improving the overall field of Open Data, including the Open Metadata Handbook which is designed to help make it easier to harvest and process bibliographic metadata from a variety of sources, for people with little or no technical background, explaining the metadata standards as used by different institutions.
  • And of course, Peter Murray-Rust has been busy on his blog, exploring issues of US bill HR3699, how to present your list using BibSoup and generally promoting the world of open data.
  • Finally – watch this space for an announcement about BibSoup, launching in beta later this week!

Many thanks to the terrific group of people who keep us updated with this information. If you would like to be among the first to hear about developments like these, get involved with the Open Bibliographic Working Group and let us know what you’re up to.

Posted in Data, JISC OpenBib, News, OKFN Openbiblio

Linked Data at the Biblioteca Nacional de España

The following guest post is from the National Library of Spain and the Ontology Engineering Group (Technical University of Madrid (UPM)).

Datos.bne.es is an initiative of the Biblioteca Nacional de España (BNE) whose aim is to enrich the Semantic Web with library data.

This initiative is part of the project “Linked Data at the BNE”, supported by the BNE in cooperation with the Ontology Engineering Group (OEG) at the Universidad Politécnica de Madrid (UPM). The first meeting took place in September 2010, and the collaboration agreement was signed in October 2010. The first set of data was transformed and linked in April 2011, and a more significant set followed in December 2011.

The initiative was presented in the auditorium of the BNE on 14th December 2011 by Asunción Gómez-Pérez, Professor at the UPM and Daniel Vila-Suero, Project Lead (OEG-UPM), and by Ricardo Santos, Chief of Authorities, and Ana Manchado Mangas, Chief of Bibliographic Projects, both from the BNE. The attendant audience enjoyed the invaluable participation of Gordon Dunsire, Chair of the IFLA Namespace Group.

The concept of Linked Data was first introduced by Tim Berners-Lee in the context of the Semantic Web. It refers to the method of publishing and linking structured data on the Web. Hence, the project “Linked Data at the BNE” involves the transformation of BNE bibliographic and authority catalogues into RDF as well as their publication and linkage by means of IFLA-backed ontologies and vocabularies, with the aim of making data available in the so-called cloud of “Linked Open Data”. This project focuses on connecting the published data to other data sets in the cloud, such as VIAF (Virtual International Authority File) or DBpedia. With this initiative, the BNE takes the challenge of publishing bibliographic and authority data in RDF, following the Linked Data Principles and under the CC0 (Creative Commons Public Domain Dedication) open license. Thereby, Spain joins the initiatives that national libraries from countries such as the United Kingdom and Germany have recently launched.

Vocabularies and models

IFLA-backed ontologies and models, widely agreed upon by the library community, have been used to represent the resources in RDF. Datos.bne.es is one of the first international initiatives to thoroughly embrace the models developed by IFLA, such as the FR models FRBR (Functional Requirements for Bibliographic Records), FRAD (Functional Requirements for Authority Data), FRSAD (Functional Requirements for Subject Authority Data), and ISBD (International Standard for Bibliographic Description).

FRBR has been used as a reference model and as a data model because it provides a comprehensive and organized description of the bibliographic universe, allowing the gathering of useful data and navigation. Entities, relationships and properties have been written in RDF using the RDF vocabularies taken from IFLA; thus FR ontologies have been used to describe Persons, Corporate Bodies, Works and Expressions, and ISBD properties for Manifestations. All these vocabularies are now available at Open Metadata Registry (OMR), with the status of published. Additionally, in cooperation with IFLA, labels have been translated to Spanish. MARC21 bibliographic and authority files have been tested and mapped to the classes and properties at OMR. The following mappings were carried out:

  • A mapping to determine, given a field tag and a certain subfield combination, to which FRBR entity it is related (Person, Corporate Body, Work, Expression). This mapping was applied to authority files.
  • A mapping to establish relationships between entities.
  • A mapping to determine, given a field/subfield combination, to which property it can be mapped. Authority files were mapped to FR vocabularies, whereas bibliographic files were mapped to ISBD vocabulary. A number of properties from other vocabularies were also used.
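
The first and third mappings can be pictured as lookup tables keyed on field tag and subfield codes. The Python below is a sketch under assumptions: tags 100 and 110 are the MARC 21 authority headings for personal and corporate names, but the table contents, the placeholder property names (prefixed example:) and the function names are invented for illustration and are not the BNE's actual mappings.

```python
# Hypothetical table: (field tag, set of subfield codes present) -> FRBR entity.
ENTITY_MAP = {
    ("100", frozenset("a")): "Person",
    ("100", frozenset("ad")): "Person",
    ("100", frozenset("at")): "Work",  # name plus title heading
    ("110", frozenset("a")): "Corporate Body",
}

# Hypothetical table: (field tag, subfield code) -> property name.
PROPERTY_MAP = {
    ("100", "a"): "example:nameOfPerson",
    ("100", "d"): "example:datesOfPerson",
    ("110", "a"): "example:nameOfCorporateBody",
}


def classify(tag, subfields):
    """Return the entity for a field, or None if the combination is unmapped."""
    return ENTITY_MAP.get((tag, frozenset(subfields)))


def to_properties(tag, subfields):
    """Map each subfield that has a known property to that property."""
    return {
        PROPERTY_MAP[(tag, code)]: value
        for code, value in subfields.items()
        if (tag, code) in PROPERTY_MAP
    }


# Subfields of one illustrative authority heading, keyed by subfield code.
heading = {"a": "Cervantes Saavedra, Miguel de", "d": "1547-1616"}

print(classify("100", heading))
print(to_properties("100", heading))
```

Keeping the tables as data rather than code is one way to let librarians adjust the correspondences without touching the transformation logic.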

The aforementioned mappings will soon be available to the library community. In this way the BNE would like to contribute to the discussion of mapping MARC records to RDF; in addition, other libraries willing to transform their MARC records into RDF will be able to reuse such mappings.

Almost 7 million records transformed under an open license

Approximately 2.4 million bibliographic records have been transformed into RDF. They are modern and ancient monographs, sound recordings and musical scores. In addition, 4 million authority records of persons, corporate names, uniform titles and subjects have been transformed. All of them belong to the bibliographic and authority catalogues of the BNE stored in MARC 21 format. For the data transformation, the MARiMbA (MARc mappIngs and rdf generAtor) tool has been developed and used. MARiMbA is a tool for librarians, whose goal is to support the entire process of generating RDF from MARC 21 records. This tool allows using any vocabulary (in this case ISBD and FR family) and simplifies the process of assigning correspondences between RDFS/OWL vocabularies and MARC 21. As a result of this process, about 58 million triples have been generated in Spanish. These triples are high quality data with an important cultural value that substantially increases the presence of the Spanish language in the data cloud.

Once the data were described with IFLA models, and the bibliographic and authority catalogues were generated in RDF, the following step was to connect these data with other existing RDF knowledge bases included in the Linking Open Data initiative. Thus, the data of the BNE are now linked or connected with data from other international data sources through VIAF, the Virtual International Authority File.

The type of licence applied to the data is CC0 (Creative Commons Public Domain Dedication), a completely open licence aimed at promoting data reuse. With this project, the BNE adheres to the Spanish Public Sector’s Commitment to openness and data reuse, as established in the Royal Decree 1495/2011 of 24 October (Real Decreto 1495/2011, de 24 de octubre) on reusing information in the public sector, and also acknowledges the proposals of the CENL (Conference of European National Librarians).

Future steps

In the short term, the next steps include:

  • Migration of a larger set of catalogue records.
  • Improvement of the quality and granularity of both the transformed entities and the relationships between them.
  • Establishment of new links to other interesting datasets.
  • Development of a human-friendly visualization tool.
  • SKOSification of subject headings.

Team

From BNE: Ana Manchado, Mar Hernández Agustí, Fernando Monzón, Pilar Tejero López, Ana Manero García, Marina Jiménez Piano, Ricardo Santos Muñoz and Elena Escolano. From UPM: Asunción Gómez-Pérez, Elena Montiel-Ponsoda, Boris Villazón-Terrazas and Daniel Vila-Suero.

Posted in Data, guest post, LOD-LAM, Semantic Web