Minutes: 19th Virtual Meeting of the OKFN Openbiblio Group

Date: March 6th, 2011, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad


Participants

  • Adrian Pohl
  • Sebastian Nordhoff
  • Jim Pitman


Action points from the last meeting

  • Adrian has posted to the KIM-DINI-LLD list regarding the national bibliography and a conversion to BibJSON. There have been no replies yet.
  • So far, Mark and Adrian haven’t looked at the data themselves or at whether and how it can be converted to BibJSON.

ACTION: Adrian will personally ask people from the German National Library at BibCamp.

Langdoc dataset

Sebastian gave us some information about the bibliographic dataset on the world’s lesser-known languages, recently published at http://glottolog.livingsources.org/. (See also the thread on the mailing list.)


  • The dataset contains 180K references, of which 120K records are tagged by language.
  • Sebastian sees a possibility (and need?) for crowdsourcing to add missing data elements for 40K records.
  • The data can be downloaded from http://glottolog.livingsources.org/meta/downloads.
  • In addition to bibliographic data the website contains a “comprehensive catalogue of the world’s languages, language families and dialects (langoids)” which can be searched and browsed starting here.
  • Example bibliographic record.
  • Existing problems with the data include:
    • several ad hoc BibTeX properties
    • 73 different data fields
    • Sometimes semantics of a property aren’t known, e.g. document_type={B}
    • These problems are due to heterogeneous data sources for this dataset
    • RDF for references still in need of improvement
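A first step towards dealing with the ad hoc properties and the 73 different data fields could be a simple field inventory. The following is only a minimal sketch: it assumes the records have already been parsed into Python dictionaries, and the sample records, field names and the helper `field_inventory` are hypothetical, not taken from the actual dataset.

```python
from collections import Counter

# Hypothetical sample records: field names and values are illustrative only
# and do not come from the real dataset.
records = [
    {"author": "Doe, Jane", "title": "A Grammar of X", "year": "1999", "document_type": "B"},
    {"author": "Roe, Richard", "title": "Notes on Y", "year": "2004", "lgcode": "abc"},
    {"title": "Wordlist of Z", "document_type": "B"},
]


def field_inventory(records):
    """Count how many records use each field name."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


inventory = field_inventory(records)

# List the fields from most to least used; rarely used fields are
# candidates for ad hoc properties that need a closer look.
for name, count in inventory.most_common():
    print(f"{name}: {count} of {len(records)} records")
```

Fields that occur in only a handful of records, or whose values are opaque codes such as document_type={B}, would then be the ones to trace back to their original data source.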


  • There is machine-learning code involved for automatically recognizing the language of a work.
    • It is written in Python.
    • It isn’t open yet.
    • It was created by Harald Hammarström.
    • NLTK, Natural Language Toolkit: http://www.nltk.org/

Licensing issues

  • Which parts of the data set cause problems? – “Africa part”, “Australia part”
  • What share of the data is problematic? – Approx. one quarter.

What’s Sebastian up to with the data set?

  • SN is happy to provide data and help other people start with it.
  • SN can probably convert the existing data into BibJSON
  • SN will host a SPARQL endpoint
  • SN will not host a bibserver
  • SN will not develop bibserver
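To make the conversion idea more concrete, here is a rough sketch of how an already parsed BibTeX-style entry might be mapped to a BibJSON-like structure. It is an illustration under assumptions, not the planned implementation: the target layout (a `type`, an `id`, and `author` as a list of objects with a `name` key) is assumed, and the function `to_bibjson_like` and the sample entry are hypothetical.

```python
import json


def to_bibjson_like(entry_type, cite_key, fields):
    """Map a parsed BibTeX-style entry to a BibJSON-like dict (assumed layout)."""
    record = {"type": entry_type, "id": cite_key}
    for key, value in fields.items():
        if key == "author":
            # BibTeX separates multiple authors with " and ".
            record["author"] = [{"name": name.strip()} for name in value.split(" and ")]
        else:
            # Unknown or ad hoc fields are passed through unchanged.
            record[key] = value
    return record


# Hypothetical entry, for illustration only.
example = to_bibjson_like(
    "book",
    "doe1999",
    {
        "author": "Doe, Jane and Roe, Richard",
        "title": "A Grammar of X",
        "year": "1999",
        "document_type": "B",
    },
)
print(json.dumps(example, indent=2))
```

Passing unknown fields through unchanged keeps properties whose semantics are not yet known, so nothing is lost before they have been clarified.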

ACTION: Sebastian will provide a post for openbiblio.net when the data set is officially released (sometime in March).


A short discussion arose about provenance with regard to BibJSON.

ACTION: Ask members of these groups to provide a short post about it on openbiblio.net.

Resources and tasks of openbiblio activities at OKF

  • Jim had some questions about governance and allocation of OKF resources to the working group’s activities
    • How to organize WGOBD to engage contributors in the creation/maintenance of various listings relevant to OBD?
    • How to keep the enquiries to potential open data providers going?

ACTION: Write down core resources and tasks of the openbiblio group. (Adrian)
