Open Bibliography and Open Bibliographic Data » event

Minutes: 25th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Naomi Lillie — Wed, 05 Sep 2012 17:21:28 +0000

Date: 4th September 2012, 15:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

Peter Murray-Rust
Naomi Lillie

NB: Karen Coyle sent apologies due to attendance at the Dublin Core conference

Agenda

As only PeterMR and I attended this call, we abandoned any formal agenda and had a very pleasant chat discussing PeterMR’s engagements and the upcoming OKFestival.

PeterMR has been presenting various bibliographic tools (including BibSoup) at a number of events lately, including VIVO12, and will do so at the upcoming Digital Science 2012 in Oxford. We discussed support for the existing tools we have in the Open Knowledge Foundation, in terms of person-resource and funding, and the importance of BibServer as an underlying tool for much of the work to be done in and around Open Bibliography and Access.

OKFest is less than 2 weeks away now and there is so much potential here for collaboration and idea generation… We agreed we are very excited and looking forward to meeting the pillars of Open society as well as those brand-new to this world which will only grow in influence and importance. Now is the time to embrace Open!

There were no particular actions, but it was helpful to consider how we can make a difference in the world of bibliography, for OKFN and GLAM institutions in general (i.e. galleries, libraries, archives and museums).

To join the Open Bibliography community sign up here – you may also be interested in the Open Access Working Group which is closely aligned in its outlook and aims.

Community Discussions 3

Naomi Lillie — Fri, 13 Jul 2012 12:41:46 +0000

It has been a couple of months since the round-up on Community Discussions 2 and we have been busy! BiblioHack was a highlight for me, and last week included a meeting of many OKFN types – here’s a picture taken by Lucy Chambers for @OKFN of some team members:

The Discussion List has been busy too:

Further to David Weinberger’s pointer that Harvard released 12 million bibliographic records with a CC0 licence, Rufus Pollock created a collection on the DataHub and added it to the Biblio section for ease of reference
Rufus also noticed that OCLC had issued their major release of VIAF, meaning that millions of author records are now available as Open Data (under Open Data Commons Attribution license), and updated the DataHub dataset to reflect this
Peter Murray-Rust noted that Nature has made its metadata Open CC0
David Shotton promoted the International Workshop on Contributorship and Scholarly Attribution at Harvard, and prepared a handy guide for attribution of submissions
Adrian Pohl circulated a call for participation for the SWIB12 “Semantic Web in Bibliotheken” (Semantic Web in Libraries) Conference in Cologne, 26-28 November this year, and hosted the monthly Working Group call
Lars Aronsson looked at multivolume works, asking whether the OpenLibrary can create and connect records for each volume. HathiTrust and Gallica were suggested as potential tools in collating volumes, and the barcode (containing information populated by the source library) was noted as being invaluable in processing these
Sam Leon explained that TEXTUS would be integrating BibServer facet view and encouraged people to have a look at the work so far; Tom Oinn highlighted the collaboration between Enriched BibJSON and TEXTUS, and explained that he would be adding a ‘TEXTUS’ field to BibJSON for this purpose
Sam also circulated two tools for people to test, Pundit and Korbo, which have been developed out of Digitised Manuscripts to Europeana (DM2E)
Jenny Molloy promoted the Open Science Hackday which took place last week – see below for a snap-shot courtesy of @OKFN:

In related news, Peter Murray-Rust is continuing to advocate the cause of open data – do have a read of the latest posts on his blog to see how he’s getting on.

The Open Biblio community continues to be invaluable to the Open GLAM, Heritage, Access and other groups too and I would encourage those interested in such discussions to join up at the OKFN Lists page.

Bringing the Open German National Bibliography to a BibServer

Adrian Pohl — Mon, 18 Jun 2012 08:55:39 +0000

This blog post is written by Etienne Posthumus and Adrian Pohl.

We are happy that the German National Library recently released the German National Bibliography as Linked Open Data (see the announcement). At the #bibliohack this week we worked on getting the data into a BibServer instance. Here, we want to share our experiences in trying to re-use this dataset.

Parsing large turtle files: problem and solution

The raw data file is 1.1 GB in a compressed format – unzipped it is a 6.8 GB turtle file.
Working with this file is unwieldy: it cannot be read into memory or converted with tools like rapper (which only works for turtle files up to 2 GB, see this mail thread). It would therefore be helpful if the German National Library could either provide one big N-Triples file, which is better suited to streaming processing, or provide a number of smaller turtle files.

Our solution to get the file into a workable form is to make a small Python script that is Turtle syntax aware, to split the file into smaller pieces. You can’t use the standard UNIX split command, as each snippet of the split file also needs the prefix information at the top and we do not want to split an entry in the middle, losing triples.
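The script itself is not reproduced in this post, so the following is only a minimal sketch of the idea, under simplifying assumptions: prefix declarations sit on their own lines, and every entry ends on a line whose last character is a full stop. The function name split_turtle and the records_per_chunk parameter are hypothetical, and multi-line literals would need a real Turtle tokenizer.

```python
from typing import Iterable, Iterator, List


def split_turtle(lines: Iterable[str], records_per_chunk: int = 1000) -> Iterator[str]:
    """Yield Turtle snippets that each repeat the prefix header.

    Assumption: an entry ends on a line that ends with "." and
    @prefix/@base declarations are one per line.
    """
    prefixes: List[str] = []   # header lines copied into every chunk
    current: List[str] = []    # lines of the chunk being built
    records = 0                # complete entries in the current chunk

    for line in lines:
        if not line.endswith("\n"):
            line += "\n"
        stripped = line.strip()
        if stripped.lower().startswith(("@prefix", "@base")):
            prefixes.append(line)
            continue
        if not stripped and not current:
            continue  # skip blank lines at the start of a chunk
        current.append(line)
        if stripped.endswith("."):
            # End of one entry: only cut the file here, never mid-entry.
            records += 1
            if records == records_per_chunk:
                yield "".join(prefixes + current)
                current, records = [], 0

    if any(l.strip() for l in current):
        yield "".join(prefixes + current)


sample = [
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n",
    "\n",
    '<http://example.org/a> dcterms:title "A" ;\n',
    '    dcterms:extent "21 cm" .\n',
    "\n",
    '<http://example.org/b> dcterms:title "B" .\n',
    '<http://example.org/c> dcterms:title "C" .\n',
]
chunks = list(split_turtle(sample, records_per_chunk=2))
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, so the 6.8 GB file would be streamed rather than loaded into memory.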

See a sample converted N-Triples file from a turtle snippet.

Converting the N-Triples to BibJSON

After this, we started working on parsing an example N-Triples file to convert the data to BibJSON. We haven’t gotten that far, though. See https://gist.github.com/2928984#file_ntriple2bibjson.py for the resulting code (work in progress).
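For readers who want a feel for the conversion without opening the gist, here is a small independent sketch (not the gist's code). It groups triples by subject and copies a few properties into BibJSON-style records. The FIELD_MAP mapping and the function names are assumptions chosen for illustration, blank nodes are skipped, and string escapes are not decoded.

```python
import re
from collections import defaultdict

# One simple N-Triples statement: <subject> <predicate> object .
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*$')
# A quoted literal, optionally followed by @lang or ^^<datatype>.
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"')

# Hypothetical mapping from RDF properties to BibJSON keys.
FIELD_MAP = {
    "http://purl.org/dc/terms/title": "title",
    "http://purl.org/dc/terms/issued": "year",
    "http://purl.org/dc/terms/publisher": "publisher",
}


def object_value(raw):
    """Return the literal text, or the bare URI for a resource object."""
    match = LITERAL.match(raw)
    if match:
        return match.group(1)
    return raw.strip("<>")


def ntriples_to_bibjson(lines):
    """Group triples by subject and keep only the mapped properties."""
    records = defaultdict(dict)
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # comments, blank lines, blank-node subjects
        subject, predicate, raw = match.groups()
        field = FIELD_MAP.get(predicate)
        if field:
            records[subject][field] = object_value(raw)
    return [
        dict(record, identifier=[{"type": "uri", "id": subject}])
        for subject, record in records.items()
    ]


example = [
    '<http://example.org/1> <http://purl.org/dc/terms/title> "Faust"@de .',
    '<http://example.org/1> <http://purl.org/dc/terms/issued> "1808" .',
    '<http://example.org/1> <http://purl.org/dc/terms/extent> "21 cm" .',
    '# a comment line',
]
bibjson = ntriples_to_bibjson(example)
```

Since N-Triples puts exactly one statement on each line, this kind of line-by-line processing is what makes the format convenient for very large files.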

Problems

We noted problems with some properties that we would like to document here as feedback for the German National Library.

Heterogeneous use of dcterms:extent

The dcterms:extent property is used in many different ways, so we are considering omitting it in the conversion to BibJSON. Some example values of this property: “Mikrofiches”, “21 cm”, “CD-ROMs”, “Videokassetten”, “XVII, 330 S.”. It would probably be more appropriate to use dcterms:format for most of these and to limit the use of dcterms:extent to pagination information and duration.
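To make the suggested split concrete, here is a rough heuristic sketch. It keeps a value as extent only when it looks like pagination or a duration, and treats everything else as format. The patterns are assumptions (“S.” for pages, “Min.” and “Std.” for durations), and classify_extent is a hypothetical helper, not part of any converter.

```python
import re

# Assumed patterns: "330 S." for pages, "90 Min." or "2 Std." for durations.
PAGINATION = re.compile(r'\b\d+\s*S\.')
DURATION = re.compile(r'\b\d+\s*(?:Min\.|min|Std\.)')


def classify_extent(value):
    """Return "extent" for pagination or duration, otherwise "format"."""
    if PAGINATION.search(value) or DURATION.search(value):
        return "extent"
    return "format"


examples = ["Mikrofiches", "21 cm", "CD-ROMs", "Videokassetten", "XVII, 330 S."]
classified = {value: classify_extent(value) for value in examples}
```

With the five example values quoted above, only “XVII, 330 S.” is kept as an extent; the other four fall into the format bucket.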

URIs that don’t resolve

We stumbled over some URIs that don’t resolve, whether you order RDF or HTML in the accept header. Examples: http://d-nb.info/019673442, http://d-nb.info/019675585, http://d-nb.info/011077166

Also, DDC URIs that are connected to a resource with dcterms:subject don’t resolve, e.g. http://d-nb.info/ddc-sg/070.
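A check of this kind can be scripted with the Python standard library. The sketch below requests a URI once per media type and reports whether a 2xx response came back. The helper names, the ACCEPT_TYPES list and the injectable opener argument are assumptions made so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request

# Media types to request: one RDF serialisation and HTML.
ACCEPT_TYPES = ("application/rdf+xml", "text/html")


def resolves(uri, accept, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URI answers with a 2xx status for this Accept header."""
    request = urllib.request.Request(uri, headers={"Accept": accept})
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, TimeoutError):
        # HTTPError (4xx/5xx) is a subclass of URLError.
        return False


def check_uri(uri, opener=urllib.request.urlopen):
    """Map each requested media type to whether the URI resolved."""
    return {accept: resolves(uri, accept, opener=opener) for accept in ACCEPT_TYPES}
```

Calling check_uri on one of the URIs listed above would return a dictionary with one boolean per media type; the result depends on the server's state at the time of the request.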

Footnote

At a previous BibServer hackday, we loaded the British National Bibliography data into BibServer. This was a similar problem, but as the data was in RDF/XML we could directly use the built-in Python XML streaming parser to convert the RDF data into BibJSON.
See: https://gist.github.com/1731588 for the source.
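As a rough illustration of the streaming approach (not the code from the gist), the sketch below uses xml.etree.ElementTree.iterparse from the standard library. It assumes records are serialised as flat rdf:Description elements carrying a dcterms:title; real data may use typed nodes or nested descriptions, which this does not handle. The function name stream_records is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DCT = "{http://purl.org/dc/terms/}"


def stream_records(source):
    """Yield one BibJSON-style dict per rdf:Description, freeing memory as we go."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != RDF + "Description":
            continue
        record = {"identifier": [{"type": "uri", "id": elem.get(RDF + "about")}]}
        title = elem.findtext(DCT + "title")
        if title:
            record["title"] = title
        yield record
        elem.clear()  # drop the element's children once it has been converted


sample_xml = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/1">
    <dcterms:title>One</dcterms:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/2"/>
</rdf:RDF>"""
parsed = list(stream_records(io.BytesIO(sample_xml)))
```

Clearing each element after it is yielded is what keeps memory use roughly flat however large the input file is.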

BiblioHack: Day 2, part 2

Naomi Lillie — Thu, 14 Jun 2012 15:00:10 +0000

Pens down! Or, rather, key-strokes cease!

BiblioHack has drawn to a close and the results of two days’ hard labour are in:

A Bibliographic Toolkit

Utilising BibServer

Peter Murray-Rust reported back on what was planned, what was done, and the overlap between the two! The priority was cleaning up the process for setting up BibServers and getting them running on different architectures. (PubCrawler was going to be run on BibServer but currently it’s not working). Yesterday’s big news was that Nature has released 30 million references or thereabouts – this furthers the cause of scholarly literature whereby we, in principle, can index records rather than just corporate organisations being able / permitted to do so. National Bibliographies have been put on BibSoup – UK (‘BL’), Germany, Spain and Sweden – with the technical problem of character encodings rearing its head (UTF-8 solves this where used). Also, BibSoup is useful for TEXTUS so the overall ‘toolkit’ approach is reinforced!

Open Access Index

Emanuil Tolev presented on ACat – Academic Catalogue. The first part of an index is having things to access – so gathering about 55,000 journals was a good start! Using Elastic Search within these journals will give a list of contents, which will then provide lists of articles (via facet view); other services will then determine licensing / open access information (URL checks assisted in this process). The ongoing plan is to use this tool to ascertain licensing information for every single record in the world. (Link to ACat to follow).

Annotation Tools

Tom Oinn talked about the ideas that have come out of discussions and hacking around annotators and TEXTUS. Reading lists and citation management is a key part of what TEXTUS is intended to assist with, so the plan is for any annotation to be allowed to carry a citation – whether personal opinion or related record. Personalised lists will come out of this and TEXTUS should become a reference management tool in its own right. Keep your eye on TEXTUS for the practical applications of these ideas!

Note: more detailed write-ups will appear courtesy of others, do watch the OKFN blog for this and all things open…

Postscript: OKFN blog post here

Huge thanks to all those who participated in the event – your ideas and enthusiasm have made this so much fun to be involved with.

Also thanks to those who helped run the event, visible or behind-the-scenes, particularly Sam Leon.

Here’s to the next one

BiblioHack: Day 2, part 1

Naomi Lillie — Thu, 14 Jun 2012 10:46:36 +0000

After easing into the day with breakfast and coffee, each of the 3 sub-groups gave an overview of the mini-project’s aim and fed back on the evening’s progress:

Peter Murray-Rust revisited the overarching theme of ‘A Bibliographic Toolkit’ and the BibServer sub-group’s specific work on adding datasets and easily deploying BibServer; Adrian Pohl followed up to explain that he would be developing a National Libraries BibServer.
Tom Oinn explained the Annotation Tools sub-group’s work on developing annotation tools – i.e. TEXTUS – looking at adding fragments of text, with your own comments and metadata linked to them, which then form BibSoup collections. Collating personalised references is enhanced with existing search functionality, and reading lists with annotations can refer to other texts within TEXTUS.
Mark MacGillivray presented the 3rd group’s work on an Open Access Index. This began with listing all the journals that can be found in the whole world, with the aim of identifying the licence of each article. They have been scraping collections (e.g. PubMed) and gathering journals – at the time of speaking they had more than 50,000! The aim is to enable a crowd-sourced list of every journal in the world which, using PubCrawler, should provide every single article in the world.

With just 5 hours left before stopping to gather thoughts, write-up and feedback to the rest of the group, it will be very interesting to see the result…

BiblioHack: Day 1

Naomi Lillie — Thu, 14 Jun 2012 10:25:46 +0000

The first day of BiblioHack was a day of combinations and sub-divisions!

The event attendees started the day all together, both hackers and workshop / seminar attendees, and Sam introduced the purpose of the day as follows: coders – to build tools and share ideas about things that will make our shared cultural heritage and knowledge commons more accessible and useful; non-coders – to get a crash course in what openness means for galleries, libraries, archives and museums, why it’s important and how you can begin opening up your data; everyone – to get a better idea about what other people working in your domain do and engender a better understanding between librarians, academics, curators, artists and technologists, in order to foster the creation of better, cooler tools that respond to the needs of our communities.

The hackers began the day with an overview of what a hackathon is for and how it can be run, as presented by Mahendra Mahey, and followed with lightning talks as follows:

Talk 1 Peter Murray-Rust & Ross Mounce – Content and Data Mining and a PDF extractor
Talk 2 Mike Jones – the m-biblio project
Talk 4 Ian Stuart – ORI/RJB (formerly OA-RJ)
Talk 5 Etienne Posthumus – Making a BibServer Parser
Talk 6 Emanuil Tolev – IDFind – identifying identifiers (“Feedback and real user needs won’t gather themselves”)
Talk 7 Mark MacGillivray – BibServer – what the project has been doing recently, how that ties into the open access index idea.
Talk 8 Tom Oinn – TEXTUS
Talk 9 Simone Fonda – Pundit – collaborative semantic annotations of texts (Semantic Web-related tool)
Talk 10 Ian Stuart – The basics of Linked Data

We decided we wanted to work as a community, using our different skills towards one overarching goal, rather than breaking into smaller groups with separate agendas. We formed the central idea of an ‘open bibliographic tool-kit’ and people identified three main areas to hack around, playing to their skills and interests:

Utilising BibServer – adding datasets and using PubCrawler
Creating an Open Access Index
Developing annotation tools

At this point we all broke for lunch, and the workshoppers and hackers mingled together. As hoped, conversations sprang up between people from the two different groups and it was great to see suggestions arising from shared ideas, and the applications of one group being explained alongside the theories of the other.

We re-grouped and the workshop continued until 16.00 – see here for Tim Hodson’s excellent write-up of the event and talks given – when the hackers were joined by some who attended the workshop. Each group gave a quick update on status, to try to persuade the new additions to the group to join their particular work-flow, and each group grew in number. After more hushed discussions and typing, the day finished with a talk from Tara Taubman about her background in the legalities of online security and IP, and we went for dinner. Hacking continued afterwards and we celebrated a hard day’s work down the pub, looking forward to what was to come.

Day 2 to follow…

BiblioHack Meet-up

Naomi Lillie — Wed, 13 Jun 2012 18:05:26 +0000

I’ve been quiet on this blog lately, but it’s in the same way a duck looks still when swimming: things may look peaceful but there is much activity going on beneath the surface! The Open Biblio crowd have been busy on the discussion List (link to follow) and the BiblioHack organisers have been preparing for this week’s events, which kicked off with a Meet-up last night.

The pre-BiblioHack Meet-up was designed to be an informal opportunity for those involved in the events to put names to faces and start up discussions; it was also open to anyone who wanted to come along to find out more about open data and the OKFN’s Working Groups including Open GLAM, and projects such as DM2E as well as Open Biblio.

With no formal agenda, we started up conversations as the mood took us – this covered legalities of openness in relation to IP, licensing and open access, annotation, cat-sitting and the Blues. In a nod to the more ‘usual’ OKFN #OpenData meet-ups, we went around the room to introduce ourselves (trying to explain our interests in only 3 words was challenging…) which prompted some people to cross the room in a purposeful fashion to intercept someone they hadn’t spoken to by that point. I really enjoyed meeting the people with whom I’d be spending the next two days, so thanks to all those who came along, for their interesting ideas and suggestions, and huge thanks to Sam Leon for arranging the tasty food and drinks at C4CC and for facilitating the evening.

Hackathon alert: BiblioHack!

Naomi Lillie — Wed, 09 May 2012 12:21:38 +0000

This is cross-posted from the OKFN blog

The Open Knowledge Foundation’s Open Biblio group, and Working Group on Open Data in Cultural Heritage, along with DevCSI, present BiblioHack: an open Hackathon to kick-start the summer months. From Wednesday 13th – Thursday 14th June, we’ll be meeting at Queen Mary, University of London, East London, and any budding hackers are welcome, along with anyone interested in opening up metadata and the open cause – this free event aims to bring together software developers, project managers, librarians and experts in the area of Open Bibliographic Data. A workshop will run alongside the coding on the 13th, and a meet-up on the evening of the 12th is open to all whether you’re attending the Hackathon or not.

What is BiblioHack?

BiblioHack will be two days of hacking and sharing ideas about open bibliographic metadata.

There will be opportunities to hack on open bibliographic datasets and experiment with new prototypes and tools. The focus will be on building things and improving existing systems that enable people and institutions to get the most out of bibliographic data.

If you’re a non-coder there are sessions for you too. We will be running a hands-on workshop addressing the technical aspects of opening up cultural heritage data: looking at best-of-breed open source tools for doing that, preparing your data for a hackathon, and the best standards for storing and exposing your data to make it more easily re-used.

When and where?

The main hackathon will take place over two days between 13th and 14th June at Queen Mary University of London
On the morning of the 13th June we’ll be running the workshop addressing the technical challenges of opening up metadata. So for those unable to participate in the hack due to time constraints or lack of coding know-how – this is for you!
On the 12th June – Tuesday evening (details TBC but will be a pub in central / east London!) – we’ll also be hosting a meet-up for anyone attending the hack or interested in open data more generally. Whether it’s open bibliographic data, spending or government data that floats your boat, all tribes are welcome!

Who is organising the event?

Open Biblio – emphasising the utilisation of tools we’ve developed. We will be focussing on our existing tools and software, the most recent updates of which are available here: front-end BibServer / BibSoup, BibJSON, back-end BibServer
Open Knowledge Foundation’s Working Group on Open Data in Cultural Heritage and Open GLAM
DM2E – a new EU funded project devoted to enabling more cultural institutions to integrate their collections into Europeana
DevCSI – encouraging diverse thinking around software development and technical innovation

Who else is involved?

We’ve already lined up a whole host of speakers and groups who’ll be attending both the hack and the workshop. The list so far includes UK Discovery, CKAN, Europeana, Total Impact, Neontribe, The British Library with many more to be added in the coming days…

You’re giving your time and expertise – what do you get if you attend the whole hack?

Accommodation at QMUL overnight on the 13th
Food and drink across the 3 days
The chance to work with experts in their fields
Admiration and respect from your peers
We could expound at length, but… go on, you know you want to (it’s free!)

How can I sign up?

Please note, if you wish to attend all 3 events you should sign up for each, and the Workshop will run in parallel with the hacking on the morning of the 13th.

BiblioHack hackathon registration form

Naomi Lillie — Wed, 09 May 2012 11:27:02 +0000

Registration is now closed, but keep an eye on this blog and the main OKFN blog for upcoming events.

#OpenDataEDB 2: 16th May

Naomi Lillie — Tue, 08 May 2012 16:03:23 +0000

Following the fun we had at March’s Meet-up ‘launch’, we will be having another gathering of people interested in open data next Wednesday 16th May. Hosted by the Wash Bar, Edinburgh, from 19.00, come and join us to discuss ideas, projects and plans in relation to openness.

Lightning Talks will include Federico Sangati on crowdsourcing and education, ahead of his presentation at Dev8ed later this month, and a sneak preview of the hackathon that Open Biblio will be running 12-14th June in collaboration with OKFN’s Open GLAM and Cultural Heritage Working Group and DevCSI.

If you would like to give a lightning talk (informal 2-3 minute presentations) about anything related to open data or knowledge, contact naomi.lillie [@] okfn.org.

For this and other events in Edinburgh and the rest of Scotland, sign up here.