Open Bibliography and Open Bibliographic Data » News

Nature’s data platform strongly expanded

Adrian Pohl — Fri, 20 Jul 2012 20:05:06 +0000

Nature has largely expanded its Linked Open Data platform that was launched in April 2012. From today’s press release:

“As part of its wider commitment to open science, Nature Publishing Group’s (NPG) Linked Data Platform now hosts more than 270 million Resource Description Framework (RDF) statements. It has been expanded more than ten times, in a growing number of datasets. These datasets have been created under the Creative Commons Zero (CC0) waiver, which permits maximal use/reuse of this data. The data is now being updated in real-time and new triples are being dynamically added to the datasets as articles are published on nature.com.

Available at http://data.nature.com, the platform now contains bibliographic metadata for all NPG titles, including Scientific American back to 1845, and NPG’s academic journals published on behalf of our society partners. NPG’s Linked Data Platform now includes citation metadata for all published article references. The NPG subject ontology is also significantly expanded.

The new release expands the platform to include additional RDF statements of bibliographic, citation, data citation and ontology metadata, which are organised into 12 datasets – an increase from the 8 datasets previously available. Full snapshots of this data release are now available for download, either by individual dataset or as a complete package, for registered users at http://developers.nature.com.”

This is exciting, especially the commitment to real-time updates is a great move and shows how serious Linked Open Data becomes in general and in particular in the realm of bibliographic data. Also, Nature now uses the Data Hub and has registered the data seperated into several datasets.

Community Discussions 3

Naomi Lillie — Fri, 13 Jul 2012 12:41:46 +0000

It has been a couple of months since the round-up on Community Discussions 2 and we have been busy! BiblioHack was a highlight for me, and last week included a meeting of many OKFN types – here’s a picture taken by Lucy Chambers for @OKFN of some team members:

The Discussion List has been busy too:

Further to David Weinbergers’s pointer that Harvard released 12 million bibliographic records with a CC0 licence, Rufus Pollock created a collection on the DataHub and added it to the Biblio section for easy of reference
Rufus also noticed that OCLC had issued their major release of VIAF, meaning that millions of author records are now available as Open Data (under Open Data Commons Attribution license), and updated the DataHub dataset to reflect this
Peter Murray-Rust noted that Nature has made its metadata Open CC0
David Shotton promoted the International Workshop on Contributorship and Scholarly Attribution at Harvard, and prepared a handy guide for attribution of submissions
Adrian Pohl circulated a call for participation for the SWIB12 “Semantic Web in Bibliotheken” (Semantic Web in Libraries) Conference in Cologne, 26-28 November this year, and hosted the monthly Working Group call
Lars Aronsson looked at multivolume works, asking whether the OpenLibrary can create and connect records for each volume. HathiTrust and Gallica were suggested as potential tools in collating volumes, and the barcode (containing information populated by the source library) was noted as being invaluable in processing these
Sam Leon explained that TEXTUS would be integrating BibSever facet view and encouraged people to have a look at the work so far; Tom Oinn highlighted the collaboration between Enriched BibJSON and TEXTUS, and explained that he would be adding a ‘TEXTUS’ field to BibJSON for this purpose
Sam also circulated two tools for people to test, Pundit and Korbo, which have been developed out of Digitised Manuscripts to Europeana (DM2E)
Jenny Molloy promoted the Open Science Hackday which took place last week – see below for a snap-shot courtesy of @OKFN:

In related news, Peter Murray-Rust is continuing to advocate the cause of open data – do have a read of the latest posts on his blog to see how he’s getting on.

The Open Biblio community continues to be invaluable to the Open GLAM, Heritage, Access and other groups too and I would encourage those interested in such discussions to join up at the OKFN Lists page.

BiblioHack: Day 2, part 2

Naomi Lillie — Thu, 14 Jun 2012 15:00:10 +0000

Pens down! Or, rather, key-strokes cease!

BiblioHack has drawn to a close and the results of two days’ hard labour are in:

A Bibliographic Toolkit

Utilising BibServer

Peter Murray-Rust reported back on what was planned, what was done, and the overlap between the two! The priority was cleaning up the process for setting up BibServers and getting them running on different architectures. (PubCrawler was going to be run on BibServer but currently it’s not working). Yesterday’s big news was that Nature has released 30 million references or thereabouts – this furthers the cause of scholarly literature whereby we, in principle, can index records rather than just corporate organisations being able / permitted to do so. National Bibliographies have been put on BibSoup – UK (‘BL’), Germany, Spain and Sweden – with the technical problem character encodings raising its head (UTF8 solves this where used). Also, BibSoup is useful for TEXTUS so the overall ‘toolkit’ approach is reinforced!

Open Access Index

Emanuil Tolev presented on ACat – Academic Catalogue. The first part of an index is having things to access – so gathering about 55,000 journals was a good start! Using Elastic Search within these journals will give list of contents which will then provide lists of articles (via facet view), then other services will determine licensing / open access information (URL checks assisted in this process). The ongoing plan is to use this tool to ascertain licensing information for every single record in the world. (Link to ACat to follow).

Annotation Tools

Tom Oinn talked about the ideas that have come out of discussions and hacking around annotators and TEXTUS. Reading lists and citation management is a key part of what TEXTUS is intended to assist with, so the plan is for any annotation to be allowed to carry a citation – whether personal opinion or related record. Personalised lists will come out of this and TEXTUS should become a reference management tool in its own right. Keep your eye on TEXTUS for the practical applications of these ideas!

Note: more detailed write-ups will appear courtesy of others, do watch the OKFN blog for this and all things open…

Postscript: OKFN blog post here

Huge thanks to all those who participated in the event – your ideas and enthusiasm have made this so much fun to be involved with.

Also thanks to those who helped run the event, visible or behind-the-scenes, particularly Sam Leon.

Here’s to the next one

BiblioHack: Day 2, part 1

Naomi Lillie — Thu, 14 Jun 2012 10:46:36 +0000

After easing into the day with breakfast and coffee, each of the 3 sub-groups gave an overview of the mini-project’s aim and fed back on the evening’s progress:

Peter Murray-Rust revisited the overarching theme of ‘A Bibliographic Toolkit’ and the BibServer sub-group’s specific work on adding datasets and easily deploying BibServer; Adrian Pohl followed up to explain that he would be developing a National Libraries BibServer.
Tom Oinn explained the Annotation Tools sub-groups’s work on developing annotation tools – ie TEXTUS – looking at adding fragments of text, with your own comments and metadata linked to it, which then forms BibSoup collections. Collating personalised references is enhanced with existing search functionality, and reading lists with annotations can refer to other texts within TEXTUS.
Mark MacGillivray presented the 3rd group’s work on an Open Access Index. This began with listing all the journals that can be found in the whole world, with the aim of identifying the licence of each article. They have been scraping collections (eg PubMed) and gathering journals – at the time of speaking they had around 50,000+! The aim is to enable a crowd-sourced list of every journal in the world which, using PubCrawler, should provide every single article in the world.

With just 5 hours left before stopping to gather thoughts, write-up and feedback to the rest of the group, it will be very interesting to see the result…

Hackathon alert: BiblioHack!

Naomi Lillie — Wed, 09 May 2012 12:21:38 +0000

This is cross-posted from the OKFN blog

The Open Knowledge Foundation’s Open Biblio group, and Working Group on Open Data in Cultural Heritage, along with DevCSI, present BiblioHack: an open Hackathon to kick-start the summer months. From Wednesday 13th – Thursday 14th June, we’ll be meeting at Queen Mary, University of London, East London, and any budding hackers are welcome, along with anyone interested in opening up metadata and the open cause – this free event aims to bring together software developers, project managers, librarians and experts in the area of Open Bibliographic Data. A workshop will run alongside the coding on the 13th, and a meet-up on the evening of the 12th is open to all whether you’re attending the Hackathon or not.

What is BiblioHack?

BiblioHack will be two days of hacking and sharing ideas about open bibliographic metadata.

There will be opportunities to hack on open bibliographic datasets and experiment with new prototypes and tools. The focus will be on building things and improving existing systems that enable people and institutions to get the most of bibliographic data.

If you’re a non-coder there are sessions for you too. We will be running a hands-on workshop addressing the technical aspects to opening up cultural heritage data looking at best of breed open source tools for doing that, preparing your data for a hackathon and the best standards for storing and exposing your data to make it more easily re-used.

When and where?

The main hackathon will take place over two days between 13th and 14th June at Queen Mary University of London
On the morning of the 13th June we’ll be running the workshop addressed at the technical challenges to opening up metadata. So for those unable to participate in the hack due to time constraints or lack of coding know how – this is for you!
On the 12th June – Tuesday evening (details TBC but will be a pub in central / east London!) – we’ll also be hosting a meet-up for anyone attending the hack and open data more generally. Whether it’s open bibliographic data, spending or government data that floats your boat all tribes are welcome!

Who is organising the event?

Open Biblio – emphasising the utilisation of tools we’ve developed. We will be focussing on our existing tools and software, the most recent updates of which are available here: front-end BibServer / BibSoup, BibJSON, back-end BibServer
Open Knowledge Foundation’s Working Group on Open Data in Cultural Heritage and Open GLAM
DM2E – a new EU funded project devoted to enabling more cultural institutions integrate their collections into Europeana
DevCSI – encouraging diverse thinking around software development and technical innovation

Who else is involved?

We’ve already lined up a whole host of speakers and groups who’ll be attending both the hack and the workshop. The list so far includes UK Discovery, CKAN, Europeana, Total Impact, Neontribe, The British Library with many more to be added in the coming days…

You’re giving your time and expertise – what do you get if you attend the whole hack?

Accommodation at QMUL overnight on the 13th
Food and drink across the 3 days
The chance to work with experts in their fields
Admiration and respect from your peers
We could expound at length, but… go on, you know you want to (it’s free!)

How can I sign up?

Please note, if you wish to attend all 3 events you should sign up for each, and the Workshop will run in parallel with the hacking on the morning of the 13th.

#OpenDataEDB 2: 16th May

Naomi Lillie — Tue, 08 May 2012 16:03:23 +0000

Following the fun we had at March’s Meet-up ‘launch’, we will be having another gathering of people interested in open data next Wednesday 16th May. Hosted by the Wash Bar, Edinburgh, from 19.00, come and join us to discuss ideas, projects and plans in relation to openness.

Lightning Talks will include Federico Sangati on crowdsourcing and education, ahead of his presentation at Dev8ed later this month, and a sneak preview of the hackathon that Open Biblio will be running 12-14th June in collaboration with OKFN’s Open GLAM and Cultural Heritage Working Group and DevCSI.

If you would like to give a lightning talk (informal 2-3 minute presentations) about anything related to open data or knowledge, contact naomi.lillie [@] okfn.org.

For this and other events in Edinburgh and the rest of Scotland, sign up here.

Recent BibServer technical development

Mark MacGillivray — Tue, 08 May 2012 14:39:23 +0000

Along with the recent push of new front-end functionality to BibServer, and demonstrated on BibSoup, we have also applied some changes to the back-end.

The new scheduled collection uploader is now runnable as a stand-alone tool, to which source URLs can be provided for retrieval, conversion, and upload. Retrieved sources are stored and available from a folder on disk, as are the conversions.

Parsers can now be written in any language and plugged into the ingest functionality – for example, we now have a MARC parser that runs in perl and is usable via ingest.py and available on an instance of BibServer – thanks very much to Ed for that.

In addition, parsers need no longer be ‘parsers’ – we have introduced the concept of scrapers as well. Check out our new Wikipedia parser / scraper, for example; it functions by taking in a search value rather than a URL, then using that to search Wikipedia for relevant references which it downloads, bundles, and converts to a BibJSON collection – this is a really great example that Etienne put together, and it demonstrates a great deal of potential for further parser / scraper development.

See the examples on the BibServer repo for more insight – they are in the parserscrapers_plugins folder, and they are managed by bibserver/ingest.py.

We know documents are now lacking – we have set up an online docs resource but are in the process of writing up to populate it – please check back soon.

As usual, development work is scheduled via the tickets and milestones on our repo. Current efforts are on documentation and adding as many feature requests as possible before our hackathon on June 12th – 14th.

BibJSON updates

Mark MacGillivray — Tue, 08 May 2012 14:30:19 +0000

Following recent discussion on our mailing list, BibJSON has been updated to adopt JSON-LD for all your linked data needs.

This enables us to keep the core of BibJSON pretty simple whilst also opening up potential for more complex usage where that is required.

Due to this, we no longer use the “namespace” key in BibJSON.

Other changes include usage of “_” prefix on internal keys – so wherever our own database writes info into a record, we prefix it, such as “_id”. Because of this, uploaded BibJSON records can have an “id” key that will work, as well as an “_id” uuid applied by the BibServer system.

For more information, check out BibJSON.org and JSON-LD

New BibServer features available on BibSoup

Mark MacGillivray — Tue, 08 May 2012 14:22:53 +0000

A couple of months ago the development team had a Sprint and came up with some cool ideas of how to improve the user experience for BibServer and, subsequently, BibSoup. Have a play with the new features and see below for the details:

Main pages

Collections visualisation – a smart new graphic on the landing page showing information from new collections
Improved FAQ section with links to videos (coming soon: links to our new online docs)

Creating collections

New Wikipedia parser – create a collection based on the references retrievable from Wikipedia for your chosen search value
Improved collection upload – specify collection information, then view upload tickets to see progress and errors
‘Retry’ and other options on particular collection creation attempts are also now available from the tickets page

Search results

Filter search results by a value range as well as specific values
Visualise any filter as a bubble chart and select the values you want to search with
Add / remove available filters and rename filter display names
Improved layout of record info in search results, including auto-display of the first image referenced in a record – e.g. if there is a link to an image in your record, it is displayed in the search result

Managing and sharing collections

Collection admin available – save your current display settings as the default for your collection, allow other users to have admin rights on your own collection
Share any specific searches by providing the URL displayed under the ‘share’ option
Embed – as the whole front-end of search and collection visualisation is handled by facetview it is possible to embed your collection search in any web page you control; the share / embed option on collection pages provides the code you need to insert to enable this
Download as BibJSON – a nice new obvious button on each collection provides a link to download your collection as BibJSON

Viewing records

Improved display of individual records, including search options to discover relevant content online
EXPERIMENTAL record editing – this has been enabled although still in progress – you can edit the content of a record using a visual display of the keys and values in the record, although functionality for adding new keys does not yet work. However, you can also edit the JSON directly via the options, and try saving that. Be aware – this could damage your records, and of course changes the details from whatever they were in the source content.

Still in development

These ones are not yet available on BibSoup but watch this space:

Creating new collections on-site – search and find particular records for inclusion in new collections or addition to pre-existing collections. This is not currently possible but we are working on making this an easy process
Merging collections
Better user creation and management, plus gravatars
Additional functionality on record pages – linking out directly to related sources such as PubMed, Total Impact, Service Core etc

We hope you like these changes, and find them useful – do let us know what you think and keep an eye out for the upcoming improvements.

Harvard Library releases 12M bibliographic records under CC0

Adrian Pohl — Wed, 25 Apr 2012 07:19:21 +0000

Harvard Library yesterday announced the release of 12 Million bibliographic record into the public domain using CC0.

From the announcement:

“The Harvard Library announced it is making more than 12 million catalog records from Harvard’s 73 libraries publicly available.

The records contain bibliographic information about books, videos, audio recordings, images, manuscripts, maps, and more. The Harvard Library is making these records available in accordance with its Open Metadata Policy and under a Creative Commons 0 (CC0) public domain license. In addition, the Harvard Library announced its open distribution of metadata from its Digital Access to Scholarship at Harvard (DASH) scholarly article repository under a similar CC0 license.

‘The Harvard Library is committed to collaboration and open access. We hope this contribution is one of many steps toward sharing the vital cultural knowledge held by libraries with all,’ said Mary Lee Kennedy, Senior Associate Provost for the Harvard Library.

The catalog records are available for bulk download from Harvard, and are available for programmatic access by software applications via API’s at the Digital Public Library of America (DPLA). The records are in the standard MARC21 format.”

That’s great news. There already is an entry for this dataset at the Data Hub.

See also David Weinberger’s post on the data release.