Article on Author Identity and Open Bibliographic Data

Jim Pitman, Professor of Statistics and Mathematics at the University of California, Berkeley and active in the OKFN Working Group on Open Bibliographic Data, has published an article, “Author Identity and Open Bibliography”, in the IMS Bulletin. He gives an overview of emerging author identification services and of developments in the field of open bibliographic data, and sums it up:

It remains to be seen what agent or agents will end up providing the best service for individual researchers, departments or universities to display the bibliographic data they generate, and how best such data will be aggregated for search and discovery. But don’t wait for the big publishers and information brokers to monopolize this function.

Finally, Jim provides some useful tips on how to “push the publishers and subscription services to support open access to your research work and open bibliographic data”. Go here to read the whole post.

Nature releases Metadata for 450k Articles into the Public Domain

Yesterday Nature Publishing Group announced the launch of a Linked Data platform with RDF descriptions of more than 450,000 articles published by NPG since 1869.

From the press release:

Nature Publishing Group (NPG) today is pleased to join the linked data community by opening up access to its publication data via a linked data platform. NPG’s Linked Data Platform is available at http://data.nature.com.

The platform includes more than 20 million Resource Description Framework (RDF) statements, including primary metadata for more than 450,000 articles published by NPG since 1869. In this first release, the datasets include basic citation information (title, author, publication date, etc) as well as NPG specific ontologies. These datasets are being released under an open metadata license, Creative Commons Zero (CC0), which permits maximal use/re-use of this data.

NPG’s platform allows for easy querying, exploration and extraction of data and relationships about articles, contributors, publications, and subjects. Users can run web-standard SPARQL Protocol and RDF Query Language (SPARQL) queries to obtain and manipulate data stored as RDF. The platform uses standard vocabularies such as Dublin Core, FOAF, PRISM, BIBO and OWL, and the data is integrated with existing public datasets including CrossRef and PubMed.
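For readers who want to try this, here is a minimal sketch of how a SPARQL query can be packaged as an HTTP GET request following the standard SPARQL protocol (a `query` parameter in the URL). The endpoint address `http://example.org/sparql` is a placeholder, since the press release does not give the query URL, and the use of `dc:title` is only an assumption based on the vocabularies listed above.

```python
from urllib.parse import urlencode

# Assumed query: list ten resources with a Dublin Core title.
# Which properties NPG actually uses is not stated in the press release.
QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?article ?title
WHERE { ?article dc:title ?title . }
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Return a GET URL carrying the query, as the SPARQL protocol expects."""
    return endpoint + "?" + urlencode({"query": query.strip()})


# Placeholder endpoint, not NPG's real address.
url = sparql_get_url("http://example.org/sparql", QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client; the response format depends on what the server offers.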

It’s great news that such an important publisher is moving towards open bibliographic data. As yet, there are no dumps of the data available. Also, there is no entry on the Data Hub. Anybody?

The platform may not work properly in some browsers, but this is being worked on.

Posted in Data, News, Semantic Web

Minutes: 20th Virtual Meeting of the OKFN Openbiblio Group

Date: April 3rd, 2012, 15:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Jim Pitman
  • Karen Coyle

Agenda

Action Items from last meeting

  • Adrian will personally ask people from the German National Library regarding BibJSON conversion –> No definite answer yet.
  • Sebastian will provide a post for openbiblio.net when the data set is officially released (sometime in March) –> Done, see here
  • Ask members of DCMI provenance group & W3C Provenance Working Group to provide a short post about it on openbiblio.net. –> not yet done
  • Write down core resources and tasks of the openbiblio group. (Adrian) –> See http://openbiblio.net/get-involved/.

How should National Libraries provide their data?

  • We are approaching several national libraries that are providing open data to make their data re-usable by the openbiblio group.
  • There was some discussion of how National Libraries and others should actually provide their data to be used in BibServer.
  • Maintaining a BibJSON dump might be asking too much.
  • Libraries who provide LOD could be asked to also provide a dump in JSON-LD. This would make it easier to re-use in a BibServer.
    • Would adding JSON-LD to content negotiation be enough?
  • Conclusion: We probably can’t make people provide BibJSON and will probably have to transform open data ourselves.
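
To illustrate the content negotiation question raised above, here is a minimal sketch of a client asking a Linked Data server for JSON-LD via the `Accept` header. The resource URI is a placeholder, and whether any given library's server answers with JSON-LD is exactly the open question; the request is only built here, not sent.

```python
from urllib.request import Request

# Prefer JSON-LD, but accept RDF/XML as a fallback.
ACCEPT = "application/ld+json, application/rdf+xml;q=0.8"


def build_jsonld_request(resource_uri):
    """Build (but do not send) a GET request that prefers JSON-LD."""
    return Request(resource_uri, headers={"Accept": ACCEPT})


# Placeholder URI standing in for a library's Linked Data resource.
req = build_jsonld_request("http://example.org/resource/123")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

A server that supports this would return the same record in JSON-LD instead of HTML or RDF/XML, which is what would make re-use in a BibServer easier.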

BibJSON and JSON-LD

  • openbiblio-dev discussed aligning BibJSON with JSON-LD. Everybody is fine with namespaces in JSON.
    ACTION: Create context file for BibJSON.
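
As a rough illustration of what such a context file could contain, the sketch below maps a few BibJSON-style keys onto Dublin Core and BIBO terms. The mapping is hypothetical and simplified (for example, authors are plain strings here); the group had not agreed on an actual context at the time of the meeting.

```python
import json

# Hypothetical JSON-LD context: plain BibJSON-style keys on the left,
# namespaced vocabulary terms on the right.
BIBJSON_CONTEXT = {
    "dcterms": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
    "title": "dcterms:title",
    "author": "dcterms:creator",
    "year": "dcterms:issued",
    "doi": "bibo:doi",
}


def with_context(record):
    """Return a copy of the record with the context attached."""
    return {"@context": BIBJSON_CONTEXT, **record}


example = with_context(
    {"title": "Author Identity and Open Bibliography", "author": ["Jim Pitman"]}
)
print(json.dumps(example, indent=2))
```

The point of the design is that the record itself stays ordinary JSON; only the added `@context` key tells a JSON-LD processor how to read the keys as namespaced properties.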

Further discussion of governance and resource allocation issues

  • Openbiblio-dev/Openbiblio 2 people will attend the monthly virtual meetings in future to report on the project.

Procedures for publication/maintenance of bibliographic datasets

  • Formalizing procedures for publication/maintenance of bibliographic datasets: Mark provided a good start in a recent email. This should be developed on this etherpad.
  • Q from Karen: do we also want vocabularies/thesauri? Or just bib data? A: Yes, we want vocabularies/thesauri.
  • Formalizing procedures for getting volunteers to follow through on assistance with specific data liberation projects.
    ACTION: Create timeline for opening up biblio data, HowTo/FAQs; start on an etherpad (http://okfnpad.org/howtoshareopenbibliodata) and then move it to openbiblio.net

Posted in minutes, OKFN Openbiblio

Community discussions

The Open Biblio core team are a small bunch, but there’s a wealth of people out there providing information and suggestions to make our work better and more widely known. The List is a key way for people to post ideas, so here’s a round-up of some posts made over the last couple of weeks as food for thought:

Sebastian Nordhoff updated us on Langdoc, his effort to collect bibliographical data for the world’s lesser-known languages. This brought into sharp relief the ever-pressing issues surrounding publishing licences, as the collections here are CC-BY-NC. Sebastian supports the Open Biblio principles, which state that data should be released under CC0, but explained that some of the data raises sensitive issues associated with the languages involved. Sebastian followed the discussion with this post which explains the release in more detail.

The complex issue of orphan data was raised by Karen Coyle. An organisation that hosts – but did not create – bibliographic data she wished to use declined to release the data under PDDL / CC0, because there is no specified ‘owner’ of the bibliographic records and they do not consider themselves to have rights to grant any license. Suggested solutions included claiming ownership to allow any original owner to come forward, or using http://blogs.law.harvard.edu/nesson/digital-registry or other facilities to store and release the data. Mark MacGillivray cautioned that finding work-arounds, “…Promote[s] the idea that things found… on the internet with no claims attached to them are not freely available. Promoting such a default stance would do nothing except promote the interests of people who want to own everything”, which somewhat goes against the idea that things should be open by default; Adrian Pohl agreed, citing Public Domain Mark 1.0 as a way of declaring, ‘This already is public domain because no one has rights over this data and thus no one can license it’. The ever pragmatic Peter Murray-Rust declared, “The risk [of being sued] is less than being hit by an asteroid. Go ahead”. The issue of ‘owner-less’ data rumbles on and Karen made the pertinent point that, “As more and more data (and metadata) comes onto the web we will have more of these situations” and that there should be a way for the holder of a piece of information to declare that he/she claims no rights, not even physical ownership rights, over the data once it leaves his/her database.

Mark circulated the news that Research Councils UK is considering changing its open access policies, to mandate that all RCUK-funded papers be made freely available six months after publication. Although the draft says that research council funding may be used to support payment of authors’ fees in open access publishing, it does not go as far as the Wellcome Trust’s policy which extends to paying to publish even when a grant is used up… But it’s certainly a step in the right direction.

Roderic Page pondered the difficulty of differing languages within the same article (as opposed to a translation of the article, which is a different record), which goes back to the crux of simplicity vs complexity as explored by the dev team and discussed here.

All this and more is available on the List here where you can sign up to be part of the discussions as well as peruse the archives.

Posted in JISC OpenBib, News, OKFN Openbiblio

Announcing Glottolog/Langdoc, a knowledge base of 175k references for (mostly) underdescribed languages

We are happy to announce Glottolog/Langdoc, a comprehensive knowledge base of references treating (mostly) underdescribed languages from the whole world.

Glottolog/Langdoc is built upon a collation of 20 source bibliographies covering the whole world, from Alaska to Australia. The original bibliographies were parsed and enriched with machine learning techniques. This makes it possible to formulate queries such as

and combinations thereof such as

Furthermore, an areally and genealogically balanced sample can be drawn.

All references have their own URIs. All resources are available as xhtml and rdf, and can be downloaded as bib, html, txt, or via Zotero. Dumps of references are available as a very large *bib and as a dump in rdf+xml.
Glottolog/Langdoc content is made available under CC-BY-NC. Intercultural issues upstream unfortunately prevent us from releasing the content under a more permissive license.

Glottolog/Langdoc uses DCMI, BIBO, FRBR, and ISBD ontologies to provide an interoperable resource. Glottolog is part of the Linguistic Linked Open Data cloud. We are working on a SPARQL endpoint, which will probably be made available in June this year.

We are happy to contribute our bibliographical resource to the Linked Data cloud and would welcome feedback at glottolog@eva.mpg.de.

Planning for the next three months

We have developed BibJSON

We’ve improved BibServer

We’ve made BibSoup

…But what’s next?

The nature of cutting-edge technology is that it is fast-paced and constantly adapting. We may think we’ve come up with a good idea, but if it turns out someone else has already had that idea and developed it – that’s great and means we incorporate it and go on to the next exciting thing. We may think that this next thing is important, but if it turns out it doesn’t quite do the helpful thing needed to make our users delighted or promote open bibliographic data – we change tack and try something else. We know what we want to do, i.e. make useful and smart tools for the people doing wonderful things in the public domain, but, as for what our end product looks like (if indeed there is one product to play with) – well, that all depends on the emerging requirements, other technologies that come to light and how successful our ideas are along the way.

Taking all that into account, at the Sprint last week we attempted to plan for the next three months. Our work will be more successful the more focused we are, and having an end-result in mind is useful for that. So, here’s a rough guide to how we think our project will shape up between now and June:

To-Do

Timeline

NB the images are a little fuzzy, but do click on them to follow the links to Flickr where these are stored and appear more clearly.

We have already published the CUL blog post and Mark has written about BibServer functionality that arose from ideas at the Sprint. We’ll develop these ideas into workable and worthwhile tools or processes, and before we know it we’ll be three months down the line and thinking ‘…but what’s next?’

Posted in BibServer, JISC OpenBib, minutes, OKFN Openbiblio

BibServer new functionality

During the sprint last week we made a lot of progress with the new functionality for version 0.5.0 – however, Etienne and I got so excited by some new ideas that we did not finish on time; apologies for the delay.

We will be making the new version available over the course of this week, and will have it up and running on http://bibsoup.net soon.

Below is an overview of the new functionality you can expect to see over the course of the next week; we will write some blog posts about the various new capabilities, and this will tie in with the focus of the next sprint – doing docs, tests and issues (no new functionality).

  • editing of records and collections
  • merging collections from multiple sources
  • adding notes to records
  • much improved search UI
  • embed images in search results
  • better visualisation of collections
  • UI embeddable in other web pages via JavaScript
  • asynchronous parsing – you don’t have to hang on the page waiting for it to complete
  • feedback tickets from asynchronous parses
  • sharing collection admin rights with other users
  • new parser for NLM XML
  • new parser concept – search term gets pages from Wikipedia, pulls citations from pages
  • capability to accept and run parsers written in different programming languages
  • browse site users

Posted in JISC OpenBib

Europeana and Linked Open Data

Europeana has recently released a new version of its Linked Data Pilot, data.europeana.eu. We now publish data for 2.4 million objects under an open metadata licence: CC0, the Creative Commons Public Domain Dedication. This post elaborates on this earlier one by Naomi.

Europeana’s interest in Linked Open Data

Europeana aims to provide the widest possible access to the European cultural heritage published in large volumes as digital resources by hundreds of museums, libraries and archives. This includes empowering other actors to build services that contribute to such access. Making data openly available to the public and private sectors alike is thus central to Europeana’s business strategy. We are also trying to provide a better service by making available richer data than that often published by cultural institutions. Data where millions of texts, images, videos and sounds are linked to other relevant resources: persons, places, concepts…

Europeana has therefore been interested for a while in Linked Data, as a technology that facilitates these objectives. We entirely subscribe to the views expressed in the W3C Library Linked Data report, which shows the benefits (but also acknowledges the challenges) of Linked Data for the cultural sector.

Europeana’s first toe in the Linked Data water

Last year, we released a first Linked Data pilot at data.europeana.eu. This has been a very exciting moment, a first opportunity for us to play with Linked Data.

We could deploy our prototype relatively easily and the whole experience was extremely valuable, from a technical perspective. In particular, this has been the first large-scale implementation of Europeana’s new approach to metadata, the Europeana Data Model (EDM). This model enables the representation of much richer data compared to the current format used by Europeana in its production service. First, our pilot could use EDM’s ability to represent several perspectives over a cultural object. We have used it to distinguish the original metadata our providers send us, from the data that we add ourselves. Among the Europeana data there are indeed enrichments that are created automatically and are not checked by professional data curators. For trust purposes, it is important that data consumers can see the difference.

We could also better highlight a part of Europeana’s added value as a central point for accessing digitized cultural material, in direct connection with the above mentioned enrichment. Europeana indeed employs semantic extraction tools that connect its objects with large multilingual reference resources available as Linked Data, in particular Geonames and GEMET. This new metadata allows us to deliver a better search service, especially in a European context. With the Linked Data pilot we could explicitly point at them, in the same environment they are published in. We hope this will help the entire community to better recognize the importance of these sources, and continue to provide authority resources in interoperable Linked Data format, using for example the SKOS vocabulary.

If you are interested in more lessons learnt from a technical perspective, we have published more of them in a technical paper at the Dublin Core conference last year. Among the less positive aspects, data.europeana.eu is still not part of the production system behind the main europeana.eu portal. It does not come with the guarantee of service we would like to offer for the linked data server, though the provision of data dumps is not impacted by this.

Making progress on Open Data

Another downside is that data.europeana.eu publishes data only for a subset of the objects our main portal provides access to. We started with 3.5 million objects out of a total of 20 million. These were selected after a call for volunteers, to which only a few providers responded. Additionally, we could not release our metadata under fully Open terms. This was clearly an obstacle to the re-use of our data.

After several months we have thus released a second version of data.europeana.eu. Though still a pilot, it now contains fully open metadata (CC0).

The new version concerns an even smaller subset of our collections: in February 2012, data.europeana.eu contains metadata on 2.4 million objects. But this must be considered in context. The qualitative step of fully open publication is crucial to us. And over the past year, we have started an active campaign to convince our community to open up their metadata, allowing everyone to make it work harder for the benefit of end users. The current metadata served at data.europeana.eu comes from data providers who have reacted early and positively to our efforts. We trust we will be able to make metadata available for many more objects in the coming year.

In fact we hope that this Linked Open Data pilot can contribute to our Open Data advocacy message. We believe such technology can trigger third parties to develop innovative applications and services, stimulating end users’ interest in digitized heritage. This would of course help to convince more partners to contribute metadata openly in the future. Alongside our new pilot we have released an animation that conveys exactly this message; you can view it here.

For additional information about access to and technical details of the dataset, see data.europeana.eu and our entry on the Data Hub.

Posted in Data, guest post, LOD-LAM, Semantic Web

Day 3 of the March Sprint

This morning we were buzzing from the Meet-up, excited about the interesting people we met and the cool things they talked about. Graham Steel, who was in town for yesterday’s event, stopped by to see what the team was up to (largely coding / blogging and ignoring one another) and Mahendra headed home with our thanks for a really great event.

Work on new functionality for BibServer continued today, as Mark wired up the new front end into the back end which should – after testing – add some smart and helpful options to your user experience. Etienne worked away on the back end, with a new asynchronous parser sub-system; he was also excited about his development of an example parser plug-in to query Wikipedia and parse results using BibJSON. The idea of this is that, when searching for something in Wikipedia, each page result for that word is parsed for citations; these citations are then put through BibJSON and dropped into BibSoup as a collection. So, you search for X and a moment later there is a BibSoup collection by that name displaying all related citations! This is still in the testing phase, and search phrases have to be precise as all of Wikipedia’s relevant results are returned, but we are ‘guardedly excited’, to borrow Etienne’s elegant phrasing. More on this, and the other cool coding Mark and Etienne have been doing, later.
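
To give a flavour of the idea, here is a minimal sketch of the citation-extraction step. It is not Etienne's plug-in: it assumes the page text is already available as wikitext, uses a simple regular expression that only handles flat `{{cite ...}}` templates, and wraps the results in a `metadata`/`records` structure that is our assumption about how a BibJSON collection might look. The function names are made up for illustration.

```python
import re

# Matches flat templates such as {{cite journal |title=... |year=...}}.
# Nested templates are deliberately not handled in this sketch.
CITE_RE = re.compile(r"\{\{\s*cite\s+(\w+)\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_citations(wikitext):
    """Turn each cite template into a flat dict of its named fields."""
    records = []
    for kind, body in CITE_RE.findall(wikitext):
        fields = {}
        for part in body.split("|"):
            key, sep, value = part.partition("=")
            if sep and value.strip():
                fields[key.strip().lower()] = value.strip()
        fields["type"] = kind.lower()
        records.append(fields)
    return records


def to_collection(term, wikitext):
    """Wrap the parsed citations in a collection named after the search term."""
    return {"metadata": {"collection": term}, "records": parse_citations(wikitext)}


sample = (
    "Text.<ref>{{cite journal |last=Smith |title=On Things |year=2001}}</ref> "
    "More text. {{Cite book|title=Big Book|year=1999}}"
)
collection = to_collection("things", sample)
print(collection)
```

Fetching the pages for a search term and loading the result into BibSoup are left out, since those steps depend on services and interfaces not described here.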

Meanwhile, I have been writing up last night’s Meet-up and following up with the lovely attendees, as well as thinking more about the Hackathon in June. The group discussed OpenGLAM and publicdomainworks, which are projects / areas we have had / will have a lot in common with, and looked at ongoing opportunities together.

There will be more blog posts as coding is tested, events confirmed and collaborations agreed, so watch this space.

Posted in event, JISC OpenBib, OKFN Openbiblio

#OpenDataEDB: the results

Last night was the first OKFN Meet-Up in Scotland* at the Ghillie Dhu, Edinburgh, run in collaboration with DevCSI. 19 people attended from around the city and nearby, including Glasgow, and those visiting for the Open Biblio Sprint represented Cambridge, London, Wolverhampton and the Netherlands.

The Auditorium was a beautiful venue, and there was a good space for giving presentations complete with seamless audio and visual equipment (a rare treat!).

We kicked off with the first three Lightning Talks:

It was great to see people gravitating towards those whose presentations had struck a chord… Mahendra had invited discussion around potential events and many people had plans or ideas which they wanted to run past him, while Rod’s points on taxonomy were pertinent to Mark’s work on BibServer as well as others’ research. Other discussions grew between the bar snacks, as people began with the standard ‘what do you do?’ and swiftly developed into ‘oh that’s funny, I was talking to so-and-so about that just now…’ Our dedicated bartender was contributing too, as he specialised in nanotechnology!

The next three talks followed:

The hubbub of enthusiasm started up again, and it appeared there were good conversations and connections emerging around the room. From these, or perhaps just courage from having seen others do their presentations (and me fumbling along as make-shift compère), two additional people decided to give impromptu talks:

Many thanks to all those who presented and to those who attended to discuss all things #OpenData. Hopefully everyone left with good ideas of topics and people to follow up with afterwards, and who knows where these will lead?

As this was our first Scotland-based Meet-up we’d be glad to get feedback so we can improve; the next one is planned for May, so if you have anything you’d particularly like to see, hear or say, let us know (one suggestion was that talks are recorded, so people unable to attend can keep up-to-date). This and other events will be promoted via the OKFN Scotland List, so do sign up here otherwise you might miss out!

* Actually, it turns out that there was a Meet-up in Scotland in 2010, according to someone who’s been on the scene longer than I… but I won’t tell anyone if you don’t 😉

Postscript, 23rd March: see here for a review by Laura Newman and a link to one by Graham Steel, thanks to Information Today Europe.

Posted in event, JISC OpenBib, OKFN Openbiblio, Talks