German National Library goes LOD & publishes National Bibliography

Good news from Germany. The German National Library

  1. changed its licensing regime for Linked Data to CC0 which makes the data open according to the open definition,
  2. has begun to publish the German national bibliography as Linked Open Data.

For background see the email (German) announcing this step. There it says (my translation):

In 2010 the German National Library (DNB) started publishing authority data as Linked Data. The existing Linked Data service of the DNB has now been extended with title data. In this context the licence for linked data has been changed to “Creative Commons Zero”.

So far, most of the DNB title data has been converted, including periodicals and series; the music data and the holdings of the German Exiles Archive are still missing. From now on, the RDF/XML representation of a title record is available in the DNB portal via a link. This is expressly an experimental service which will be extended and refined continually. More detailed information about modelling questions and the general approach can be found in the updated documentation.

On the wiki page about the LOD service it says: “Examples and further information about FTP-downloads will come soon.” An entry on the Data Hub has already been made for the data.


BibServer screencast and user perspective

BibServer software allows people (you, me, the person in the office down the road) to hold and share collections of searchable data. Be it the list of books you have to read for your course this semester, the publications you produced in your research, the database of all staff at your organisation or your neatly categorised weekly shop (‘aisle 7: toothpaste, but only if BOGOF’), BibServer allows you to view, search, share and maintain this information.

If, like me, you are not of a technical bent, do not despair – Mark has created a straightforward video guide on how to use it and how it’s useful:

Setting up a Bibserver and Faceted Browsing (Mark MacGillivray) from Bibsoup Project on Vimeo.

Mark Williamson, a post-doctoral researcher at Cambridge University, was introduced to BibServer and we filmed him talking about using it a (very) short while later:

Ingesting a personal collection of references into Bibserver (Mark Williamson) from Bibsoup Project on Vimeo.

Thanks to Peter for his camera and video-production skills, and of course his blog.


Sprint videos

Last week’s sprint produced more than just parsers, game-plans and blog posts (Day 1, Day 2 and Day 3): it also allowed Peter and Naomi to stretch their directorial wings and produce some video blogs to record what we were doing as we went along. With minimal journalistic credentials (ok, none), we relied on the natural animation of the participants to sell the story… See how you think they did:

Interview with Mark MacGillivray, Openbiblio 2 project from Bibsoup Project on Vimeo.

Peter Murray-Rust Co-I Openbiblio project talks from Bibsoup Project on Vimeo.

Bibsoup: Interview with Etienne Posthumus (developer) from Bibsoup Project on Vimeo.

Interview with Ed Chamberlain, Openbiblio2 project from Bibsoup Project on Vimeo.

Thanks to Peter for the videos as presented on his brilliant blog!


Thursday 19th January – Open Biblio Sprint: Day 3

Today we were joined by additional members of the OKFN team from various parts of the world – Ira, Sam and Primavera. Then the fun began…

  • Sam and Mark discussed the interface between Open Biblio and the TEXTUS project, looking at text and image processing. Project Gutenberg was suggested as one possible avenue, exploring scanned archives being processed in order to provide searchable text. It was agreed that openphilosophy.org would be the best central point of reference for this data, as an instance of TEXTUS with BibServer support in the background.
  • Mark and Ira discussed how to present CKAN / BibServer at events such as Dev8D – there is cross-over between the two, and we took the opportunity to learn more about both projects. CKAN is a purpose-built data catalogue with flexible addons and a mature open source product. Both are part of the OKFN and, combined, are an easy way to publish and find data and references. It was agreed that working more closely together would benefit both projects as well as the wider community.
  • Peter and Mark discussed BibServer in terms of where we could offer CKAN and other services to academic / research groups, as a stack of tools that they would find beneficial to their work. There is a lot of talk just now about using DSpace, EPrints or some as yet uninvented system for storing research data – JISC is funding some projects, and we will be having a discussion about this at Dev8D.
  • Sam, Primavera and Etienne hacked some code and Etienne also continued his work on the parsers.
  • Peter, Mark and I discussed BKN – the Bibliographic Knowledge Network – which is Jim Pitman’s project and the first BibServer… Follow-up happening next week.
  • Peter and I interviewed Mark Williamson, a post-doctoral researcher at the Chemistry Department, about using BibSoup (which he’d only looked at for a few moments before we put him on the spot – thanks Mark!). Mark also gave us a demonstration of using BibSoup for the blog which is a good ‘how to’ for people who haven’t used it before. Peter’s excellent summary of BibSoup goes as follows: “BibSoup is a philosophy rather than a technology – ie having local control over bibliographic data. The idea is to get people to share data together and to sign-up to supporting it in 5 years’ time”. We will follow up soon with links to the videos we made.

The main aim for the past few days was to get all the people working on the project together, with the aim of Getting Stuff Done: do some coding, boot up some dataset demos, plan more demos and integrations, plan further community engagement and coding over the next six months, integrating BibServer with other projects, etc… Amongst all the lively discussions, I think it’s safe to say the aim was achieved!

Many thanks to all involved – if anything from this week strikes you as particularly interesting, please do get involved.


Wednesday 18th January – Open Biblio Sprint: Day 2

BibServer took precedence this morning, with Etienne, Ed and Mark continuing to develop the BibServer parsers… By March we want people to be able to download and run their own instance of the Server, or to provide a service whereby we can do it for them. We discussed various use cases that could be used to explain how BibServer is valuable in data collection – for example: a departmental administrator / researcher is required to provide a list of publications of those within his / her department using Symplectic / Web of Science / Endnote, and is to upload this information to the department’s website. BibSoup is ideal for this scenario as it allows different formats to be entered or produced, and the resulting collection can be easily searched and embedded in other web pages.
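To give a flavour of what such a parser does, here is a minimal sketch in Python that reads RIS text and produces BibJSON-style records. It is not the actual BibServer parser code: the function name `parse_ris`, the choice of fields, the type mapping and the sample record are all assumptions made for illustration.

```python
import json
import re

# RIS lines look like "TY  - JOUR": a two-character tag, two spaces, a hyphen, a value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Illustrative mapping from RIS reference types to BibJSON-style types (assumption).
TYPE_MAP = {"JOUR": "article", "BOOK": "book"}


def parse_ris(text):
    """Convert RIS text into a list of BibJSON-style dicts (illustrative mapping)."""
    records, current = [], None
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip())
        if not match:
            continue  # skip blank or malformed lines
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":
            # TY opens a new record
            current = {"type": TYPE_MAP.get(value, value.lower()), "author": []}
        elif current is None:
            continue  # ignore tags that appear outside a record
        elif tag == "ER":
            # ER closes the record
            records.append(current)
            current = None
        elif tag in ("AU", "A1"):
            current["author"].append({"name": value})
        elif tag in ("TI", "T1"):
            current["title"] = value
        elif tag in ("PY", "Y1"):
            current["year"] = value[:4]
        elif tag in ("JO", "JF", "T2"):
            current["journal"] = {"name": value}
    return records


# Made-up sample record, for illustration only.
SAMPLE = """TY  - JOUR
AU  - Smith, Jane
AU  - Jones, Bob
TI  - An example article
JO  - Journal of Examples
PY  - 2011
ER  - 
"""

if __name__ == "__main__":
    print(json.dumps(parse_ris(SAMPLE), indent=2))
```

Once every input format is reduced to the same simple structure, searching the collection and embedding it in a departmental page no longer depends on where the data originally came from.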

In the time we had been exploring the benefits of the BibServer, some e-mails had come through to the List with examples of collections. Starting with these, we identified a total of six people / groups who would drive open resources (BibSoup / BibServer) and whose data could be used as demonstrations (interpreted as Reading or Publication List format):

  • Malaria – Tom Olijhoek (Medline)
  • Sancoma – Gilles (Medline)
  • Karol Langner – personal libraries of people
  • UCC-PMR
  • Physics.cam.ac
  • Jim Pitman

It was agreed that these would be parsed through BibServer and used as examples of the functionality and importance of BibServer. Some, such as Jim’s probability web, may also benefit from dedicated BibServer instances.

Peter and I then interviewed members of the team, to record what was going on and talk about the project in general; this became short video blogs which will be available shortly.


Tuesday 17th January – Open Biblio Sprint: Day 1

Today Etienne, Ed, Rufus, Mark, Peter and I met up to start the first sprint of the new year. We began by clarifying the purpose of the sprint, today’s agenda and the project overall. We stated our aims as follows: we are not trying to re-do what is already available online, we are not getting into the detail of normalisation or disambiguation within a centralised database, and we are not intending to alter the academic culture overnight; however, we are going to improve the BibJSON facility for wider use, we are trying to determine how we can get more small groups and individuals involved, and we are identifying compelling, essential and simple reasons for people to support the project at this early stage before the ultimate global benefits can be realised. With this in mind, we got cracking.

Etienne, the newest member of the team, began coding pretty much straight away – he and Ed started working on MARC / RIS parsers. Peter and I started a huge list of FAQs for the website – Peter asking in-depth questions such as ‘do we want to create a single BibJSON collection for all the world’s metadata?’, me going for slightly less detail with ‘what is metadata?’ – and Mark assisted where he was needed. Peter hacked some datasets to go into the parsers and Mark got some coding done too. There was good progress made and we are set up well for tomorrow to crunch some issues.

One problem to be revisited is in relation to BibJSON, in having copies of the same record within different collections. If an object (the record) is held within multiple collections, there are separate copies of that object which could cause problems – for example, if a record is copied into several collections and then a typo is found, it can be a mammoth task trying to track down all erroneous copies and correct them… This issue is likely to be solved by creating a Master / Slave relationship between copies. Also with regard to BibJSON, Etienne suggested providing a flat HTML version of collections (in addition to the JavaScript option), for easy use in departmental web pages.
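One possible reading of the master copy idea, sketched in Python: keep each record once and let collections point at it by identifier, so a correction only has to be made in one place. This is an assumption about how it might work, not a decision the team has made; the names `records`, `collections`, `resolve` and `correct` and the sample data are hypothetical.

```python
# Hypothetical in-memory store: each record is kept once, keyed by an identifier.
records = {
    "rec1": {"title": "A titel with a typo", "year": "2011"},
}

# Collections hold record identifiers rather than their own copies.
collections = {
    "course-reading-list": ["rec1"],
    "my-publications": ["rec1"],
}


def resolve(collection_name):
    """Return the master records that a collection points at."""
    return [records[record_id] for record_id in collections[collection_name]]


def correct(record_id, **fields):
    """Fix the master copy once; every collection that references it sees the change."""
    records[record_id].update(fields)


# Correct the typo in one place only.
correct("rec1", title="A title without a typo")
```

The trade-off is that collections are no longer self-contained: whoever holds the master copy has to stay reachable, which sits awkwardly with the idea of local control over your own data.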

All set for day 2…


BibServer Code Sprint January 2012

The BibServer team will be having a code sprint this week.

  • When: Tuesday 17
  • Where: Cambridge, UK (and online)

More details to follow.


Sweden Ends Negotiations with OCLC

The following guest post is by Maria Kadesjö who works at the Libris-department at the National Library of Sweden.

The National Library of Sweden has ended negotiations with OCLC on participation in WorldCat, as the parties could not come to an agreement. The negotiations go back to 2006 and the key obstacle for the National Library has been the record use policy. Some time into the negotiations OCLC presented certain conditions for how records taken from WorldCat for cataloguing were to be used in Libris.

A No to WorldCat Rights and Responsibilities

The question has its base in the “WorldCat Rights and Responsibilities for the OCLC Cooperative” where you as an OCLC member have to accept certain conditions and the aim is to support the ongoing and long-term viability and utility of WorldCat (and its services). The National Library cannot accept the conditions as they are today since WorldCat is not the only arena in which The National Library wants and needs to be active. Accepting the conditions would mean that we would forever have to relate to OCLC’s policy.

Libris an open database

Libris is the Swedish union catalogue (with some 170 libraries, primarily academic libraries) and is built quite similarly to WorldCat but on a smaller scale. Member libraries catalogue their records in Libris and the records are then exported to their local library systems. The Libris cooperative is built on voluntary participation. Any Libris library should be able to, whenever it wants, take out all its bibliographic records from Libris and use them in another library system. The National Library makes no claim to the records and does not control how the libraries choose to use their bibliographic records.

The National Library has taken the decision to release the national bibliography and authority data as open data. The reason for this is to acknowledge the importance of open data and the importance of libraries’ control of their data when it comes to the long term sustainability and competition of the services needed by libraries and their users.

In the Agreement on Participation in the Libris joint catalogue signed by the National Library and cataloguing libraries, Point 3.3 specifies that “the content in Libris is owned by the National Library and is freely accessible in accordance with the precepts and methods reported by the National Library, both for Participating Libraries and for external partners.” This paragraph is needed so that the National library can sign agreements for Libris with other partners (OCLC for example) but also so that the National Library can abstain from claims of ownership of bibliographic records taken from Libris. Libris can therefore be an open database, both for the libraries that use Libris for cataloguing and for others.

Not consistent with Libris principles

The consequence of signing with OCLC would be that the National Library would have to supervise how records originating in WorldCat were used. And a library that took a bibliographic record from WorldCat for cataloguing in Libris and exported it to its own system would have to accept OCLC’s terms of use.

A library that wished to leave Libris would not necessarily be able to do so, since it is not self-evident that the bibliographic records could be integrated into other systems. This would infringe the voluntary participation that characterizes Libris. In practice, the National Library has no mandate to restrict the freedom of action of Libris’ libraries in this way, since the National Library has no possibility of influencing how the Libris libraries themselves choose to use their catalogues.


JISC Discovery 2012

Today (11th January 2012) I am attending the JISC Discovery 2012 meeting to learn about the JISC projects aiming to increase open access to research materials. Here are some notes I took during the day:

Joy Palmer – Mimas – Copac

Search for records relevant in some way to each other, visualise and export as MODS or CSV. Would it be worth exporting (bib)JSON? Would they be interested in that?

Christine Madsen – Bodleian Libraries

They have collections they want to make more usable to machines and people, and to get rid of silos, e.g. allow the content to be aggregated by others – Europeana etc. They will provide some metadata, OAI, Open Linked Data, via APIs. Using the commercial iNQUIRE interface from Armadillo Systems for people to search the repo. (iNQUIRE runs over a Solr index, similar to Project Blacklight – but is it open source?)

Me – JISC Open Biblio 2

We are working on building interfaces for people to quickly share their collections in useful ways – how can we make sure it is easy to share and consume this metadata?

John Gilby – M25 Consortium

Reusing Copac records and comparing with M25 libraries to build higher quality bibliographic metadata – using open source. Developing an API for embedding in resource discovery systems. Producing practical guidelines for open metadata principles. We should show them our principles.

Contextual Wrappers

Using collection descriptions to help people find relevant collections. Using Culture Grid for metadata. Working with people like University Museums in Scotland to share collection descriptions.

Eric Cross, Stephen McGough – Newcastle University – The Cutting Edge

Bringing together archaeological and ethnographic museum objects – sharp-edged objects such as tools, axes etc as the focus. For performing use-wear analysis on the objects. Building a comprehensive metadata internet collection about such objects. Spanning multiple collections and searching across them using SPARQL queries, presenting a single front end.

Leif Isaksen – Southampton – Pelagios 2

Moved from JISC Geo – enabling linked ancient Geo data in open systems, focussing on lightweight annotation approach. Looking for connections between documents that are related by place, mapping and visualising the connections. Building a cataloguing, search, visualisation service and community toolkit.

Step change

Building systems to generate linked data from archivist workflows, so they do not need to care about RDF themselves. Analyses descriptions against OpenCalais. Aiming to connect it to Calm, so that archivists can locally generate linked data and push it to their Calm UIs. Similarly doing this with Historypin – who generate UIs to show interesting historical things near a certain geographic area. Working in context with Cumbria archive service. Launching an API via UKAT.

Edina Geotagger, MediaHub and SUNCAT

Providing a web service around ExifTool, to geotag / geocode image, audio, and video metadata. MediaHub provides images, video and audio licensed by JISC collections and harvested from other providers. Can parse records and identify probable people, places and dates – then suggest values to users. Hoping to generate better participation by just asking users “is this a date?” and so on, rather than them having to type into a form. SUNCAT aggregates journal holdings info from 82 UK libraries, as MarcXML and linked data. Aiming to increase the amount of metadata that is openly available. Working to establish use cases for formats such as MODS / DC, and exploring the licensing status of RDF triples.

ServiceCORE

Provides a jQuery plugin for an institutional repo that shows relevant information about similar entries in other repos elsewhere. Aiming to build a web service layer providing programmable access to aggregated content and metadata in institutional repos. Also a pilot tool for automatic subject-based classification of content using text categorisation techniques. Overall, providing an enhanced related resource discovery system based on text mining. Hoping to offer a way for people to find versions of papers that they can access, as opposed to ones they cannot. Building on OAI-PMH compliant repos.
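For reference, harvesting from an OAI-PMH compliant repo starts with a plain HTTP request carrying a `verb` argument. Below is a minimal sketch in Python of building a `ListRecords` request URL; it is not ServiceCORE code, and the function name `list_records_url` and the repository address are placeholders I have made up. The `verb`, `metadataPrefix`, `set` and `resumptionToken` arguments come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for a repository base URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder base URL, not a real repository.
    print(list_records_url("https://repository.example.org/oai"))
```

A harvester would fetch this URL, read the records from the XML response, and keep requesting with the returned resumption token until the repository stops supplying one.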

Clock – Lincoln and Cambridge

This is a continuation of Jerome. Enriched bib data from Jerome, Comet and elsewhere. A set of dev APIs and linked data endpoints. Aiming to establish a distributed scholarly catalogue for the UK. Planning to work closely with JISC open bib.

LUNCH!

Afternoon session – grouping up around particular themes

  • RDF
  • sharing data
  • authorities/indexes/access points
  • User interfaces
  • Collection descriptions across A + H

I joined the sharing data group. Discussion turned out to be quite brief, and just covered “what is the problem of sharing data?” There was no time to come up with a response.

Studies in Discovery

The Discovery ecology is aiming to explain why institutions should do it / take part in it. It will do this by writing up case studies. Doing the 12 criteria of Discovery will mean that you are “doing” discovery:

  • adopting open licensing
  • requiring clear reasonable terms and conditions
  • using easily understood data models
  • deploying persistent identifiers
  • establishing data relationships by re-using authoritative identifiers
  • providing clear mechanisms for accessing APIs
  • documenting APIs
  • adopting widely understood data formats
  • ensuring data is sustainable
  • ensuring services are supported
  • using your own APIs
  • collecting data to measure use

Business case

How is our project to be sustainable? What will maintain it? How will it survive long term, beyond JISC funding? Need to provide feedback to JISC Discovery about what suitable business cases are.

Group discussions

We considered how the 12 studies listed above had already been or would be considered by our projects. In some cases, some of the study topics were not relevant, but on the whole they are useful. We will keep these topics in mind whilst writing up project blog posts, and may well write specific posts about those topics.

Of particular interest to me was the “adopting widely understood data formats” study – because this goes beyond the scope of any one project, but is also something that would be of benefit. However, whether or not it is of benefit depends on whether or not any two or more people / groups decide it is of benefit… I will follow up to the Discovery mailing list with information about our current thinking on BibJSON, with details about our parsers API (once I have finished it), and links to the metadata guide that has been created by Primavera and others.

That is the end. I will follow up with some projects after today, and also we are already meeting up with CLOCK Cambridge and Lincoln. Next Discovery meeting will be in April, then in July.


Informing public access to peer reviewed scholarly publications and data resulting from publicly funded research

The US government (OSTP) has recently issued an RFI on Open Access to data resulting from publicly funded research. The deadline for responding to the RFI has been extended to January 12.

http://www.whitehouse.gov/blog/2011/12/21/extended-deadline-public-access-and-digital-data-rfis

Detailed responses are openly available for collaborative editing and signing, here:

For digital data – https://docs.google.com/document/d/1QA1eGBynqh-yN0bo3_nYzD3d26nEhvuVPMUR2ffi17o/edit?hl=en_US

For peer reviewed scholarly publications – https://docs.google.com/document/d/1vEcWqAz6bwIIR6qQqWZYc8iUBrOpJ9NrvC9HiiQMc2Y/edit?hl=en_US

I request that anyone with an interest in scholarship responds in advocacy of open access, either via the above responses or directly to the OSTP. I justify my request not by re-iterating the virtues of open access (there are plenty), but by countering a common basis of closed access arguments: I propose that open access can be profitable, if profit is desired. (Although I admit that my motivation is for learning stuff, not making money.)

Consider the gold rush upon which expanding colonisation of America so successfully relied. Why did people care about gold that much? Why did they go to such great lengths to traverse the wilds and dig it up, risking or losing their lives in the process? Of course, the answer is because it was highly valued – and the reason it was so highly valued was precisely because it is so hard and risky to come by.

Controlling access to a resource is a common way to generate profit; because gold is inherently hard to access, it is a good basis for an economy. Similarly, all sorts of materials that are found to have desirable properties become valuable, usually as a function of their desirability in relation to accessibility.

Digital artefacts, however, are very easy to copy and distribute. In cases where an industry has grown up around the distribution of a product that has become digitally easy to copy and share, efforts have been made to artificially maintain that difficulty via the application of the concept of digital piracy.

If gold were easy to find, easy to copy, easy to distribute – would it help if we made it poisonous?

Encumbering digital artefacts with artificial accessibility restrictions does not make them hard to find, copy, or distribute – it just makes them needlessly complicated.

The traditional “increase desirability / decrease accessibility” paradigm does not readily apply to digital artefacts. Fortunately, the problem of profiting from them has been solved, and solved often; they are regularly purchased or consumed via profitable services on the basis of convenience or improved user experience. In such cases, open access to a resource facilitates building a useful (or at least desirable or fashionable) service – consider Google, Facebook, YouTube, Spotify.

The case of publicly funded scholarly output is further complicated by the fact that accessibility is inherent to desirability – the point is to build on what we learn, and we cannot do that if we cannot access it. Achieving anything with these artefacts – discovering, sharing, learning, communicating, archiving, profiting – is best done in the ideal environment where they are easy to find, copy, distribute – and are not poisonous.

We need open access, not restricted access.
