JISC Discovery 2012

Today (11th January 2012) I am attending the JISC Discovery 2012 meeting to learn about the JISC projects aiming to increase open access to research materials. Here are some notes I took during the day:

Joy Palmer – Mimas – Copac

Search for records relevant in some way to each other, visualise and export as MODS or CSV. Would it be worth exporting (bib)JSON? Would they be interested in that?

Christine Madsen – Bodleian Libraries

They have collections they want to make more usable to machines and people, and to get rid of silos, e.g. allow the content to be aggregated by others – Europeana etc. They will provide some metadata via OAI, Linked Open Data and APIs. Using the commercial iNQUIRE interface from Armadillo Systems for people to search the repo. (iNQUIRE runs over a Solr index, similar to Project Blacklight – but is it open source?)
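
Since iNQUIRE and Blacklight both sit on a Solr index, the search and faceted browsing they offer boil down to HTTP queries against Solr. A minimal sketch of such a query – the host, core and field names are assumptions for illustration, not details of the Bodleian's actual index:

    import requests

    params = {
        "q": "milton",            # free-text query
        "wt": "json",             # ask Solr for a JSON response
        "rows": 10,
        "facet": "true",          # enable faceting
        "facet.field": "author",  # facet on an assumed 'author' field
    }
    resp = requests.get("http://localhost:8983/solr/select", params=params).json()

    for doc in resp["response"]["docs"]:
        print(doc.get("title"))

    # facet counts come back as a flat [value, count, value, count, ...] list
    facets = resp["facet_counts"]["facet_fields"]["author"]
    for value, count in zip(facets[::2], facets[1::2]):
        print(value, count)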

Me – JISC Open Biblio 2

We are working on building interfaces for people to quickly share their collections in useful ways – how can we make sure it is easy to share and consume this metadata?

John Gilby – M25 Consortium

Reusing Copac records and comparing them with M25 libraries' records to build higher quality bibliographic metadata – using open source tools. Developing an API for embedding in resource discovery systems, and practical guidelines for open metadata principles. We should show them our principles.

Contextual Wrappers

Using collection descriptions to help people find relevant collections. Using Culture Grid for metadata. Working with people like University Museums in Scotland to share collection descriptions.

Eric Cross, Stephen McGough – Newcastle University – The Cutting Edge

Bringing together archaeological and ethnographic museum objects, with sharp-edged objects such as tools and axes as the focus, for performing use-wear analysis on the objects. Building a comprehensive online metadata collection about such objects, spanning multiple collections and searching across them using SPARQL queries, presented through a single front end.
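
As a rough illustration of the cross-collection approach: if each collection exposes a SPARQL endpoint, the front end can run the same query against each and merge the results. A sketch against one hypothetical endpoint – the URL and the vocabulary are invented for illustration, not taken from the project:

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://example.org/collection-a/sparql")
    sparql.setQuery("""
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT ?object ?label WHERE {
            ?object dc:type "axe" ;
                    dc:title ?label .
        } LIMIT 25
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # repeat against the other endpoints and merge to search across collections
    for binding in results["results"]["bindings"]:
        print(binding["object"]["value"], binding["label"]["value"])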

Leif Isaksen – Southampton – Pelagios 2

Moved from JISC Geo – enabling linked ancient Geo data in open systems, focussing on lightweight annotation approach. Looking for connections between documents that are related by place, mapping and visualising the connections. Building a cataloguing, search, visualisation service and community toolkit.

Step change

Building systems to generate linked data from archivist workflows, so that archivists do not need to care about RDF themselves. Analyses descriptions against OpenCalais. Aiming to connect it to Calm, so that archivists can locally generate linked data and push it to their Calm UIs. Similarly doing this with Historypin – who generate UIs to show interesting historical things near a given geographic area. Working in context with Cumbria Archive Service. Launching an API via UKAT.

EDINA Geotagger, MediaHub and SUNCAT

Providing a web service around ExifTool, to geotag / geocode image, audio, and video metadata. MediaHub provides images, video and audio licensed by JISC Collections and harvested from other providers. It can parse records and identify probable people, places and dates, then suggest values to users. Hoping to generate better participation by just asking users “is this a date?” and so on, rather than them having to type into a form. SUNCAT aggregates journal holdings info from 82 UK libraries, as MARCXML and linked data. Aiming to increase the amount of metadata that is openly available. Working to establish use cases for formats such as MODS / DC, and exploring the licensing status of RDF triples.
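
ExifTool itself is a command-line tool, so a web service around it is essentially marshalling uploads into calls like the following. A sketch assuming exiftool is installed; the helper function and coordinates are examples of mine, not EDINA's actual service:

    import subprocess

    def geotag(path, lat, lon):
        # write GPS coordinates into the file's Exif headers via exiftool
        subprocess.check_call([
            "exiftool",
            "-GPSLatitude=%f" % abs(lat),
            "-GPSLatitudeRef=" + ("N" if lat >= 0 else "S"),
            "-GPSLongitude=%f" % abs(lon),
            "-GPSLongitudeRef=" + ("E" if lon >= 0 else "W"),
            path,
        ])

    geotag("photo.jpg", 55.95, -3.19)  # roughly Edinburgh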

ServiceCORE

Provides a jQuery plugin for an institutional repo that shows relevant information about similar entries in other repos elsewhere. Aiming to build a web service layer providing programmatic access to aggregated content and metadata in institutional repos. Also a pilot tool for automatic subject-based classification of content using text categorisation techniques. Overall, providing an enhanced related-resource discovery system based on text mining. Hoping to offer a way for people to find versions of papers that they can access, as opposed to ones they cannot. Building on OAI-PMH compliant repos.
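
OAI-PMH is a plain HTTP protocol, which is what makes this kind of aggregation straightforward. A minimal sketch of harvesting Dublin Core records from a repository – the repository URL is a placeholder, though ListRecords and oai_dc are standard parts of the protocol:

    import requests
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"

    resp = requests.get("http://repository.example.org/oai", params={
        "verb": "ListRecords",
        "metadataPrefix": "oai_dc",  # simple Dublin Core, mandatory for all repos
    })
    tree = ET.fromstring(resp.content)

    for record in tree.iter(OAI + "record"):
        title = record.find(".//" + DC + "title")
        if title is not None:
            print(title.text)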

CLOCK – Lincoln and Cambridge

This is a continuation of Jerome. Enriched bib data from Jerome, Comet and elsewhere. A set of dev APIs and linked data endpoints. Aiming to establish a distributed scholarly catalogue for the UK. Planning to work closely with JISC open bib.

LUNCH!

Afternoon session – grouping up around particular themes

  • RDF
  • sharing data
  • authorities/indexes/access points
  • User interfaces
  • Collection descriptions across A + H

I joined the sharing data group. Discussion turned out to be quite brief, and just covered “what is the problem of sharing data?” There was no time to come up with a response.

Studies in Discovery

The Discovery ecology aims to explain why institutions should take part; it will do this by writing up case studies. Meeting the 12 criteria of Discovery will mean that you are “doing” Discovery:

  • adopting open licensing
  • requiring clear reasonable terms and conditions
  • using easily understood data models
  • deploying persistent identifiers
  • establishing data relationships by re-using authoritative identifiers
  • providing clear mechanisms for accessing APIs
  • documenting APIs
  • adopting widely understood data formats
  • ensuring data is sustainable
  • ensuring services are supported
  • using your own APIs
  • collecting data to measure use

Business case

How is our project to be sustainable? What will maintain it? How will it survive long term, beyond JISC funding? Need to provide feedback to JISC Discovery about what suitable business cases are.

Group discussions

We considered how the 12 studies listed above had already been, or would be, addressed by our projects. In some cases, some of the study topics were not relevant, but on the whole they are useful. We will keep these topics in mind whilst writing up project blog posts, and may well write specific posts about those topics.

Of particular interest to me was the “adopting widely understood data formats” study – because this goes beyond the scope of any one project, but is also something that would be of benefit. However, whether or not it is of benefit depends on whether or not any two or more people / groups decide it is of benefit… I will follow up on the Discovery mailing list with information about our current thinking on BibJSON, with details about our parsers API (once I have finished it), and links to the metadata guide that has been created by Primavera and others.

That is the end. I will follow up with some projects after today, and we are already arranging to meet up with CLOCK (Cambridge and Lincoln). The next Discovery meetings will be in April and July.

Posted in JISC OpenBib

Informing public access to peer reviewed scholarly publications and data resulting from publicly funded research

The US government (OSTP) has recently issued RFIs (Requests for Information) on public access to peer reviewed scholarly publications and to digital data resulting from publicly funded research. The deadline for responding has been extended to January 12.

http://www.whitehouse.gov/blog/2011/12/21/extended-deadline-public-access-and-digital-data-rfis

Detailed responses are openly available for collaborative editing and signing, here:

For digital data –
https://docs.google.com/document/d/1QA1eGBynqh-yN0bo3_nYzD3d26nEhvuVPMUR2ffi17o/edit?hl=en_US

For peer reviewed scholarly publications –
https://docs.google.com/document/d/1vEcWqAz6bwIIR6qQqWZYc8iUBrOpJ9NrvC9HiiQMc2Y/edit?hl=en_US

I request that anyone with an interest in scholarship respond in advocacy of open access, either via the above responses or directly to the OSTP. I justify my request not by re-iterating the virtues of open access (there are plenty), but by countering a common basis of closed access arguments: I propose that open access can be profitable, if profit is desired. (Although I admit that my motivation is learning stuff, not making money.)

Consider the gold rush upon which the expanding colonisation of America so successfully relied; why did people care about gold that much? Why did they go to such great lengths to traverse the wilds and dig it up, risking or losing their lives in the process? Of course, the answer is that it was highly valued – and the reason it was so highly valued was precisely because it was so hard and risky to come by.

Controlling access to a resource is a common way to generate profit; because gold is inherently hard to access, it is a good basis for an economy. Similarly, all sorts of materials that are found to have desirable properties become valuable, usually as a function of their desirability in relation to accessibility.

Digital artefacts, however, are very easy to copy and distribute. In cases where an industry has grown up around the distribution of a product that has become digitally easy to copy and share, efforts have been made to artificially maintain that difficulty by applying the concept of digital piracy.

If gold were easy to find, easy to copy, easy to distribute – would it help if we made it poisonous?

Encumbering digital artefacts with artificial accessibility restrictions does not make them hard to find, copy, or distribute – it just makes them needlessly complicated.

The traditional increase-desirability / decrease-accessibility paradigm does not readily apply to digital artefacts. Fortunately, the problem of profiting from them has been solved, and solved often; they are regularly purchased or consumed via profitable services on the basis of convenience or improved user experience. In such cases, open access to a resource facilitates building a useful (or at least desirable or fashionable) service – consider Google, Facebook, YouTube, Spotify.

The case of publicly funded scholarly output is further complicated by the fact that accessibility is inherent to desirability – the point is to build on what we learn, and we cannot do that if we cannot access it. Achieving anything with these artefacts – discovering, sharing, learning, communicating, archiving, profiting – is best done in the ideal environment where they are easy to find, copy, distribute – and are not poisonous.

We need open access, not restricted access.

Posted in JISC OpenBib, OKFN Openbiblio

Minutes: 17th Virtual Meeting of the OKFN Openbiblio Group

Date: January 3rd, 2012, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Peter Murray-Rust
  • Thad Guidry
  • Thomas Krichel (first half)
  • Karen Coyle (first half)
  • Jim Pitman (second half)

Agenda

OCLC’s FAST release

  • Is this open data according to the OKD? – Yes, see the VoID description of the dataset
  • The licence is attached to the whole dataset and not to individual resources
  • Attribution is probably required at the data set level too
  • So far, the data isn't fully OKD-compliant because no full dump of the data exists
  • Richard Cyganiak has already created an entry on the Data Hub for FAST data.

LCSH in Freebase

  • Thad reports that all LCSH will be imported into Freebase.
  • The facets will be separated out as well (as FAST does).
  • Timeline: six months to a year

Re-Using DBLP data

  • DBLP entry at the Data Hub: http://thedatahub.org/dataset/dblp
  • Jim had some ideas about re-using the DBLP data in BibSoup, e.g. for collecting publications on deduplication.
  • The isitopendata enquiry has now been resolved by Thomas
  • Mark could run DBLP data in a BibServer instance but he cannot maintain it besides the other BibServer instances he already maintains
  • Thad: DBLP data should be uploaded to ScraperWiki (1 GB maximum)
  • Thad: maybe create a “base” at Freebase for DBLP or BibSoup.
    • Freebase is fully versioned: edits can be reverted; queries can be run against older versions
    • Thad: Freebase is more of a backend data store and doesn’t have a proper GUI. BibServer might play the GUI role.
  • ACTION: Mark helps Thad to set up a BibServer instance and Thad pushes the DBLP data into it

Writing and maintaining parsers etc. for BibServer

  • Jim can write preliminary parsers but he can’t maintain the code, write bugfixes etc.
  • Parsers are written in Python
  • He would like someone else to do it.
  • Thad proposes pushing the code to ScraperWiki
  • ACTION: Jim and Thad will communicate about how to pack up the code, publish and maintain it.

Jim’s recent efforts

Openbiblio Sprint

  • 17-19th of January: openbiblio sprint session in Cambridge? (suggested by Mark)
  • Mark tries to get Etienne (new programmer from Amsterdam), Rufus, Ed Chamberlain (for some of the time), Naomi and Primavera together for a sprint session.

Action Collection

  • Mark, Thad: set up a BibServer instance and push DBLP data into it
  • Jim, Thad: communicate about how to pack up the code for parsers etc., how to publish and maintain it.
Posted in minutes, OKFN Openbiblio

My First Hackathon

This Open Research Reports Hackathon was part of the Semantic Web Applications and Tools for Life Sciences conference. Approaching this from a non-Computer Science perspective, I had no idea what I was in for. Having understood the word ‘hack’ only as meaning an underhanded and illicit way of accessing protected information (apparently ‘crack’ is the correct term for this), it turns out that – in this and similar cases – ‘hack’ means finding elegant solutions to computing problems by sharing ideas and expertise. So, having learnt something new within the first five minutes, and with that as the underpinning aim, we set about sharing interests, problems and ideas, and began by introducing ourselves.

There were programmers, PhD students, open science enthusiasts, and the occasional person who had Life Sciences and / or software technology as a hobby. ‘Lightning talks’ by some attendees presented particular problems or suggestions, including the topics of semantics, data modelling and minimal standards. With breaks to facilitate discussions between people with shared interests, and rewards for enthusiastic networking (a system involving casino chips exchanged for beers), it was a productive evening for mingling and plotting and set the groundwork for day 2.

The next morning people arrived abuzz with fledgling ideas and enthusiasm for Making Things Happen. Groups of like-minded individuals joined forces around proposed subjects, set goals and began hacking… whereupon a hushed and industrious atmosphere developed and the room became reverently productive. While the groups worked away, I joined Peter Murray-Rust and Mahendra Mahey to record some people talking about what they were up to in general, as well as what they were doing at that moment in the Hackathon. This was very interesting as it provided a great explanation of what brings people to events such as this – predominantly, to see what happens when people with different skills combine ideas to solve a real problem.

The round-ups shared the groups’ progress: a spokesperson from each group explained the starting-point or problem, how it was approached and what the outcome or solution was. There were interactive diagrams, existing resources combined and / or adapted, interesting tangents explored, and proposals for further research. They included great ideas and demonstrations such as how to build collections of relevant research article metadata quickly, filtering drug information for patients by side-effects, availability etc., and using natural language to populate forms designed for capturing metadata (follow the link at the end of this post to read about the outcomes in more detail). Reports were received with enthusiasm and encouragement, and prompted cross-group collaboration and further exploration of emerging ideas even as we were being herded out of the door.

My overview of this, my first Hackathon, is that it was an excellent way of developing solutions to problems common to science and technology, and of fostering interdisciplinary collaboration. I was impressed that tangible solutions had been developed to the level of sophistication presented, and I am keen on the suggestion to revisit these solutions in 6 months or so to establish how the projects develop.

 

Many thanks to Mahendra (our welcome committee, compère, areas of interest match-maker and all-round organiser) for the smooth running of the event and the organisers / sponsors (DevCSI which is funded by JISC, SWAT4LS and the Open Knowledge Foundation) for enabling this exciting and productive event to go ahead.

More details on this event, including the lightning talks and round-ups, can be found at http://wiki.okfn.org/Working_Groups/Science/swat4ls_hackathon

For more information on future events please refer to http://okfn.org/events/

Posted in BibServer, JISC OpenBib

What is BibServer? What happened to Bibliographica?

Time for another clarification on what work we are doing, and what the various acronyms mean.

BibServer

BibServer is the software we are now working on; its aim is to provide a tool that enables individuals and small groups to quickly and easily share their collections.

Imagine you have a collection of metadata, perhaps in a BibTeX file, or in a spreadsheet (CSV) file, or from a reference management tool such as Mendeley – or perhaps you do not have such a collection yet, but you know you need to create one. Well, BibServer would enable you to take that collection and build a search web page onto it, with nice features like faceted browsing and visualisations.
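
For a CSV collection, for instance, the first step is simply turning rows into JSON records that can be indexed. A minimal sketch – the column names are assumptions about one particular spreadsheet, not a fixed schema:

    import csv
    import json

    records = []
    with open("mycollection.csv") as f:
        for row in csv.DictReader(f):
            records.append({
                "title": row.get("title"),
                "author": [{"name": name.strip()}
                           for name in row.get("authors", "").split(";")],
                "year": row.get("year"),
            })

    with open("mycollection.json", "w") as f:
        json.dump({"records": records}, f, indent=2)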

As opposed to reference management tools, the focus of BibServer is not on managing your collection; instead, we presume you have another way to do that – as most people do, whether particular software or a file on your local network to which access is already restricted to particular staff members as required. Rather than duplicating all that effort, BibServer just functions at the point where you want to expose that collection.

BibServer is open source software. You can run your own. You can (soon) pay us to run a service for you. Or you can have a go on our example service.

If you want to make suggestions for new features, parsers or datasets, or offer to help in creating them, please see our software website and repo.

Bibsoup

Bibsoup is our name for the general aggregation of all bibliographic records floating around in the world. When they are pulled together into a particular collection that someone cares about, they form a small bibsoup. Our example BibServer service is up and running at http://bibsoup.net, and there you can see some of our example collections. You could also create your own!

Bibliographica

Bibliographica is the example service we set up during year 1 of JISC Open Bibliography. It demonstrates how to share a large collection such as the British Library British National Bibliography as Linked Open Data. It runs on the open source openbiblio software, which is available for anyone wishing to run such a service.

We will soon be porting the content of Bibliographica into a BibServer, to focus on maintaining the BibServer code base.

BibJSON

A quick note about BibJSON – our BibServer code works on indexes based on data in JSON format. This is in line with the aim of enabling people to quickly share their data online in a simple manner, as JSON is ubiquitous on the web and fits well with the development of AJAX services. BibJSON is simply a way of describing what we expect to see in a particular JSON file, which allows us to easily share some attributes about collections and entries in collections. We are considering the value of further work on specifying BibJSON, and this will depend on feedback from the community.
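
To give a flavour – and this is a rough illustration with made-up values, not a fixed specification – a BibJSON record is just plain key-value attributes, with things like authors held as lists so they can be faceted on:

    import json

    record = {
        "type": "article",
        "title": "An example article title",
        "author": [
            {"name": "A. Author"},
            {"name": "B. Author"},
        ],
        "year": "2011",
        "journal": {"name": "An Example Journal"},
    }
    print(json.dumps(record, indent=2))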

Posted in BibServer, JISC OpenBib, News

New recruit

Hello from the newest team member, Naomi Lillie. I have recently joined the Open Knowledge Foundation to undertake OKFN coordination work as well as specifically to assist the JISC Open Biblio 2 project.

I will be supporting the project as Community Coordinator, working alongside the team to support the existing and emerging work, ensuring smooth running of the day-to-day administrative side of things, organising events and promoting the project to a wider audience. I’ll be running weekly catch-ups for the team to share information and report back.

I am not at all technical, so will be able to give the layman’s perspective of what’s going on in this exciting area of Open Data, and will be starting to blog about what’s going on as I get a feel for things.

If I can help with any enquiries please do not hesitate to get in touch on naomi [dot] lillie [at] okfn [dot] org.

Posted in JISC OpenBib

International Digital Curation Conference

Last week I attended the International Digital Curation Conference. It was very interesting to see many people from institutes across the world talking about how to manage and share all sorts of research data; from attempts to use institutional repository software designed for article content, to new builds from scratch, there were lots of examples.

A great value of the conference was that it was clear that people are looking for ways to make information more accessible; there is less emphasis on discussing the merits or otherwise of accessibility in itself, and more on getting it done.

Of course, all of these rely on storing data of one form or another, and also on managing metadata; a relevant area, then, for the application of OKF tools such as CKAN and BibServer.

Further integration with services such as total-impact and figshare could also prove fruitful – although there is also a risk: there are now so many sorts of tools offering ways to “manage / curate / share” data that it may be unclear what they are all for – what is their USP, and why should we use X over Y? This is a question for CKAN and BibServer to answer as much as any of the alternatives.

Posted in BibServer, events, JISC OpenBib, OKFN Openbiblio

JISC Open Bibliography 2

JISC have put some extra funding in to continue work on Open Bibliography; hooray!

We are continuing along the open biblio collections theme from last year, but will now be focussing on ensuring individuals and small groups can do something useful with the records in all these collections. With any luck, we are over the obstacle of lack of data – now we need to solve the problem of making that data relevant in context.

From the software perspective, we are building tools to enable people to quickly and easily share a collection of records that are particularly relevant to them – perhaps a list of relevant reading for a course, or a list of publications of the members of a department, or a list of research articles relevant to research into a specific disease.

Further information about the project, and links to other resources, are available on the JISC Open Bibliography 2 project page.

Posted in BibServer, JISC OpenBib, OKFN Openbiblio

European PSI Directive to be Expanded to Cover Memory Institutions

This morning Neelie Kroes, Vice-President of the European Commission and Commissioner for the Digital Agenda, announced a new Open Data Strategy for Europe. Jonathan Gray has posted some more information, including quotes, on the OKF blog.

What is especially of interest regarding open bibliographic data is that the new Open Data Strategy for Europe includes some proposals by the EU Commission for updating the Directive on the re-use of public sector information and expanding it to memory institutions. The press release says the Commission proposes “massively expanding the reach of the Directive to include libraries, museums and archives for the first time; the existing 2003 rules will apply to data from such institutions.”

More from the press release:

Brussels, 12 December 2011 – The Commission has launched an Open Data Strategy for Europe, which is expected to deliver a €40 billion boost to the EU’s economy each year. Europe’s public administrations are sitting on a goldmine of unrealised economic potential: the large volumes of information collected by numerous public authorities and services. Member States such as the United Kingdom and France are already demonstrating this value. The strategy to lift performance EU-wide is three-fold: firstly the Commission will lead by example, opening its vaults of information to the public for free through a new data portal. Secondly, a level playing field for open data across the EU will be established. Finally, these new measures are backed by the €100 million which will be granted in 2011-2013 to fund research into improved data-handling technologies.

These actions position the EU as the global leader in the re-use of public sector information. They will boost the thriving industry that turns raw data into the material that hundreds of millions of ICT users depend on, for example smart phone apps, such as maps, real-time traffic and weather information, price comparison tools and more. Other leading beneficiaries will include journalists and academics.

Commission Vice President Neelie Kroes said: “We are sending a strong signal to administrations today. Your data is worth more if you give it away. So start releasing it now: use this framework to join the other smart leaders who are already gaining from embracing open data. Taxpayers have already paid for this information, the least we can do is give it back to those who want to use it in new ways that help people and create jobs and growth.” See Mrs Kroes video quote here.

The Commission proposes to update the 2003 Directive on the re-use of public sector information by:

  • Making it a general rule that all documents made accessible by public sector bodies can be re-used for any purpose, commercial or non-commercial, unless protected by third party copyright;
  • Establishing the principle that public bodies should not be allowed to charge more than costs triggered by the individual request for data (marginal costs); in practice this means most data will be offered for free or virtually for free, unless duly justified;
  • Making it compulsory to provide data in commonly-used, machine-readable formats, to ensure data can be effectively re-used;
  • Introducing regulatory oversight to enforce these principles;
  • Massively expanding the reach of the Directive to include libraries, museums and archives for the first time; the existing 2003 rules will apply to data from such institutions.

In addition, the Commission will make its own data public through a new “data portal”, for which the Commission has already agreed the contract. This portal is currently in ‘beta version’ (development and testing phase) with an expected launch in spring 2012. In time this will serve as a single access point for re-usable data from all EU institutions, bodies and agencies and national authorities.

Posted in Uncategorized

DBLP releases its 1.8 million bibliographic records as open data

The following guest post is by Marcel R. Ackermann who works at the Schloss Dagstuhl – Leibniz Center for Informatics on expanding the DBLP computer science bibliography.

Computer Science literature

Right from the early days of DBLP, the decision was made to make its whole data set publicly available. Yet it was only at the age of 18 that DBLP adopted an open-data license.

The DBLP computer science bibliography provides access to the metadata of over 1.8 million publications, written by over 1 million authors in several thousand journals and conference proceedings series. It is a helpful tool in the daily work of researchers and computer science enthusiasts from around the world. Although DBLP started with a focus on database systems and logic programming (hence the acronym), it has grown to cover all disciplines of computer science.

The success of DBLP wasn’t planned. In 1993, Michael Ley from the University of Trier, Germany, started a simple webserver to play around with this so-called “world wide web” everybody was so excited about in those days. He chose to set up some webpages listing the tables of contents of recent conference proceedings and journal issues, some other pages listing the articles of individual authors, and provided hyperlinks back and forth between these pages. People from the computer science community found this quite useful, so he just kept adding papers. Funds were raised to hire helpers, some new technologies were implemented, and the data set grew over the years.

The approach of DBLP has always been a pragmatic one. So it wasn’t until the recent evolution of DBLP into a joint project of the University of Trier and Schloss Dagstuhl – Leibniz Center for Informatics that the idea of finding a licensing model came to our minds. In this process, we found the source material and the commentaries provided by the Open Knowledge Foundation quite helpful. We quickly concluded that either the PDDL or the ODC-By license would be the right choice for us. In the end, we chose ODC-By since, as researchers ourselves, it is our understanding that external sources should be referenced. Although from a pragmatic point of view nothing has changed at all for DBLP (since permissions to use, copy, redistribute and modify had been generally granted before), we hope that this will help to clarify the legal status of the DBLP data set.


For additional information about access to and technical details of the dataset see the corresponding entry on the Data Hub.


Credits: Photo licensed CC-BY-SA by Flickr user Unhindered by Talent.

Posted in Data, guest post, licensing