Wednesday 18th January – Open Biblio Sprint: Day 2

BibServer took precedence this morning, with Etienne, Ed and Mark continuing to develop the BibServer parsers… By March we want people to be able to download and run their own instance of the Server, or to provide a service whereby we can do it for them. We discussed various use cases that could be used to explain how BibServer is valuable in data collection – for example: a departmental administrator / researcher is required to provide a list of publications of those within his / her department using Symplectic / Web of Science / Endnote, and is to upload this information to the department’s website. BibSoup is ideal for this scenario as it allows different formats to be entered or produced, and the resulting collection can be easily searched and embedded in other web pages.

In the time we had been exploring the benefits of the BibServer, some e-mails had come through to the List with examples of collections. Starting with these, we identified a total of six people / groups who would drive open resources (BibSoup / BibServer) and whose data could be used as demonstrations (interpreted as Reading or Publication List format):

  • Malaria – Tom Olijhoek (Medline)
  • Sancoma – Gilles (Medline)
  • Karol Langner – personal libraries of people
  • UCC-PMR
  • Physics.cam.ac
  • Jim Pitman

It was agreed that these would be parsed through BibServer and used as examples of the functionality and importance of BibServer. Some, such as Jim’s probability web, may also benefit from dedicated BibServer instances.

Peter and I then interviewed members of the team, to record what was going on and talk about the project in general; this became short video blogs which will be available shortly.

Posted in BibServer, event, JISC OpenBib | Tagged | 3 Comments

Tuesday 17th January – Open Biblio Sprint: Day 1

Today Etienne, Ed, Rufus, Mark, Peter and I met up to start the first sprint of the new year. We began by clarifying the purpose of the sprint, today’s agenda and the project overall. We stated our aims as follows: we are not trying to re-do what is already available online, we are not getting into the detail of normalisation or disambiguation within a centralised database, and we are not intending to alter the academic culture overnight; however, we are going to improve the BibJSON facility for wider use, we are trying to determine how we can get more small groups and individuals involved, and we are identifying compelling, essential and simple reasons for people to support the project at this early stage before the ultimate global benefits can be realised. With this in mind, we got cracking.

Etienne, the newest member of the team, began coding pretty much straight away – he and Ed started working on MARC / RIS parsers. Peter and I started a huge list of FAQs for the website – Peter asking in-depth questions such as ‘do we want to create a single BibJSON collection for all the world’s metadata?’, me going for slightly less detail with ‘what is metadata?’ – and Mark assisted where he was needed. Peter hacked some datasets to go into the parsers and Mark got some coding done too. There was good progress made and we are set up well for tomorrow to crunch some issues.
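For readers wondering what a parser of this kind does, here is a minimal sketch in Python of reading RIS text into plain dictionaries. It is not the code written at the sprint: the function name `parse_ris`, the tag mapping and the output key names are all assumptions made for illustration, and only a handful of RIS tags are handled.

```python
# Hypothetical mapping from RIS tags to illustrative output keys.
# The key names are assumptions, not the BibJSON specification.
SINGLE_VALUED = {"TI": "title", "PY": "year", "JO": "journal"}
MULTI_VALUED = {"AU": "author", "KW": "keyword"}


def parse_ris(text):
    """Turn RIS text into a list of plain dictionaries, one per record."""
    records = []
    current = None
    for line in text.splitlines():
        # RIS lines look like "TY  - JOUR": a two-character tag, two spaces, a hyphen.
        if len(line) < 5 or line[2:5] != "  -":
            continue
        tag, value = line[:2], line[5:].strip()
        if tag == "TY":            # start of a record
            current = {"type": value.lower()}
        elif current is None:      # ignore anything before the first TY line
            continue
        elif tag == "ER":          # end of a record
            records.append(current)
            current = None
        elif tag in SINGLE_VALUED:
            current[SINGLE_VALUED[tag]] = value
        elif tag in MULTI_VALUED:
            current.setdefault(MULTI_VALUED[tag], []).append(value)
    return records


# Made-up sample data, for illustration only.
SAMPLE = """TY  - JOUR
AU  - Example, Ada
AU  - Sample, Bob
TI  - A made-up article title
PY  - 2012
ER  - 
"""

parsed = parse_ris(SAMPLE)
```

A real parser would need to cope with many more tags, continuation lines and malformed input; the point here is only the overall shape of "format in, list of records out".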

One problem to be revisited relates to BibJSON: having copies of the same record within different collections. If an object (the record) is held within multiple collections, there are separate copies of that object which could cause problems – for example, if a record is copied into several collections and then a typo is found, it can be a mammoth task trying to track down all erroneous copies and correct them… This issue is likely to be solved by creating a Master / Slave relationship between copies. Also with regard to BibJSON, Etienne suggested providing a flat HTML version of collections (in addition to the JavaScript option), for easy use in departmental web pages.
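To make the duplicate-record problem concrete, here is a small Python sketch of one possible reading of the Master / Slave idea: each record has a single master copy, and collections hold only a reference to it, so a correction is made once. The class `RecordStore` and its method names are hypothetical; this is not a design the team has agreed on.

```python
class RecordStore:
    """Holds one master copy of each record; collections store only record ids."""

    def __init__(self):
        self.masters = {}      # record id -> record dict (the master copy)
        self.collections = {}  # collection name -> list of record ids

    def add(self, collection, record_id, record):
        # Keep the first version seen as the master; later adds only link to it.
        self.masters.setdefault(record_id, dict(record))
        ids = self.collections.setdefault(collection, [])
        if record_id not in ids:
            ids.append(record_id)

    def correct(self, record_id, **changes):
        # One edit to the master is visible from every collection.
        self.masters[record_id].update(changes)

    def view(self, collection):
        # Resolve the stored ids back into the current master records.
        return [self.masters[rid] for rid in self.collections.get(collection, [])]


# Made-up example: the same record, with a typo, added to two collections.
store = RecordStore()
store.add("department", "rec1", {"title": "Probablity theory"})
store.add("reading-list", "rec1", {"title": "Probablity theory"})
store.correct("rec1", title="Probability theory")
```

After the single call to `correct`, both collections show the fixed title. The trade-off is that collections are no longer self-contained files, which matters if people want to carry their collection elsewhere.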

All set for day 2…

Posted in BibServer, event, JISC OpenBib | Tagged | 1 Comment

BibServer Code Sprint January 2012

The BibServer team will be having a code sprint this week.

  • When: Tuesday 17
  • Where: Cambridge, UK (and online)

More details to follow.

Posted in BibServer, JISC OpenBib, News | Tagged | 1 Comment

Sweden Ends Negotiations with OCLC

The following guest post is by Maria Kadesjö who works at the Libris-department at the National Library of Sweden.

The National Library of Sweden has ended negotiations with OCLC on participation in WorldCat, as the parties could not come to an agreement. The negotiations go back to 2006 and the key obstacle for the National Library has been the record use policy. Some time into the negotiations OCLC presented certain conditions for how records taken from WorldCat for cataloguing were to be used in Libris.

A No to WorldCat Rights and Responsibilities

The question has its base in the “WorldCat Rights and Responsibilities for the OCLC Cooperative” where you as an OCLC member have to accept certain conditions and the aim is to support the ongoing and long-term viability and utility of WorldCat (and its services). The National Library cannot accept the conditions as they are today since WorldCat is not the only arena in which The National Library wants and needs to be active. Accepting the conditions would mean that we would forever have to relate to OCLC’s policy.

Libris an open database

Libris is the Swedish union catalogue (with some 170 libraries, primarily academic libraries) and is built much like WorldCat but on a smaller scale. Member libraries catalogue their records in Libris and the records are then exported to their local library systems. The Libris cooperative is built on voluntary participation. Any Libris library should be able to, whenever they want, take out all their bibliographic records from Libris and use them in another library system. The National Library makes no claim to the records and does not control how the libraries choose to use their bibliographic records.

The National Library has taken the decision to release the national bibliography and authority data as open data. The reason for this is to acknowledge the importance of open data and the importance of libraries’ control of their data when it comes to the long term sustainability and competition of the services needed by libraries and their users.

In the Agreement on Participation in the Libris joint catalogue signed by the National Library and cataloguing libraries, Point 3.3 specifies that “the content in Libris is owned by the National Library and is freely accessible in accordance with the precepts and methods reported by the National Library, both for Participating Libraries and for external partners.” This paragraph is needed so that the National Library can sign agreements for Libris with other partners (OCLC for example) but also so that the National Library can abstain from claims of ownership of bibliographic records taken from Libris. Libris can therefore be an open database, both for the libraries that use Libris for cataloguing and for others.

Not consistent with Libris principles

The consequence of signing with OCLC would be that the National Library would have to supervise how records originating in WorldCat were used. A library that took a bibliographic record from WorldCat for cataloguing in Libris and exported it to its own system would also have to accept OCLC’s terms of use.

A library that wished to leave Libris would not necessarily be able to do so, since it is not self-evident that the bibliographic records could be integrated into other systems. This would be an infringement of the voluntary participation that characterizes Libris. In practice, the National Library has no mandate to restrict the freedom of action of Libris’ libraries in this way, since the National Library has no possibility of influencing how the Libris libraries themselves choose to use their catalogues.

Posted in guest post, vendors | Leave a comment

JISC Discovery 2012

Today (11th Jan 2012) I am attending the JISC Discovery 2012 meeting to learn about the JISC projects aiming to increase open access to research materials. Here are some notes I took during the day:

Joy Palmer – Mimas – Copac

Search for records relevant in some way to each other, visualise and export as MODS or CSV. Would it be worth exporting (bib)JSON? Would they be interested in that?

Christine Madsen – Bodleian Libraries

They have collections they want to make more usable to machines and people, and to get rid of silos e.g. allow the content to be aggregated by others – Europeana etc. They will provide some metadata, OAI, Open Linked Data, via APIs. Using the commercial iNQUIRE interface from Armadillo Systems for people to search the repo. (iNQUIRE runs over a Solr index, similar to Project Blacklight – but is it open source?)

Me – JISC Open Biblio 2

We are working on building interfaces for people to quickly share their collections in useful ways – how can we make sure it is easy to share and consume this metadata?

John Gilby – M25 Consortium

Reusing Copac records and comparing with M25 libraries to build higher quality bibliographic metadata – using open source. Developing an API for embedding in resource discovery systems. Practical guidelines for open metadata principles. Should show them our principles.

Contextual Wrappers

Using collection descriptions to help people find relevant collections. Using Culture Grid for metadata. Working with people like University Museums in Scotland to share collection descriptions.

Eric Cross, Stephen McGough – Newcastle University – The Cutting Edge

Bringing together archaeological and ethnographic museum objects – sharp-edged objects such as tools, axes etc as the focus. For performing use-wear analysis on the objects. Building a comprehensive metadata internet collection about such objects. Spanning multiple collections and searching across them using SPARQL queries, presenting a single front end.

Leif Isaksen – Southampton – Pelagios 2

Moved from JISC Geo – enabling linked ancient Geo data in open systems, focussing on lightweight annotation approach. Looking for connections between documents that are related by place, mapping and visualising the connections. Building a cataloguing, search, visualisation service and community toolkit.

Step change

Building systems to generate linked data from archivist workflows, so they do not need to care about RDF themselves. Analyses descriptions against OpenCalais. Aiming to connect it to Calm, so that archivists can locally generate linked data and push it to their Calm UIs. Similarly doing this with Historypin – who generate UIs to show interesting historical things near a certain geographic area. Working in context with Cumbria archive service. Launching an API via UKAT.

Edina Geotagger, MediaHub and SUNCAT

Providing a web service around ExifTool, to geotag / geocode image, audio, and video metadata. MediaHub provides images, video and audio licensed by JISC collections and harvested from other providers. Can parse records and identify probable people, places and dates – then suggest values to users. Hoping to generate better participation by just asking users “is this a date” and so on, rather than them having to type into a form. SUNCAT aggregates journal holdings info from 82 UK libraries, as MarcXML and linked data. Aiming to increase the amount of metadata that is openly available. Working to establish use cases for formats such as MODS / DC, and exploring the licensing status of RDF triples.

ServiceCORE

Provides a jQuery plugin for an institutional repo that shows relevant information about similar entries in other repos elsewhere. Aiming to build a web service layer providing programmable access to aggregated content and metadata in institutional repos. Also a pilot tool for automatic subject-based classification of content using text categorisation techniques. Overall, providing an enhanced related resource discovery system based on text mining. Hoping to offer a way for people to find versions of papers that they can access, as opposed to ones they cannot. Building on OAI-PMH compliant repos.

Clock – Lincoln and Cambridge

This is a continuation of Jerome. Enriched bib data from Jerome, Comet and elsewhere. A set of dev APIs and linked data endpoints. Aiming to establish a distributed scholarly catalogue for the UK. Planning to work closely with JISC open bib.

LUNCH!

Afternoon session – grouping up around particular themes

  • RDF
  • sharing data
  • authorities/indexes/access points
  • User interfaces
  • Collection descriptions across A + H

I joined the sharing data group. Discussion turned out to be quite brief, and just covered “what is the problem of sharing data?” There was no time to come up with a response.

Studies in Discovery

The Discovery ecology is aiming to explain why institutions should do it / take part in it. It will do this by writing up case studies. Doing the 12 criteria of Discovery will mean that you are “doing” discovery:

  • adopting open licensing
  • requiring clear reasonable terms and conditions
  • using easily understood data models
  • deploying persistent identifiers
  • establishing data relationships by re-using authoritative identifiers
  • providing clear mechanisms for accessing APIs
  • documenting APIs
  • adopting widely understood data formats
  • ensuring data is sustainable
  • ensuring services are supported
  • using your own APIs
  • collecting data to measure use

Business case

How is our project to be sustainable? What will maintain it? How will it survive long term, beyond JISC funding? Need to provide feedback to JISC Discovery about what suitable business cases are.

Group discussions

We considered how the 12 studies listed above had already been or would be considered by our projects. In some cases, some of the study topics were not relevant, but on the whole they are useful. We will keep these topics in mind whilst writing up project blog posts, and may well write specific posts about those topics.

Of particular interest to me was the “adopting widely understood data formats” study – because this goes beyond the scope of any one project, but is also something that would be of benefit. However, whether or not it is of benefit depends on whether or not any two or more people / groups decide it is of benefit… I will follow up to the Discovery mailing list with information about our current thinking on BibJSON, with details about our parsers API (once I have finished it), and links to the metadata guide that has been created by Primavera and others.

That is the end. I will follow up with some projects after today, and also we are already meeting up with CLOCK Cambridge and Lincoln. Next Discovery meeting will be in April, then in July.

Posted in JISC OpenBib | Tagged | Leave a comment

Informing public access to peer reviewed scholarly publications and data resulting from publicly funded research

The US government (OSTP) has recently issued an RFI on Open Access to data resulting from publicly funded research. The deadline for responding to the RFI has been extended to January 12.

http://www.whitehouse.gov/blog/2011/12/21/extended-deadline-public-access-and-digital-data-rfis

Detailed responses are openly available for collaborative editing and signing, here:

For digital data – https://docs.google.com/document/d/1QA1eGBynqh-yN0bo3_nYzD3d26nEhvuVPMUR2ffi17o/edit?hl=en_US

For peer reviewed scholarly publications – https://docs.google.com/document/d/1vEcWqAz6bwIIR6qQqWZYc8iUBrOpJ9NrvC9HiiQMc2Y/edit?hl=en_US

I request that anyone with an interest in scholarship responds in advocacy of open access, either via the above responses or directly to the OSTP. I justify my request not by re-iterating the virtues of open access (there are plenty), but by countering a common basis of closed access arguments: I propose that open access can be profitable, if profit is desired. (Although I admit that my motivation is for learning stuff, not making money.)

Consider the gold rush upon which expanding colonisation of America so successfully relied; why did people care about gold that much? Why did they go to such great lengths to traverse the wilds and dig it up, risking or losing their lives in the process? Of course, the answer is because it was highly valued – and the reason it was so highly valued was precisely because it was so hard and risky to come by.

Controlling access to a resource is a common way to generate profit; because gold is inherently hard to access, it is a good basis for an economy. Similarly, all sorts of materials that are found to have desirable properties become valuable, usually as a function of their desirability in relation to accessibility.

Digital artefacts, however, are very easy to copy and distribute. In cases where an industry has grown up around the distribution of a product that has become digitally easy to copy and share, efforts have been made to artificially maintain that difficulty via the application of the concept of digital piracy.

If gold were easy to find, easy to copy, easy to distribute – would it help if we made it poisonous?

Encumbering digital artefacts with artificial accessibility restrictions does not make them hard to find, copy, or distribute – it just makes them needlessly complicated.

The traditional increase desirability / decrease accessibility paradigm does not readily apply to digital artefacts. Fortunately, the problem of profiting from them has been solved, and solved often; they are regularly purchased or consumed via profitable services on the basis of convenience or improved user experience. In such cases, open access to a resource facilitates building a useful (or at least desirable or fashionable) service – consider Google, Facebook, Youtube, Spotify.

The case of publicly funded scholarly output is further complicated by the fact that accessibility is inherent to desirability – the point is to build on what we learn, and we cannot do that if we cannot access it. Achieving anything with these artefacts – discovering, sharing, learning, communicating, archiving, profiting – is best done in the ideal environment where they are easy to find, copy, distribute – and are not poisonous.

We need open access, not restricted access.

Posted in JISC OpenBib, OKFN Openbiblio | Tagged | 1 Comment

Minutes: 17th Virtual Meeting of the OKFN Openbiblio Group

Date: January 3rd, 2012, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Peter Murray-Rust
  • Thad Guidry
  • Thomas Krichel (first half)
  • Karen Coyle (first half)
  • Jim Pitman (second half)

Agenda

OCLC’s FAST release

  • Is this open data according to the OKD? – Yes, see VoID description of the dataset
  • The licence is attached to the whole dataset and not to individual resources
  • Attribution probably is required on data set level too
  • Until now, the data isn’t fully OKD-compliant because no full dump of the data exists
  • Richard Cyganiak already created an entry on the Data hub for FAST data.

LCSH in Freebase

  • Thad reports that all LCSH will be imported into Freebase.
  • The facets will be separated out as well (as FAST does).
  • Timeline: six months to a year

Re-Using DBLP data

  • DBLP entry at the Data Hub: http://thedatahub.org/dataset/dblp
  • Jim had some ideas about re-using the DBLP data in BibSoup, e.g. for collecting publications on deduplication.
  • The isitopendata-enquiry is now resolved by Thomas
  • Mark could run DBLP data in a BibServer instance but he cannot maintain it besides the other BibServer instances he already maintains
  • Thad: DBLP data should be uploaded to ScraperWiki (1 GB maximum)
  • Thad: maybe create a “base” at Freebase for DBLP or BibSoup.
    • Freebase is fully versioned: edits can be reverted; queries can be run against older versions
    • Thad: Freebase is more a backend data store but doesn’t have a proper GUI. BibServer might play the GUI role.
  • ACTION: Mark helps Thad to set up a BibServer instance and Thad pushes the DBLP data into it

Writing and maintaining parsers etc. for BibServer

  • Jim can write preliminary parsers but he can’t maintain the code, write bugfixes etc.
  • Parsers are written in Python
  • He would like to have someone else to do it.
  • Thad proposes pushing the code to ScraperWiki
  • ACTION: Jim and Thad will communicate about how to pack up the code, publish and maintain it.

Jim’s recent efforts

Openbiblio Sprint

  • 17-19th of January: openbiblio sprint session in Cambridge? (suggested by Mark)
  • Mark tries to get Etienne (new programmer from Amsterdam), Rufus, Ed Chamberlain (for some of the time), Naomi and Primavera together for a sprint session.

Action Collection

  • Mark, Thad: set up a BibServer instance and push DBLP data into it
  • Jim, Thad: communicate about how to pack up the code for parsers etc., how to publish and maintain it.

Posted in minutes, OKFN Openbiblio | Leave a comment

My First Hackathon

This Open Research Reports Hackathon was part of the Semantic Web Applications and Tools for Life Sciences conference. Approaching this from a non-Computer Science perspective, I had no idea what I was in for. Having understood the word ‘hack’ only as meaning an underhanded and illicit way of accessing protected information (apparently ‘crack’ is the correct term for this), it turns out that – in this and similar cases – ‘hack’ means finding elegant solutions to computing problems by sharing ideas and expertise. So, having learnt something new within the first five minutes, and with that as the underpinning aim, we set about sharing interests, problems and ideas, and began by introducing ourselves.

There were programmers, PhD students, open science enthusiasts, and the occasional person who had Life Sciences and / or software technology as a hobby. ‘Lightning talks’ by some attendees presented particular problems or suggestions, including the topics of semantics, data modelling and minimal standards. With breaks to facilitate discussions between people with shared interests, and rewards for enthusiastic networking (a system involving casino chips exchanged for beers), it was a productive evening for mingling and plotting and set the groundwork for day 2.

The next morning people arrived abuzz with fledgling ideas and enthusiasm for Making Things Happen. Groups of like-minded individuals joined forces around proposed subjects, set goals and began hacking… A hushed and industrious atmosphere developed and the room became reverently productive. While the groups worked away, I joined Peter Murray-Rust and Mahendra Mahey to record some people talking about what they are up to in general as well as what they were doing at that moment in the Hackathon. This was very interesting as it provided a great explanation of what brings people to events such as this – predominantly, to see what would happen when people with different skills combine ideas to solve a realised problem.

The round-ups shared the groups’ progress, where a spokesperson from each group explained the starting-point or problem, how this was approached and what the outcome or solution was. There were interactive diagrams, existing resources combined and / or adapted, interesting tangents explored and proposals for further research suggested. They included great ideas and demonstrations such as how to build collections of relevant research article metadata quickly, filtering drug information for patients by side-effects and availability etc, and using natural language to populate forms designed for capturing metadata (follow link at the end of this post to read about the outcomes in more detail). Reports were received with enthusiasm and encouragement, and prompted cross-group collaboration and further exploration of emerging ideas even as we were being herded out of the door.

My overview of this, my first Hackathon, is that it was an excellent way of developing solutions to problems common to science and technology and of fostering interdisciplinary collaboration. I was impressed that tangible solutions had been developed to the level of sophistication presented, and I am keen on the suggestion to revisit these solutions in 6 months or so to establish how the projects develop.

 

Many thanks to Mahendra (our welcome committee, compère, areas of interest match-maker and all-round organiser) for the smooth running of the event and the organisers / sponsors (DevCSI which is funded by JISC, SWAT4LS and the Open Knowledge Foundation) for enabling this exciting and productive event to go ahead.

More details on this event, including the lightning talks and round-ups, can be found at http://wiki.okfn.org/Working_Groups/Science/swat4ls_hackathon

For more information on future events please refer to http://okfn.org/events/

Posted in BibServer, JISC OpenBib | Tagged | Leave a comment

What is BibServer? What happened to Bibliographica?

Time for another clarification on what work we are doing, and what various different acronyms mean.

BibServer

BibServer is the software we are now working on; the aim of it is to provide a tool that enables individuals and small groups to quickly and easily share their collections.

Imagine you have a collection of metadata, perhaps in a BibTeX file, or in a spreadsheet (CSV) file, or from a reference management tool such as Mendeley – or maybe you do not have such a collection yet, but you know you need to create one. Well, BibServer would enable you to take that collection and build a search web page onto it, with nice features like faceted browsing and visualisations.
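For anyone unfamiliar with the term, faceted browsing means showing how many records share each value of a field (each year, each author and so on) and letting you narrow the collection by picking one. The Python sketch below illustrates the idea on a list of dictionaries; it is not how BibServer implements it, and the function names, field names and records are all made up for the example.

```python
from collections import Counter


def as_list(value):
    """Treat a missing field as no values and a single value as a one-item list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def facet_counts(records, field):
    """Count how many records carry each value of the given field."""
    counts = Counter()
    for record in records:
        counts.update(as_list(record.get(field)))
    return counts


def filter_by(records, field, wanted):
    """Keep only the records whose field contains the wanted value."""
    return [r for r in records if wanted in as_list(r.get(field))]


# Made-up records, for illustration only.
RECORDS = [
    {"title": "First made-up paper", "year": "2011", "author": ["Example, Ada"]},
    {"title": "Second made-up paper", "year": "2012", "author": ["Example, Ada", "Sample, Bob"]},
    {"title": "Third made-up paper", "year": "2012"},
]

years = facet_counts(RECORDS, "year")
by_ada = filter_by(RECORDS, "author", "Example, Ada")
```

Here `years` tells you there are two records from 2012 and one from 2011, and `by_ada` is the narrowed list you would see after clicking on that author.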

As opposed to reference management tools, the focus of BibServer is not on managing your collection; instead, we presume you have another way to do that – as most people do, such as your particular software, or a file on your local network where access is already controlled to particular staff members as required. Rather than duplicating all that effort, BibServer just functions at the point where you want to expose that collection.

BibServer is open source software. You can run your own. You can (soon) pay us to run a service for you. Or you can have a go on our example service.

If you want to make suggestions for new features, parsers or datasets, or offer to help in creating them, please see our software website and repo.

Bibsoup

Bibsoup is our name for the general aggregation of all bibliographic records floating around in the world. When they are pulled together into a particular collection that someone cares about, they form a small bibsoup. Our example BibServer service is up and running at http://bibsoup.net, and there you can see some of our example collections. You could also create your own!

Bibliographica

Bibliographica is the example service we set up during year 1 of JISC Open Bibliography. It demonstrates how to share a large collection such as the British Library British National Bibliography as Linked Open Data. It runs on the open source openbiblio software, which is available for anyone wishing to run such a service.

We will soon be porting the content of Bibliographica into a BibServer, to focus on maintaining the BibServer code base.

BibJSON

A quick note about BibJSON – our BibServer code works on indexes based on data in JSON format. This is in line with the aim of enabling people to quickly share their data online in a simple manner, as JSON is ubiquitous on the web and fits well with development of AJAX services. BibJSON is simply a way of describing what we expect to see in a particular JSON file, that allows us to easily share some attributes about collections and entries in collections. We are considering the value of further work on specifying BibJSON, and this will depend on feedback from the community.
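As a rough illustration of what "attributes about collections and entries in collections" might look like, here is a small JSON document read with Python's standard `json` module. The layout and every field name in it (`metadata`, `records`, `label` and so on) are assumptions made for this example and should not be taken as the BibJSON specification.

```python
import json

# A made-up collection: some attributes about the collection itself,
# followed by a list of entries. All names and values are illustrative.
EXAMPLE = """
{
  "metadata": {
    "collection": "example_reading_list",
    "label": "Example reading list"
  },
  "records": [
    {
      "type": "article",
      "title": "A made-up article",
      "author": [{"name": "Example, Ada"}],
      "year": "2012"
    }
  ]
}
"""


def summarise(text):
    """Return the collection label and the titles of its entries."""
    data = json.loads(text)
    label = data["metadata"]["label"]
    titles = [entry.get("title", "(untitled)") for entry in data["records"]]
    return label, titles


label, titles = summarise(EXAMPLE)
```

Because it is ordinary JSON, the same file can be read just as easily from JavaScript in a web page, which is the reason for choosing the format in the first place.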

Posted in BibServer, JISC OpenBib, News | Tagged | Leave a comment

New recruit

Hello from the newest team member, Naomi Lillie. I have recently joined the Open Knowledge Foundation to undertake OKFN coordination work as well as specifically to assist the JISC Open Biblio 2 project.

I will be supporting the project as Community Coordinator, working alongside the team to support the existing and emerging work, ensuring smooth running of the day-to-day administrative side of things, organising events and promoting the project to a wider audience. I’ll be running weekly catch-ups for the team to share information and report back.

I am not at all technical, so will be able to give the layman’s perspective of what’s going on in this exciting area of Open Data, and will be starting to blog about what’s going on as I get a feel for things.

If I can help with any enquiries please do not hesitate to get in touch on naomi [dot] lillie [at] okfn [dot] org.

Posted in JISC OpenBib | Tagged | Leave a comment