Day 2 of the March Sprint

Today started well: Berkeley and PubMed contacted us about running a BibServer, which is great! It was also a day of comings and goings: Etienne arrived to code with Mark, Richard Jones of Cottage Labs dropped by to play around with parsers, and Thomas headed off after exploring the benefits of JSON-LD and BibJSON.

Etienne and Mark have been developing BibServer, merging facet view changes with the existing software to present new functionality and provide a better user experience for creating and indexing data. This is expected to be completed tomorrow after some testing. Also due tomorrow is an update from Thomas, who has taken data from BibSoup and put it into 3lib; he has also been polling AuthorClaim for author information and looking at linking the metadata with BibSoup records. Mahendra and I got Sam / OpenGLAM involved with the Hackathon we’re planning for 12th-14th June and started thinking about London-based venues… suggestions welcome!

The schedule for tomorrow is to discuss interaction with other OKFN projects including TEXTUS, the Open Data Handbook and Public Domain Works, as well as testing BibServer’s new functionality.

In other news, Sam and Laura were busy writing their Lightning Talks for tonight’s Meet-Up… More on that later!

Posted in event, JISC OpenBib, OKFN Openbiblio

Installing BibServer from the repo on Mac OSX

The following guest post is by Edmund Chamberlain, who works at Cambridge University Library.

As part of my work on the Open Bibliography project, I wanted to test how easy it would be for an average Systems Librarian such as myself to get BibServer up and running.

Turns out, it was pretty simple, for a development environment at least. The latest install docs can be found at readthedocs.org and contain pointers to all the required packages and dependencies; see here for install instructions.

###Python and dependencies

I started almost from scratch with a new MacBook Air running OSX Lion. The first thing I needed was an up-to-date Python, the language BibServer and most OKFN projects are coded in. Python is installed on OSX by default, but for good measure I installed XCode 4 (free from the Apple App Store). Advice on getting Python onto your favoured *nix OS or even Windows can be found on the main Python site.

According to the BibServer docs, a few additional dependencies were required, specifically pip (one of several Python package managers) and virtualenv (a tool for creating multiple isolated Python environments). Some great instructions on doing this can be found here.

Finally, I needed GIT version control software. Instructions for getting GIT onto OSX can be found along with a dedicated installer. If you are not familiar with GIT, here is a great introduction.

###ElasticSearch

Next up, I needed to install the indexing service underpinning BibServer, ElasticSearch. Having spent days grappling with various indexing solutions and document / graph-based databases in the past, this was the part I was most hesitant about. Turns out, it really was as simple as the instructions stated.

1) Download the latest version into an appropriate place and extract the files.

2) Start ElasticSearch with:

sudo bin/elasticsearch

3) ElasticSearch is built with and runs on Java. If you don’t have Java installed, OSX Lion will prompt you to download and install the latest version.

4) The install instructions give some tips on setting it up as a service.

###BibServer

With GIT and VirtualEnv installed, BibServer can be pulled and set up relatively quickly.

1) Create and start a virtual environment where {myenv} is the filepath of the environment:

virtualenv {myenv}

. {myenv}/bin/activate

2) Using GIT, clone the BibServer source code into that environment:

mkdir {myenv}/src

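git clone https://github.com/okfn/bibserver {myenv}/src/bibserver

3) Run a development install using pip from the cloned repository (no sudo, so that pip installs into the active virtualenv rather than the system Python):

cd {myenv}/src/bibserver

pip install -e .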
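Before the pip install step it is worth confirming that the environment from step 1 is actually active; pip run outside it (for instance under sudo, which may not preserve the activated environment) installs into the system Python instead. A minimal sketch using only the standard library, assuming either the standard `venv` module or virtualenv created the environment:

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix on the interpreter.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # With the standard venv module and modern virtualenv, sys.prefix points at
    # the environment while sys.base_prefix points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print("OK: running inside", sys.prefix)
    else:
        print("Not in a virtualenv: activate {myenv} before running pip")
```

Save it under any name you like (check_env.py, say) and run it with `python` straight after `. {myenv}/bin/activate`.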

###Running it!

1) Ensure ElasticSearch is running.

2) Start Bibserver up:

python {myenv}/src/bibserver/bibserver/web.py

3) Point your favoured web browser at:

http://localhost:5000

4) Upload a sample CSV file.

BibServer can easily be run as a background process using screen or some other suitable tool.
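Steps 1 and 3 can also be checked from a script rather than a browser, which is handy once BibServer is tucked away in the background. A minimal sketch using only Python's standard library; it assumes ElasticSearch is listening on its default HTTP port, 9200 (adjust if you changed its config), and BibServer on port 5000 as above:

```python
import urllib.request

# Assumed local addresses: ElasticSearch's default HTTP port and the
# BibServer development server port used in step 3.
ELASTICSEARCH_URL = "http://localhost:9200/"
BIBSERVER_URL = "http://localhost:5000/"


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET on url succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError (e.g. connection refused), HTTPError and timeouts are all
        # OSError subclasses, so any of them means "not responding".
        return False


if __name__ == "__main__":
    for name, url in [("ElasticSearch", ELASTICSEARCH_URL), ("BibServer", BIBSERVER_URL)]:
        print(f"{name} at {url}: {'up' if is_up(url) else 'not responding'}")
```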

Posted in BibServer, guest post, JISC OpenBib

Day 1 of the March Sprint

Agendas are funny things; you have an idea of what you want to do, you write a few bullet-points to focus it a little and you presume things will naturally lead on from one thing to the next… Well, not this week. Barely had we settled down to the intros when the agenda was out the window!

As well as the Usual Suspects (Mark, me) and the Collaborators (Sam, Laura) we welcomed Thomas Krichel, an expert in scholarly communication over from the States, and were joined by two additional OKFNers, Jilly Matthews and Will Waites, who are based in Edinburgh and popped by to catch up on the project’s recent developments.

To begin with, all of us set our minds to the collaborative opportunities with CKAN as Jilly explained the project and the difference between thedatahub.org and CKAN (the former is a publicly available instance of the technology of the latter, which drives this and other instances). Then we split up into groups:

  • Thomas and Mark explored connecting BibSoup data with AuthorClaim and refined some ideas for the future of BibJSON, with Will (who was involved with previous iterations of the Open Biblio project/s) and Jilly contributing to discussions around simple / complex JSON following on from Mark’s post;
  • Mahendra and I finalised the details of tomorrow’s Meet-up;
  • Laura and Sam ducked into various conversations, suggesting improvements in technology and running events as key phrases caught their ears… Sam was looking at Open Biblio’s overlap with OpenGLAM and Laura was advising on tomorrow’s event, having arranged several Cambridge Meet-ups before.

The plan for tomorrow is for Mahendra, Laura and me to plan the June Hackathon and for Mark to get some good coding done with Etienne… but we’ll see how the agenda shifts!

Posted in event, JISC OpenBib, OKFN Openbiblio

Minutes: 19th Virtual Meeting of the OKFN Openbiblio Group

Date: 6th March 2011, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Sebastian Nordhoff
  • Jim Pitman

Agenda

Action points from the last meeting

  • Adrian has posted to the KIM-DINI-LLD list regarding the national bibliography and a conversion to BibJSON. There are no answers yet.
  • So far, Mark and Adrian haven’t looked at the data themselves to see whether/how it can be converted to BibJSON.

ACTION: Adrian will personally ask people from the German National Library at BibCamp.

Langdoc dataset

Sebastian gave us some information about the bibliographic dataset for the world’s lesser-known languages recently published at http://glottolog.livingsources.org/. (See also the thread on the mailing list.)

Data

  • The dataset contains 180K references, 120K of which are tagged by language.
  • Sebastian sees a possibility (and need?) for crowdsourcing to add missing data elements for 40K records.
  • The data can be downloaded at http://glottolog.livingsources.org/meta/downloads.
  • In addition to bibliographic data the website contains a “comprehensive catalogue of the world’s languages, language families and dialects (langoids)” which can be searched and browsed starting here.
  • Example bibliographic record.
  • Existing problems with the data include
    • several ad hoc BibTeX properties
    • 73 different data fields
    • Sometimes the semantics of a property are unknown, e.g. document_type={B}
    • These problems are due to heterogeneous data sources for this dataset
    • RDF for references still in need of improvement

Code

  • Machine-learning code is used to automatically recognize the language of a work
  • It is written in Python
  • It isn’t open yet
  • The code was written by Harald Hammarström.
  • NLTK, Natural Language Toolkit: http://www.nltk.org/

Licensing issues

  • Which parts of the dataset are problematic? – “Africa part”, “Australia part”
  • What share of the data is problematic? – Approx. one quarter.

What’s Sebastian up to with the data set?

  • SN is happy to provide data and help other people start with it.
  • SN can probably convert existing data into BibJSON
  • SN will host a SPARQL endpoint
  • SN will not host a bibserver
  • SN will not develop bibserver

ACTION: Sebastian will provide a post for openbiblio.net when the dataset is officially released (sometime in March).

Provenance

A short discussion arose about provenance in relation to BibJSON.

ACTION: Ask members of these groups to provide a short post about it on openbiblio.net.

Resources and tasks of openbiblio activities at OKF

  • Jim had some questions about governance and allocation of OKF resources to the working group’s activities
    • How to organize WGOBD to engage contributors in the creation/maintenance of various listings relevant to OBD?
    • How to keep the enquiries to potential open data providers going?

ACTION: Write down core resources and tasks of the openbiblio group. (Adrian)

Posted in minutes, OKFN Openbiblio, Uncategorized

March Sprint and Meet-up

There will be a coding and planning sprint for the project team in Edinburgh on Monday 12th and Tuesday 13th March, with tying-up of loose ends on Wednesday 14th for those still around.

Following on from the productivity of January’s sprint, we aim to update project documentation and code, refine development, explore integration with other projects, and plan for the remaining three months, including demonstrations and user engagement.

We will be joined by representatives of other projects including Textus, the School of Open Data and DevCSI.

If anyone is interested in seeing what we’re up to, or talking open data / knowledge in general, come along on the Tuesday evening as we have arranged a Meet-up with others from OKFN and DevCSI, and all are welcome – more details here. This promises to be a great opportunity for some Edinburgh-based folk (and anyone willing to travel!) to get together to discuss ideas, projects and generally set the world to rights over a brew.

For more information contact naomi.lillie [@] okfn.org.

Twitter: #OpenDataEDB

Posted in event, JISC OpenBib, OKFN Openbiblio

Linked Open Data as explained by Europeana

Antoine Isaac recently sent an e-mail around the List to let us know that Europeana has published its first dataset, comprising 2.4 million objects, under CC0. Furthermore, the new Data Exchange Agreement, which data suppliers are required to sign in order to publish on Europeana (and already signed by national libraries, national museums and content providers for entire countries), comes into effect on 1 July 2012, after which all metadata in Europeana will be released into the public domain as Open Data!

This is brilliant news in itself, but what I found particularly enchanting was the animated video that Europeana created in support of this announcement:

Linked Open Data from europeana on Vimeo.

As mentioned before, I am not from a technical background, and terms and explanations can be difficult for me to grasp; however, I understand this video! I think it’s a brilliant explanation of the detail of Linked Open Data (LOD): how metadata works together, why it’s important that it’s open, how the more open data is available the more can be done with it, etc. This provides great clarity on what it is we’re seeking to do, for those of us who can’t tell a gig from a meg. The Open Biblio readership is generally more savvy with the nitty-gritty workings of LOD than I – if not in fact doing this sort of stuff already – but how can you not love the simplicity of this video?! It’s engaging, interesting, mostly jargon-free and less than four minutes long… So, if you’re an advocate of LOD without really understanding the processes behind the philosophy, get yourself a cuppa, settle down to watch this, and be informed.

Thanks Europeana, keep up the good work!

Europeana’s press release provides more information about the above dataset release and video.

Posted in Data, LOD-LAM, News

JSON-LD / BibJSON

There have been requests on our mailing list recently to consider the various options for supporting validation of BibJSON and for supporting namespacing. These two options require some further consideration.

Validation

Efforts so far around BibJSON have focussed on building a useful JSON representation of bibliographic metadata, with some typical key/value pairs that are common in or extended from BibTeX. This started off simply, but we have seen increasing complexity as we accommodate further functionality requests. There was some work on a JSON schema to validate against, but given the aim of being as flexible as possible, with very few required keys, validating a BibJSON document against it would achieve very little.

Validating a document as properly formatted JSON is, of course, a good idea; but there are plenty of ways to do this already – just try to parse it with any of the many JSON libraries for your programming language of choice.
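In Python, for instance, the standard library's json module is enough to check that a document is well-formed; this says nothing about which keys are present, which is exactly the point made above. The two sample records are illustrative only:

```python
import json


def is_well_formed_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


good = '{"title": "An example title", "year": "2012"}'
bad = '{"title": "An example title", "year": }'  # value missing after "year"
print(is_well_formed_json(good))  # True
print(is_well_formed_json(bad))   # False
```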

But to reach the stage of actually supporting validation against a pre-defined schema, we must pre-define a schema – and that means becoming inflexible (or doing such little validation as for it to be essentially pointless).

An alternative to validation against a schema would be adoption of namespaces.

Namespaces

We do already have a namespace concept in BibJSON – it is just a key in the metadata, under which can be listed namespaces and a suitable prefix for them. However, this model is not widely known (because we made it up). To overcome this, we should adopt the JSON-LD method of using @context parameters. This way, it would be possible to specify the namespace in which your record keys are defined, and to share namespace information with other people / machines.
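To make the @context idea concrete, here is a minimal sketch of a BibJSON-style record whose @context binds the prefix dc to the Dublin Core terms namespace, plus a small function that expands prefixed keys into full IRIs. This is a simplified illustration, not a JSON-LD processor: it only handles prefixes defined as plain strings, and it deliberately keeps unprefixed keys such as title and year as they are (a conforming JSON-LD expansion would typically drop keys it cannot map to an IRI). The record values are made up.

```python
import json

# Illustrative record: "dc:publisher" is a compact IRI using the "dc" prefix
# declared in @context; unprefixed keys stay in BibJSON's default context.
record = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "title": "An example title",
    "year": "2012",
    "dc:publisher": "An example publisher",
}


def expand_prefixed_keys(doc: dict) -> dict:
    """Expand 'prefix:term' keys using prefixes declared in @context.

    Simplified sketch: only string-valued prefix definitions are used, and
    all other keys are copied through unchanged.
    """
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        prefix, sep, term = key.partition(":")
        if sep and isinstance(context.get(prefix), str):
            expanded[context[prefix] + term] = value
        else:
            expanded[key] = value
    return expanded


print(json.dumps(expand_prefixed_keys(record), indent=2))
```

Anyone receiving the record can now tell that "publisher" means the Dublin Core term, without BibJSON itself having to define it.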

What is the point?

Using namespaces or having a schema only becomes sensible when there is a concerted effort to share data with others. For internal use they could help with consistency, but the code we write internally already adheres, by definition, to our own level of consistency.

Therefore, it is not a function of BibJSON to perform validation – BibJSON is just JSON. Rather, it is the function of a community to make agreements and to conform to those agreements as required.

Where such a function must be supported, it should be done via mechanisms already available and maintained for that purpose – there is no point attempting to maintain our own; it is not our key strength or goal.

Recommendation

Change the BibJSON use of namespaces to conform to the method specified in JSON-LD and, wherever consistency is required, reach agreement to share data via JSON within a particular @context.

The fundamental basic keys in BibJSON – the default context – should remain as they are, and should not require contextualisation.

If contextualisation of the fundamental keys of BibJSON is required, then those keys should be contextualised into a schema by whoever has that requirement.

Ramifications

  • drop the “namespace” key in BibJSON
  • continue using BibJSON as normal, but:
  • reference JSON-LD for use of @context and other more complex LD functions as required
  • wherever validation is required, perform it based on the use of namespaced keys (beyond the scope of BibJSON)
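The last point could look something like the sketch below: the check lives with the group that shares the data, not in BibJSON. The agreement itself (the dc prefix bound to Dublin Core terms, and the required keys) is entirely hypothetical, chosen only to show validation driven by namespaced keys.

```python
# Hypothetical community agreement, not part of BibJSON: records shared within
# the group must bind the "dc" prefix to Dublin Core terms in their @context
# and must carry the keys listed below.
AGREED_PREFIX = "dc"
AGREED_NAMESPACE = "http://purl.org/dc/terms/"
REQUIRED_KEYS = {"title", "dc:publisher"}


def check_agreement(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    context = record.get("@context")
    if not isinstance(context, dict) or context.get(AGREED_PREFIX) != AGREED_NAMESPACE:
        problems.append(f"@context must bind '{AGREED_PREFIX}' to {AGREED_NAMESPACE}")
    for key in sorted(REQUIRED_KEYS - set(record)):
        problems.append("missing key: " + key)
    return problems


conforming = {
    "@context": {"dc": AGREED_NAMESPACE},
    "title": "An example title",
    "dc:publisher": "An example publisher",
}
print(check_agreement(conforming))  # []
print(check_agreement({"@context": {"dc": AGREED_NAMESPACE}, "title": "An example title"}))
# ['missing key: dc:publisher']
```

A different group could agree on a different @context and key set without anything in BibJSON changing.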

Posted in BibServer, JISC OpenBib, OKFN Openbiblio

BibSoup beta: released

BibSoup is here! And it’s going to revolutionise how you work with bibliographic metadata.

Peter has been blogging for a while about BibSoup (see here for the basics and here for how to use it) and we’ve mentioned it in passing on this blog (for example this sprint post and explanation of Bib- terms)… But now it is time for the ‘official’ launch. Hurrah!

So, how to get involved?

Setting up a Bibserver and Faceted Browsing (Mark MacGillivray) from Bibsoup Project on Vimeo.

We already have parsers available to get your data directly via either BibTeX or RIS (or from BibJSON…), which means you can already get data in from most major bibliographic tools; you can even use the parsers programmatically if you like, at http://bibsoup.net/parse (although that functionality is in the process of improvement). We are open to suggestions for further parsers, and would be happy to guide anyone through making one.

(By the way, we are assuming you will have seen previous posts on this site and will therefore know what we’re talking about, but if not then please see this OKFN blog post for a fuller explanation of what BibSoup is for, why it’s great and what this overall project is all about).

So, what do you think? Let us know. There will be bugs, or areas we could improve, so please pass suggestions our way. Feature requests can be submitted via our issue tracker, and we batch those up into milestones to work towards the next release. Our current focus is on improving parser functionality and also on enabling editing.

We hope you like it and find all this useful… do add your collections so we can share them with the rest of the world, too. If you would like your own BibServer, go ahead and download the code, or contact us for help / support options.

Posted in BibServer, Data, JISC OpenBib, News

Communication processes – for the record!

This follows discussion that began at the meeting on 1st February, and reasserts existing processes.

Any proposal for discussion is published ahead of the meeting at which it is to be raised, with an email inviting everyone on the openbiblio-dev list to the call and linking to the etherpad (which contains or links to further details). This is to ensure materials are available in advance of calls.

Everyone is free to suggest direction, which is agreed by consensus. Technical lead is Mark, and Community lead is Naomi.

All discussion should be carried out openly, on the available mailing lists. Agreement before publication of blog posts or pages is not required – any early discussions off-list should be posted on the appropriate mailing list for discussion, then posted on the blog if they come to fruition. Strategic and management proposals should make it clear that they are for discussion until the team has covered the topic at the weekly catch-up (Wednesdays at 16.00 GMT).

Posted in JISC OpenBib, minutes

Comparing existing bib tools

Update to this post: turns out there was a page, just not one I was aware of – please see http://en.wikipedia.org/wiki/Comparison_of_reference_management_software. I have linked to this from http://wiki.okfn.org/Projects/jiscopenbib2. Isn’t it handy when people have already done the job for us…

Recently, a discussion on the Working Group List raised the subject of existing technologies that store and share reading / publication lists, and how BibSoup / BibServer compares to them.

Tom Morris said:

Perhaps it would be illustrative to compare and contrast with other existing widely known services and tools such as Zotero, Mendeley, CiteUlike, and the venerable emacs/Bibtex/LaTex. What is better, worse, or just different? Which sets of things are alternatives to each other and which complement each other? What are the things which make BibSoup/BibServer unique?

Of course, if this is already laid out in detail somewhere on a web page, just point me there.

There wasn’t, until now, so please refer to this wiki page http://wiki.okfn.org/Projects/jiscopenbib2/managementtools set up for this purpose and start comparing!

I have used Thad Guidry’s notes on Mendeley, as well as the first line of the Wikipedia entry, to populate that example. Please do edit and add to this page – we want to avoid a debate on which is better than which, so please keep your opinions in check, but hopefully this will be a good opportunity to get a sense of what is in use and how they compare with one another and BibSoup.

Posted in BibServer, Data, JISC OpenBib, OKFN Openbiblio