Open Bibliography and Open Bibliographic Data » BibServer
Open Bibliographic Data Working Group of the Open Knowledge Foundation
http://openbiblio.net

A revamp of bibserver and bibsoup
Fri, 11 Jan 2013
http://openbiblio.net/2013/01/11/a-revamp-of-bibserver-and-bibsoup/

Since our work last year on the JISC Open Bibliography 2 project, I have been thinking about the approach we took to building a tool that people might use; some of that approach, I think, was wrong. So I have recently been working on some changes, and have pushed a new version of BibServer to the repository in a branch called bibwiki.

Also today, the service running at http://bibsoup.net has been rebooted to run the new branch. One downside of this is that user accounts and data from the old system are no longer available; the old system had some issues anyway (it was returning errors for several recent attempts to upload large datasets), so I decided to wipe the slate clean and start again. However, if you had any particular collections in there that you need recovered, please get in touch via the openbiblio-dev mailing list and I will recover them for you.

Now, on to the details of what has changed, and why. Let’s start with the why.

Why change it?

One of the original requirements of bibserver was that it would present a personally curated collection of bibliographic records; this extended not only to the curation of the collection, but to the curation of records within that collection. Unfortunately, this made every collection an island – a private island, with guards round the edges; not so good for building open knowledge or community. Also, we put too much emphasis on legacy data and formats; whilst there is of course value in old standards like bibtex, and in historical records, giving up the flexibility of the present for the sake of the past is the opposite of progress. Instead we should take the best bits of what we had and improve on them, then get our historical content into newer, more useful forms.

Given these issues, it seemed sensible to try a more connected, more open, more modern approach. So, what I have done is remove the concept of “ownership” of a record and remove the ties to legacy data formats and sources. What we now have instead is a tool into which we can dump bibJSON data, and via which we can build personally curated collections of shared bibliographic records.

So what has changed?

You can only upload bibJSON

Whilst the conversion tools we wrote to process data from formats such as bibtex or RIS into bibJSON are useful and will be utilised elsewhere, they are not part of the core functionality of bibserver. They are a way to get from the past into the present, and once you are here, you should forget about the past and get on with the future. So your upload is one-off, and cares not from whence it came.
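
For reference, a minimal bibJSON upload might look something like the sketch below, written here as a short Python snippet; the field names follow the general conventions described at bibjson.org, and the record content and file name are purely illustrative.

    import json

    # A minimal, illustrative bibJSON-style collection; field names follow the
    # bibjson.org conventions, and the record content is invented for the example.
    collection = {
        "metadata": {"collection": "my_collection", "label": "My example collection"},
        "records": [
            {
                "title": "An example article",
                "author": [{"name": "Smith, A."}, {"name": "Jones, B."}],
                "year": "2012",
                "journal": {"name": "Journal of Examples"},
                "identifier": [{"type": "doi", "id": "10.1234/example"}],
            }
        ],
    }

    with open("my_collection.json", "w") as f:
        json.dump(collection, f, indent=2)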

You can edit records, but so can anybody else

Does what it says on the tin. For now, editing means directly editing the raw JSON, which is a bit clunky, but a nicer UI can be added later.

You can tag any record with anything, but so can anybody else

Anyone can tag a record with a useful term; anyone can remove a tag.

You can still build your own collection

You can still create your own collection and curate it as you see fit, and other people will not be able to change which records are in that collection; but the records themselves are still editable by anyone. Seems scary? Well, yes. But get used to it. It works for Wikipedia. (Which is why I called the new branch bibwiki.)

You can’t visualise facets anymore

You used to be able to make a little bubble picture out of the facet filters down the left hand side. Now you can’t. It was a bit incongruously located, so this functionality is being hived off into a more specifically useful form.

You can search for any record and add it to your collection

Anything that is on the bibserver instance can be found by anyone using the search box, then you can add it to one of your collections. However, searching for everything has limited functionality and does not offer filters. This is because one of the constraints of scaling up to large datasets is that filtering is expensive; so now, you have simple search across everything, then nice complex filtered search on the things you care about. Best of both worlds with minimal compromise.

Simplistic record deduping

Where a record on import appears to have the same title-and-authors string as a record already in the database, the two will be squished together. The important point is that the functionality to deduplicate things via various methods now exists, and there is no longer a constraint to maintain unique copies of things, so we can get on and build those methods.
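
As a rough illustration of the kind of matching involved, a naive title-and-authors key might be built along the lines of the sketch below (illustrative only, and not the exact normalisation used in the bibwiki branch):

    import re

    def dedupe_key(record):
        """Build a naive title-plus-authors string for spotting likely duplicates.

        Illustrative sketch only; the real code may normalise differently.
        """
        title = record.get("title", "")
        authors = "".join(a.get("name", "") for a in record.get("author", []))
        return re.sub(r"[^a-z0-9]", "", (title + authors).lower())

    def is_probable_duplicate(new_record, existing_records):
        key = dedupe_key(new_record)
        return any(dedupe_key(r) == key for r in existing_records)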

Exciting. So, what next?

Rework the parsers into a stand-alone service

The parsers from bibtex, RIS and so on should be built out as a simple service: you hit the web page, give it your file (or a file URL), and it pings you when the conversion is done with a link from which to get your bibJSON. It should support plugin-style parsers, so that we can run it with the parsers we have, and other people can run it with their own if they wish. Then we can boot up a translation service at http://translate.bibsoup.net.
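
One possible shape for the plugin side of such a service is sketched below; the names (register_parser, convert and so on) are hypothetical and only illustrate the idea of pluggable converters that all emit bibJSON.

    # Hypothetical sketch of a plugin-style parser registry for a conversion
    # service; none of these names come from the actual BibServer codebase.
    PARSERS = {}

    def register_parser(fmt):
        def decorator(func):
            PARSERS[fmt] = func
            return func
        return decorator

    @register_parser("bibtex")
    def parse_bibtex(raw_text):
        records = []
        # ... real parsing would happen here; each parsed entry becomes a
        # bibJSON record dict appended to the records list.
        return {"records": records}

    def convert(raw_text, fmt):
        if fmt not in PARSERS:
            raise ValueError("No parser registered for format: %s" % fmt)
        return PARSERS[fmt](raw_text)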

This is the most important next step, as without it not many people will be able to upload records.

Upload some bibliographic metadata

There are numerous sources of biblio metadata we have collected over the years, and some of these will be uploaded into bibsoup for people to use.

Also, there is potential to run specific instances of bibsoup for people who need them – although, overall, it is probably more sensible to keep them all together and distinguish via collections.

Bugfix

This is basically a beta 2 implementation. Please go and use the new system at http://bibsoup.net, and get back to the mailing list with the usual issues.

Build up some deduplication maybe with pybossa

Now that we can edit records and find similar ones, we can also do interesting things like enable users to tag records that are about the same thing. We can also run queries to find similar records and expose that data perhaps through a tool like pybossa, to get crowd-sourced deduplication on the go.

Rewrite the tests

All the tests that were in the original branch have yet to be copied over, and a lot of them will become redundant. So if you like tests (and we should have them), please get involved with porting them over or writing new ones.

Update the docs

The documentation needs to be updated; a lot of it still refers to the old branch, although a fair bit of it is still relevant.

Decide how to manage the code and bibsoup in the future

What I have done here amounts to some fairly large changes to our original aims, and it is possible that not everybody will like this. However, the great thing about code repositories is that we have versioning, so anyone can use any version of the software. My changes are still in a branch, so we can either merge them into the main branch, or fork them off to a separate project if necessary. Unless reasons against merging into main are given, that will be the course taken once the parsers have been hived off.

Minutes: 25th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data
Wed, 05 Sep 2012
http://openbiblio.net/2012/09/05/minutes-25th-virtual-meeting-of-the-okfn-working-group-for-open-bibliographic-data/

Date: 4 September 2012, 15:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Peter Murray-Rust
  • Naomi Lillie

NB Karen Coyle sent apologies due to attendance at the Dublin Core conference

Agenda

As only PeterMR and I were attending this call, we abandoned any formal agenda and had a very pleasant chat about PeterMR’s engagements and the upcoming OKFestival.

PeterMR has been presenting various bibliographic tools (including BibSoup) at a number of events lately, including VIVO12, and will do so at the upcoming Digital Science 2012 in Oxford. We discussed support for the existing tools we have in the Open Knowledge Foundation, in terms of person-resource and funding, and the importance of BibServer as an underlying tool for much of the work to be done in and around Open Bibliography and Open Access.

OKFest is less than two weeks away now, and there is so much potential here for collaboration and idea generation… We agreed that we are very excited and are looking forward to meeting the pillars of the Open community, as well as those brand new to a world that will only grow in influence and importance. Now is the time to embrace Open!

There were no particular actions, but it was helpful to consider how we can make a difference to the world of bibliography, for OKFN and for GLAM institutions (i.e. galleries, libraries, archives and museums) in general.

To join the Open Bibliography community sign up here – you may also be interested in the Open Access Working Group which is closely aligned in its outlook and aims.

Final report: JISC Open Bibliography 2
Thu, 23 Aug 2012
http://openbiblio.net/2012/08/23/final-report-jisc-open-bibliography-2/

Following on from the success of the first JISC Open Bibliography project, we have now completed a further year of development and advocacy as part of the JISC Discovery programme.

Our stated aims at the beginning of this second year of development were to show our community (namely, all those interested in furthering the cause of Open via bibliographic data: coders, academics, those with an interest in supporting galleries, libraries, archives and museums, and so on) what we are missing if we do not commit to Open Bibliography, and to show that Open Bibliography is a fundamental requirement of a community committed to the discovery and dissemination of ideas. We intended to do this by demonstrating the value of carefully managed metadata collections of particular interest to individuals and small groups, thus realising the potential of the open access to large collections of metadata that we now enjoy.

We have been successful overall in achieving our aims, and we present here a summary of our output to date (it may be useful to refer to this guide to terms).

Outputs

BibServer and FacetView

The BibServer open source software package enables individuals and small groups to present their bibliographic collections easily online. BibServer utilises elasticsearch in the background to index supplied records, and these are presented via the frontend using the FacetView javascript library. This use of javascript at the front end allows easy embedding of result displays on any web page.
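
As a rough illustration of that division of labour, the sketch below indexes a couple of BibJSON-style records into elasticsearch and runs the kind of terms aggregation a faceted frontend relies on. It assumes a recent elasticsearch running locally on port 9200, and the index and field names are invented for the example.

    import json
    import requests

    ES = "http://localhost:9200/bibsoup-example"   # hypothetical local index
    HEADERS = {"Content-Type": "application/json"}

    records = [
        {"title": "First example record", "year": "2011", "journal": {"name": "Journal A"}},
        {"title": "Second example record", "year": "2012", "journal": {"name": "Journal B"}},
    ]

    # Index each record; a javascript frontend like FacetView then only needs
    # query access to the search endpoint to display and filter results.
    for i, rec in enumerate(records):
        requests.put("%s/_doc/%d" % (ES, i), data=json.dumps(rec), headers=HEADERS)

    # A terms aggregation over the year field: the kind of query a faceted
    # browser issues to populate its filter lists.
    query = {"size": 0, "aggs": {"years": {"terms": {"field": "year.keyword"}}}}
    resp = requests.post("%s/_search" % ES, data=json.dumps(query), headers=HEADERS)
    print(resp.json())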

BibSoup and more demonstrations

Our own version of BibServer is up and running at http://bibsoup.net, where we have seen over 100 users sharing more than 14000 records across over 60 collections. Some particularly interesting example collections include:

Additionally, we have created some niche instances of BibServer for solving specific problems – for example, check out http://malaria.bibsoup.net; here we have used BibServer to analyse and display collections specific to malaria researchers, as a demonstration of the extent of open access materials in the field. Further analysis allowed us to show where best to look for relevant materials that could be expected to be openly available, and to begin work on the concept of an Open Access Index for research.

Another example is the German National Bibliography, as provided by the German National Library, which is a work in progress (as explained by Adrian Pohl and Etienne Posthumus here). We have built, and are building, similar collections for the other national bibliographies that we receive.

BibJSON

At http://bibjson.org we have produced a simple convention for presenting bibliographic records in JSON. This has seen good uptake so far, with additional use in the JISC TEXTUS project and in Total Impact, amongst others.

Pubcrawler

Pubcrawler collects bibliographic metadata, via parsers created for particular sites, and we have used it to create collections of articles. The full post provides more information.

datahub collections

We have continued to collect useful bibliographic collections throughout the year, and these along with all others discovered by the community can be found on the datahub in the bibliographic group.

Open Access / Bibliography advocacy videos and presentations

As part of a Sprint in January we recorded videos of the work we were doing and the roles we play in this project and wider biblio promotion; we also made a how-to for using BibServer, including feedback from a new user:

Setting up a Bibserver and Faceted Browsing (Mark MacGillivray) from Bibsoup Project on Vimeo.

Peter and Tom Murray-Rust’s video, made into a prezi, has proven useful in explaining the basics of the need for Open Bibliography and Open Access:

Community activities

The Open Biblio community have gathered for a number of different reasons over the duration of this project: the project team met in Cambridge and Edinburgh to plan work in Sprints; Edinburgh also played host to a couple of Meet-ups for the wider open community, as did London; and London hosted BiblioHack – a hackathon / workshop for established enthusiasts as well as new faces, both with and without technical know-how.

These events – particularly BiblioHack – attracted people from all over the UK and Europe, and we were pleased that the work we are doing is gaining attention from similar projects world-wide.

Further collaborations

Lessons

Over the course of this project we have learnt that open source development provides great flexibility and power to do what we need to do, and open access in general frees us from many difficult constraints. There is now a lot of useful information available online for how to do open source and open access.
Whilst licensing remains an issue, it becomes clear that making everything publicly and freely available to the fullest extent possible is the simplest solution, causing no further complications down the line. See the open definition as well as our principles for more information.

We discovered during the BibJSON spec development that it must be clear whether a specification is centrally controlled or more of a communal agreement on use. There are advantages and disadvantages to each method; however, they are not compatible, although one may become the other. We took the communal-agreement approach, as we found that in the early stages there was more value in exposing the spec to people as widely and openly as possible than in maintaining close control. Moving to a closely controlled model would require specific and ongoing commitment.

Community building remains tricky and somewhat serendipitous. Just as word-of-mouth can enhance reputation, failure of certain communities can detrimentally impact other parts of the project. Again, the best solution is to ensure everything is as open as possible from the outset, thereby reducing the impact of any one particular failure.

Opportunities and Possibilities

Over the two years, the concept of open bibliography has gone from requiring justification to being an expectation; the value of making this metadata openly available to the public is now obvious, and getting such access is no longer so difficult; where access is not yet available, many groups are now moving toward making it available. And of course, there are now plenty of tools to make good use of available metadata.

Future opportunities now lie in the more general field of Open Scholarship, where a default of Open Bibliography can be leveraged to great effect. For example, recent Open Access mandates by many UK funding councils (eg Finch Report) could be backed up by investigative checks on the accessibility of research outputs, supporting provision of an open access corpus of scholarly material.

We intend now to continue work in this wider context, and we will soon publicise our more specific ideas; we would appreciate contact with other groups interested in working further in this area.

Further information

For the original project overview, see http://openbiblio.net/p/jiscopenbib2; also, a full chronological listing of all our project posts is available at http://openbiblio.net/tag/jiscopenbib2/. The work package descriptions are available at http://openbiblio.net/p/jiscopenbib2/work-packages/, and links to posts relevant to each work package over the course of the project follow:

  • WP1 Participation with Discovery programme
  • WP2 Collaborate with partners to develop social and technical interoperability
  • WP3 Open Bibliography advocacy
  • WP4 Community support
  • WP5 Data acquisition
  • WP6 Software development
  • WP7 Beta deployment
  • WP8 Disruptive innovation
  • WP9 Project management (NB all posts about the project are relevant to this WP)
  • WP10 Preparation for service delivery

All software developed during this project is available under an open source licence. All the data released during this project fell under OKD-compliant licences such as PDDL or CC0, depending on the licence chosen by the publisher. The content of our site is licensed under a Creative Commons Attribution 3.0 License (all jurisdictions).

The project team would like to thank supporting staff at the Open Knowledge Foundation and Cambridge University Library, the OKF Open Bibliography working group and Open Access working group, Neil Wilson and the team at the British Library, and Andy McGregor and the rest of the team at JISC.

Importing Spanish National Library to BibServer
Tue, 07 Aug 2012
http://openbiblio.net/2012/08/07/importing-spanish-national-library-to-bibserver/

The Spanish National Library (Biblioteca Nacional de España, or BNE) has released their library catalogue as Linked Open Data on the Datahub.

Initially this entry only contained the SPARQL endpoints and not downloads of the full datasets. After some enquiries from Naomi Lillie, the entry was updated with links to some more information and bulk downloads at: http://www.bne.es/es/Catalogos/DatosEnlazados/DescargaFicheros/

This library dataset is particularly interesting as it is not a ‘straightforward’ dump of bibliographic records. This is best explained by Karen Coyle in her blogpost.

For a BibServer import, the implication is that we have to distinguish the type of record being read by the importing script and take the relevant action before building the BibJSON entry. Fortunately the data dump was already provided as N-Triples, so we did not have to pre-process the large data file (4.9GB) in the same manner as we did with the German National Library dataset.

The Python script to perform the reading of the datafile can be viewed at https://gist.github.com/3225004

A complicating matter from a data wrangler’s point of view is that the field names are based on IFLA standards, which are numeric codes rather than ‘guessable’ English terms like Dublin Core fields, for example. This is more correct from an international and data-quality point of view, but it does make the initial mapping more time consuming.

So when mapping a data item like https://gist.github.com/3225004#file_sample.nt we need to dereference each field name and map it to the relevant BibJSON entry.
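
In practice that mapping boils down to a lookup table from predicate URI to BibJSON key, along the lines of the sketch below; the example URIs and their meanings are illustrative and would need to be verified against the IFLA documentation.

    # Illustrative only: a partial lookup from opaque predicate URIs to BibJSON
    # keys. The real IFLA property numbers must be checked against the standards.
    PREDICATE_TO_BIBJSON = {
        "http://iflastandards.info/ns/isbd/elements/P1004": "title",
        "http://iflastandards.info/ns/isbd/elements/P1006": "edition",
    }

    def add_triple_to_record(record, predicate_uri, obj_value):
        key = PREDICATE_TO_BIBJSON.get(predicate_uri)
        if key is None:
            return  # unknown predicate: skip it, or log it for later mapping work
        record[key] = obj_value

    record = {}
    add_triple_to_record(record, "http://iflastandards.info/ns/isbd/elements/P1004",
                         "Don Quijote de la Mancha")
    print(record)   # {'title': 'Don Quijote de la Mancha'}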

As we identify more Linked Open Data National Bibliographies, these experiments will be continued under the http://nb.bibsoup.net/ BibServer instance.

Community Discussions 3
Fri, 13 Jul 2012
http://openbiblio.net/2012/07/13/community-discussions-3/

It has been a couple of months since the round-up in Community Discussions 2, and we have been busy! BiblioHack was a highlight for me, and last week included a meeting of many OKFN types – here’s a picture taken by Lucy Chambers for @OKFN of some team members:

[photo]

The Discussion List has been busy too:

  • Further to David Weinberger’s pointer that Harvard released 12 million bibliographic records with a CC0 licence, Rufus Pollock created a collection on the DataHub and added it to the Biblio section for ease of reference

  • Rufus also noticed that OCLC had issued their major release of VIAF, meaning that millions of author records are now available as Open Data (under Open Data Commons Attribution license), and updated the DataHub dataset to reflect this

  • Peter Murray-Rust noted that Nature has made its metadata open under CC0

  • David Shotton promoted the International Workshop on Contributorship and Scholarly Attribution at Harvard, and prepared a handy guide for attribution of submissions

  • Adrian Pohl circulated a call for participation for the SWIB12 “Semantic Web in Bibliotheken” (Semantic Web in Libraries) Conference in Cologne, 26-28 November this year, and hosted the monthly Working Group call

  • Lars Aronsson looked at multivolume works, asking whether the OpenLibrary can create and connect records for each volume. HathiTrust and Gallica were suggested as potential tools in collating volumes, and the barcode (containing information populated by the source library) was noted as being invaluable in processing these

  • Sam Leon explained that TEXTUS would be integrating the BibServer facet view and encouraged people to have a look at the work so far; Tom Oinn highlighted the collaboration between Enriched BibJSON and TEXTUS, and explained that he would be adding a ‘TEXTUS’ field to BibJSON for this purpose

  • Sam also circulated two tools for people to test, Pundit and Korbo, which have been developed out of Digitised Manuscripts to Europeana (DM2E)

  • Jenny Molloy promoted the Open Science Hackday which took place last week – see below for a snap-shot courtesy of @OKFN:

[photo]

In related news, Peter Murray-Rust is continuing to advocate the cause of open data – do have a read of the latest posts on his blog to see how he’s getting on.

The Open Biblio community continues to be invaluable to the Open GLAM, Heritage, Access and other groups too and I would encourage those interested in such discussions to join up at the OKFN Lists page.

Using wikipedia to build a philosophy (or other sort of) collection in BibSoup
Wed, 27 Jun 2012
http://openbiblio.net/2012/06/27/using-wikipedia-to-build-a-philosophy-or-other-sort-of-collection-in-bibsoup/

Here is a quick example of how to build a reference collection in BibSoup, using the great source of knowledge that is Wikipedia.

To begin with, you might want to go to Wikipedia directly and try performing some searches for relevant material, to help you put together sensible search terms for your area of interest. Your search terms will be used to pull relevant citations from the wikipedia database.

Then, go over to the BibSoup upload page; signup / login is required, so do that if you have not already done so.

Type in your wikipedia search terms in the upload box at the top of the page, give your collection a name and a description, specify the license if you wish, and choose the “wikipedia search to citations” file format from the list at the bottom. Then hit upload.

A ticket will be created for building your collection, and you can view the progress on the ticket page.

Once it is done, you can find your new collection either on the BibSoup collections page or on your own BibSoup user account page – for example, the page for the user named “test”. Also, of course, you could go straight to the URL of your collection – they appear at http://bibsoup.net/username/collection.

There you go! You should now have a reference collection based on your wikipedia search terms. Check out our example.

Bringing the Open German National Bibliography to a BibServer
Mon, 18 Jun 2012
http://openbiblio.net/2012/06/18/bringing-the-open-german-national-bibliography-to-a-bibserver/

This blog post is written by Etienne Posthumus and Adrian Pohl.

We are happy that the German National Library recently released the German National Bibliography as Linked Open Data (see the announcement). At the #bibliohack this week we worked on getting the data into a BibServer instance. Here, we want to share our experiences in trying to re-use this dataset.

Parsing large turtle files: problem and solution

The raw data file is 1.1GB in a compressed format – unzipped it is a 6.8 GB Turtle file. Working with this file is unwieldy: it cannot be read into memory or converted with tools like rapper (which only works for Turtle files up to 2 GB, see this mail thread). Thus, it would be nice if the German National Library could either provide one big N-Triples file, which is better suited to streaming processing, or a number of smaller Turtle files.

Our solution to get the file into a workable form is to make a small Python script that is Turtle syntax aware, to split the file into smaller pieces. You can’t use the standard UNIX split command, as each snippet of the split file also needs the prefix information at the top and we do not want to split an entry in the middle, losing triples.
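
A minimal sketch of that kind of prefix-aware splitting is shown below; it assumes prefix declarations come before the data and that a line ending in a full stop closes a complete entry, which holds for simple dumps but would need checking against the real file.

    def split_turtle(path, entries_per_chunk=100000, out_prefix="chunk"):
        """Split a large Turtle file into smaller files, repeating the @prefix
        block at the top of every chunk so that each piece stays valid Turtle.

        Sketch only: real Turtle can be messier than this assumes.
        """
        prefixes, chunk, entries, part = [], [], 0, 0
        with open(path) as src:
            for line in src:
                if line.startswith("@prefix"):
                    prefixes.append(line)
                    continue
                chunk.append(line)
                if line.rstrip().endswith("."):
                    entries += 1          # a full stop ends a complete entry
                if entries >= entries_per_chunk:
                    _write_chunk(out_prefix, part, prefixes, chunk)
                    chunk, entries, part = [], 0, part + 1
        if chunk:
            _write_chunk(out_prefix, part, prefixes, chunk)

    def _write_chunk(out_prefix, part, prefixes, chunk):
        with open("%s-%05d.ttl" % (out_prefix, part), "w") as out:
            out.writelines(prefixes + chunk)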

See a sample converted N-Triples file from a turtle snippet.

Converting the N-Triples to BibJSON

After this, we started working on parsing an example N-Triples file to convert the data to BibJSON. We haven’t gotten that far, though. See https://gist.github.com/2928984#file_ntriple2bibjson.py for the resulting code (work in progress).

Problems

We noted problems with some properties that we like to document here as feedback for the German National Library.

Heterogeneous use of dcterms:extent

The dcterms:extent property is used in many different ways, so we are considering omitting it in the conversion to BibJSON. Some example values of this property: “Mikrofiches”, “21 cm”, “CD-ROMs”, “Videokassetten”, “XVII, 330 S.”. It would probably be more appropriate to use dcterms:format for most of these, and to limit the use of dcterms:extent to pagination information and duration.
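
For the conversion, a crude heuristic of the kind we have been considering might look like the following; the value lists are drawn from the examples above and are certainly incomplete.

    import re

    # Crude, illustrative heuristic for deciding whether a dcterms:extent value
    # really describes an extent (pages, duration) or a physical format/dimension.
    FORMAT_HINTS = ("mikrofiche", "cd-rom", "videokassette", "dvd", "cm")

    def classify_extent(value):
        v = value.lower()
        if any(hint in v for hint in FORMAT_HINTS):
            return "format"    # better modelled as dcterms:format
        if re.search(r"\d+\s*s\.", v) or "min" in v:
            return "extent"    # pagination or duration information
        return "unknown"       # candidates for omission from the BibJSON output

    print(classify_extent("XVII, 330 S."))   # extent
    print(classify_extent("21 cm"))          # format
    print(classify_extent("Mikrofiches"))    # format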

URIs that don’t resolve

We stumbled over some URIs that don’t resolve, whether you request RDF or HTML in the Accept header. Examples: http://d-nb.info/019673442, http://d-nb.info/019675585, http://d-nb.info/011077166

Also, DDC URIs that are connected to a resource with dcterms:subject don’t resolve, e.g. http://d-nb.info/ddc-sg/070.

Footnote

At a previous BibServer hackday, we loaded the British National Bibliography data into BibServer. This was a similar problem, but as the data was in RDF/XML we could directly use the built-in Python XML streaming parser to convert the RDF data into BibJSON.
See https://gist.github.com/1731588 for the source.
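
For reference, the streaming approach amounts to something like the sketch below, using xml.etree.ElementTree.iterparse so the whole file never has to sit in memory; the property handled and the input file name are simplified examples, not the exact ones used for the BNB data.

    import xml.etree.ElementTree as ET

    RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
    DC = "{http://purl.org/dc/terms/}"

    def rdfxml_to_bibjson(path):
        """Stream an RDF/XML file and yield simple BibJSON-style dicts.

        Sketch only: real records need many more properties than the title.
        """
        for event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == RDF + "Description":
                title = elem.findtext(DC + "title")
                if title:
                    yield {"title": title}
                elem.clear()   # free memory as we go: the point of streaming

    if __name__ == "__main__":
        for record in rdfxml_to_bibjson("bnb-sample.rdf"):   # example file name
            print(record)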

BiblioHack: Day 2, part 2
Thu, 14 Jun 2012
http://openbiblio.net/2012/06/14/bibliohack-day-2-part-2/

Pens down! Or, rather, key-strokes cease!

BiblioHack has drawn to a close and the results of two days’ hard labour are in:

A Bibliographic Toolkit

Utilising BibServer

Peter Murray-Rust reported back on what was planned, what was done, and the overlap between the two! The priority was cleaning up the process for setting up BibServers and getting them running on different architectures. (PubCrawler was going to be run on BibServer, but it is not currently working.) Yesterday’s big news was that Nature has released 30 million references or thereabouts – this furthers the cause of scholarly literature, whereby we, in principle, can index records rather than only corporate organisations being able or permitted to do so. National bibliographies have been put on BibSoup – UK (‘BL’), Germany, Spain and Sweden – with the technical problem of character encodings raising its head (UTF-8 solves this where used). Also, BibSoup is useful for TEXTUS, so the overall ‘toolkit’ approach is reinforced!

Open Access Index

Emanuil Tolev presented on ACat – Academic Catalogue. The first part of an index is having things to access – so gathering about 55,000 journals was a good start! Using elasticsearch within these journals will give lists of contents, which will then provide lists of articles (via facet view); other services will then determine licensing / open access information (URL checks assisted in this process). The ongoing plan is to use this tool to ascertain licensing information for every single record in the world. (Link to ACat to follow.)

Annotation Tools

Tom Oinn talked about the ideas that have come out of discussions and hacking around annotators and TEXTUS. Reading lists and citation management are a key part of what TEXTUS is intended to assist with, so the plan is for any annotation to be allowed to carry a citation – whether a personal opinion or a related record. Personalised lists will come out of this, and TEXTUS should become a reference management tool in its own right. Keep your eye on TEXTUS for the practical applications of these ideas!

Note: more detailed write-ups will appear courtesy of others, do watch the OKFN blog for this and all things open…

Postscript: OKFN blog post here

Huge thanks to all those who participated in the event – your ideas and enthusiasm have made this so much fun to be involved with.

Also thanks to those who helped run the event, visible or behind-the-scenes, particularly Sam Leon.

Here’s to the next one :-)

BiblioHack: Day 2, part 1
Thu, 14 Jun 2012
http://openbiblio.net/2012/06/14/day-2-part-1/

After easing into the day with breakfast and coffee, each of the three sub-groups gave an overview of its mini-project’s aims and fed back on the evening’s progress:

  • Peter Murray-Rust revisited the overarching theme of ‘A Bibliographic Toolkit’ and the BibServer sub-group’s specific work on adding datasets and easily deploying BibServer; Adrian Pohl followed up to explain that he would be developing a National Libraries BibServer.
  • Tom Oinn explained the Annotation Tools sub-group’s work on developing annotation tools – i.e. TEXTUS – looking at adding fragments of text, with your own comments and metadata linked to them, which then form BibSoup collections. Collating personalised references is enhanced with existing search functionality, and reading lists with annotations can refer to other texts within TEXTUS.
  • Mark MacGillivray presented the third group’s work on an Open Access Index. This began with listing all the journals that can be found in the whole world, with the aim of identifying the licence of each article. They have been scraping collections (e.g. PubMed) and gathering journals – at the time of speaking they had more than 50,000! The aim is to enable a crowd-sourced list of every journal in the world which, using PubCrawler, should provide every single article in the world.

With just five hours left before stopping to gather thoughts, write up and feed back to the rest of the group, it will be very interesting to see the result…

BiblioHack: Day 1
Thu, 14 Jun 2012
http://openbiblio.net/2012/06/14/bibliohack-day-1/

The first day of BiblioHack was a day of combinations and sub-divisions!

The event attendees started the day all together, both hackers and workshop / seminar attendees, and Sam introduced the purpose of the day as follows:

  • coders – to build tools and share ideas about things that will make our shared cultural heritage and knowledge commons more accessible and useful;
  • non-coders – to get a crash course in what openness means for galleries, libraries, archives and museums, why it’s important and how you can begin opening up your data;
  • everyone – to get a better idea about what other people working in your domain do and engender a better understanding between librarians, academics, curators, artists and technologists, in order to foster the creation of better, cooler tools that respond to the needs of our communities.

The hackers began the day with an overview of what a hackathon is for and how it can be run, as presented by Mahendra Mahey, and followed with lightning talks as follows:

  • Talk 1 Peter Murray Rust & Ross Mounce – Content and Data Mining and a PDF extractor
  • Talk 2 Mike Jones – the m-biblio project
  • Talk 4 Ian Stuart – ORI/RJB (formerly OA-RJ)
  • Talk 5 Etienne Posthumus – Making a BibServer Parser
  • Talk 6 Emanuil Tolev – IDFind – identifying identifiers (“Feedback and real user needs won’t gather themselves”)
  • Talk 7 Mark MacGillivray – BibServer – what the project has been doing recently, how that ties into the open access index idea.
  • Talk 8 Tom Oinn – TEXTUS
  • Talk 9 Simone Fonda – Pundit – collaborative semantic annotations of texts (Semantic Web-related tool)
  • Talk 10 Ian Stuart – The basics of Linked Data

We decided we wanted to work as a community, using our different skills towards one overarching goal, rather than breaking into smaller groups with separate agendas. We formed the central idea of an ‘open bibliographic tool-kit’ and people identified three main areas to hack around, playing to their skills and interests:

  • Utilising BibServer – adding datasets and using PubCrawler
  • Creating an Open Access Index
  • Developing annotation tools

At this point we all broke for lunch, and the workshoppers and hackers mingled together. As hoped, conversations sprang up between people from the two different groups, and it was great to see suggestions arising from shared ideas and the applications of one group being explained alongside the theories of the other.

We re-grouped and the workshop continued until 16.00 – see here for Tim Hodson’s excellent write-up of the event and the talks given – when the hackers were joined by some of those who had attended the workshop. Each group gave a quick update on its status, to try to persuade the new additions to join their particular work-flow, and each group grew in number. After more hushed discussions and typing, the day finished with a talk from Tara Taubman about her background in the legalities of online security and IP, and we went for dinner. Hacking continued afterwards and we celebrated a hard day’s work down the pub, looking forward to what was to come.

Day 2 to follow…
