Final report: JISC Open Bibliography 2 (23 August 2012, http://openbiblio.net/2012/08/23/final-report-jisc-open-bibliography-2/)

Following on from the success of the first JISC Open Bibliography project, we have now completed a further year of development and advocacy as part of the JISC Discovery programme.

Our stated aims at the beginning of the second year of development were to show our community (namely all those interested in furthering the cause of Open via bibliographic data, including coders, academics, and those with an interest in supporting galleries, libraries, archives and museums) what we are missing if we do not commit to Open Bibliography, and to show that Open Bibliography is a fundamental requirement of a community committed to the discovery and dissemination of ideas. We intended to do this by demonstrating the value of carefully managed metadata collections of particular interest to individuals and small groups, thus realising the potential of the open access to large collections of metadata we now enjoy.

We have been successful overall in achieving our aims, and we present here a summary of our output to date (it may be useful to refer to this guide to terms).

Outputs

BibServer and FacetView

The BibServer open source software package enables individuals and small groups to present their bibliographic collections easily online. BibServer uses Elasticsearch in the background to index supplied records, and these are presented via the front end using the FacetView JavaScript library. This use of JavaScript at the front end allows result displays to be embedded easily in any web page.

BibSoup and more demonstrations

Our own version of BibServer is up and running at http://bibsoup.net, where we have seen over 100 users sharing more than 14,000 records across over 60 collections. Some particularly interesting example collections are highlighted there.

Additionally, we have created some niche instances of BibServer for solving specific problems – for example, check out http://malaria.bibsoup.net; here we have used BibServer to analyse and display collections specific to malaria researchers, as a demonstration of the extent of open access materials in the field. Further analysis allowed us to show where best to look for relevant materials that could be expected to be openly available, and to begin work on the concept of an Open Access Index for research.

Another example is the German National Bibliography, as provided by the German National Library, which is a work in progress (as explained by Adrian Pohl and Etienne Posthumus here). We have built, and are continuing to build, similar collections for the other national bibliographies that we receive.

BibJSON

At http://bibjson.org we have produced a simple convention for presenting bibliographic records in JSON. This has seen good uptake so far, with additional use in the JISC TEXTUS project and in Total Impact, amongst others.
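To give a flavour of the convention, here is an illustrative record built and serialised in Python; the field names follow the conventions described at bibjson.org, while the values themselves are invented for this example:

    # A minimal sketch: building a single BibJSON-style record and
    # serialising it to JSON. Values are invented; field names follow
    # the conventions documented at bibjson.org.
    import json

    record = {
        "type": "article",
        "title": "An example article title",
        "author": [{"name": "A. N. Author"}, {"name": "A. N. Other"}],
        "year": "2012",
        "journal": {"name": "Journal of Examples"},
        "identifier": [{"type": "doi", "id": "10.1234/example"}],
        "link": [{"url": "http://example.org/article"}],
    }

    print(json.dumps(record, indent=2))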

Pubcrawler

Pubcrawler collects bibliographic metadata, via parsers created for particular sites, and we have used it to create collections of articles. The full post provides more information.

datahub collections

We have continued to collect useful bibliographic collections throughout the year, and these along with all others discovered by the community can be found on the datahub in the bibliographic group.

Open Access / Bibliography advocacy videos and presentations

As part of a Sprint in January we recorded videos of the work we were doing and the roles we play in this project and wider biblio promotion; we also made a how-to for using BibServer, including feedback from a new user:

[Video: Setting up a Bibserver and Faceted Browsing (Mark MacGillivray), from Bibsoup Project on Vimeo]

Peter and Tom Murray-Rust’s video, made into a Prezi, has proven useful in explaining the basics of the need for Open Bibliography and Open Access:

Community activities

The Open Biblio community have gathered for a number of different reasons over the duration of this project: the project team met in Cambridge and Edinburgh to plan work in Sprints; Edinburgh and London each played host to Meet-ups for the wider open community; and London also hosted BiblioHack – a hackathon / workshop for established enthusiasts as well as new faces, both with and without technical know-how.

These events – particularly BiblioHack – attracted people from all over the UK and Europe, and we were pleased that the work we are doing is gaining attention from similar projects world-wide.

Further collaborations

Lessons

Over the course of this project we have learnt that open source development provides great flexibility and the power to do what we need to do, and open access in general frees us from many difficult constraints. There is now a lot of useful information available online about how to do open source and open access.
Whilst licensing remains an issue, it has become clear that making everything publicly and freely available to the fullest extent possible is the simplest solution, as it causes no further complications down the line. See the open definition as well as our principles for more information.

We discovered during the BibJSON spec development that it must be clear whether a specification is centrally controlled or more of a communal agreement on use. There are advantages and disadvantages to each method; however, they are not compatible – although one may become the other. We took the communal agreement approach, as we found that in the early stages there was more value in exposing the spec to people as widely and openly as possible than in maintaining close control. Moving to a closely controlled specification requires specific and ongoing commitment.

Community building remains tricky and somewhat serendipitous. Just as word-of-mouth can enhance reputation, failure of certain communities can detrimentally impact other parts of the project. Again, the best solution is to ensure everything is as open as possible from the outset, thereby reducing the impact of any one particular failure.

Opportunities and Possibilities

Over the two years, the concept of open bibliography has gone from requiring justification to being an expectation; the value of making this metadata openly available to the public is now obvious, and getting such access is no longer so difficult; where access is not yet available, many groups are now moving toward making it available. And of course, there are now plenty of tools to make good use of available metadata.

Future opportunities now lie in the more general field of Open Scholarship, where a default of Open Bibliography can be leveraged to great effect. For example, recent Open Access mandates by many UK funding councils (e.g. following the Finch Report) could be backed up by investigative checks on the accessibility of research outputs, supporting provision of an open access corpus of scholarly material.

We intend now to continue work in this wider context, and we will soon publicise our more specific ideas; we would appreciate contact with other groups interested in working further in this area.

Further information

For the original project overview, see http://openbiblio.net/p/jiscopenbib2; also, a full chronological listing of all our project posts is available at http://openbiblio.net/tag/jiscopenbib2/. The work package descriptions are available at http://openbiblio.net/p/jiscopenbib2/work-packages/, and links to posts relevant to each work package over the course of the project follow:

  • WP1 Participation with Discovery programme
  • WP2 Collaborate with partners to develop social and technical interoperability
  • WP3 Open Bibliography advocacy
  • WP4 Community support
  • WP5 Data acquisition
  • WP6 Software development
  • WP7 Beta deployment
  • WP8 Disruptive innovation
  • WP9 Project management (NB all posts about the project are relevant to this WP)
  • WP10 Preparation for service delivery

All software developed during this project is available under an open source licence. All the data released during this project falls under OKD-compliant licences such as PDDL or CC0, depending on the licence chosen by the publisher. The content of our site is licensed under a Creative Commons Attribution 3.0 License (all jurisdictions).

The project team would like to thank supporting staff at the Open Knowledge Foundation and Cambridge University Library, the OKF Open Bibliography working group and Open Access working group, Neil Wilson and the team at the British Library, and Andy McGregor and the rest of the team at JISC.

Community Discussions 3 (13 July 2012, http://openbiblio.net/2012/07/13/community-discussions-3/)

It has been a couple of months since the round-up on Community Discussions 2 and we have been busy! BiblioHack was a highlight for me, and last week included a meeting of many OKFN types – here’s a picture taken by Lucy Chambers for @OKFN of some team members:

[Photo: IMG_0351 – project team members]

The Discussion List has been busy too:

  • Further to David Weinberger’s pointer that Harvard released 12 million bibliographic records with a CC0 licence, Rufus Pollock created a collection on the DataHub and added it to the Biblio section for ease of reference

  • Rufus also noticed that OCLC had issued their major release of VIAF, meaning that millions of author records are now available as Open Data (under Open Data Commons Attribution license), and updated the DataHub dataset to reflect this

  • Peter Murray-Rust noted that Nature has made its metadata open under CC0

  • David Shotton promoted the International Workshop on Contributorship and Scholarly Attribution at Harvard, and prepared a handy guide for attribution of submissions

  • Adrian Pohl circulated a call for participation for the SWIB12 “Semantic Web in Bibliotheken” (Semantic Web in Libraries) Conference in Cologne, 26-28 November this year, and hosted the monthly Working Group call

  • Lars Aronsson looked at multivolume works, asking whether the OpenLibrary can create and connect records for each volume. HathiTrust and Gallica were suggested as potential tools in collating volumes, and the barcode (containing information populated by the source library) was noted as being invaluable in processing these

  • Sam Leon explained that TEXTUS would be integrating the BibServer facet view and encouraged people to have a look at the work so far; Tom Oinn highlighted the collaboration between Enriched BibJSON and TEXTUS, and explained that he would be adding a ‘TEXTUS’ field to BibJSON for this purpose

  • Sam also circulated two tools for people to test, Pundit and Korbo, which have been developed as part of Digitised Manuscripts to Europeana (DM2E)

  • Jenny Molloy promoted the Open Science Hackday which took place last week – see below for a snapshot courtesy of @OKFN:

[Photo: IMG_1964 – Open Science Hackday]

In related news, Peter Murray-Rust is continuing to advocate the cause of open data – do have a read of the latest posts on his blog to see how he’s getting on.

The Open Biblio community continues to be invaluable to the Open GLAM, Heritage, Access and other groups too, and I would encourage those interested in such discussions to join up at the OKFN Lists page.

Using Wikipedia to build a philosophy (or other sort of) collection in BibSoup (27 June 2012, http://openbiblio.net/2012/06/27/using-wikipedia-to-build-a-philosophy-or-other-sort-of-collection-in-bibsoup/)

Here is a quick example of how to build a reference collection in BibSoup, using the great source of knowledge that is Wikipedia.

To begin with, you might want to go to Wikipedia directly and try performing some searches for relevant material, to help you put together sensible search terms for your area of interest. Your search terms will be used to pull relevant citations from the Wikipedia database.

Then, go over to the BibSoup upload page; signup / login is required, so do that if you have not already done so.

Type your Wikipedia search terms into the upload box at the top of the page, give your collection a name and a description, specify the licence if you wish, and choose the “wikipedia search to citations” file format from the list at the bottom. Then hit upload.

A ticket will be created for building your collection, and you can view the progress on the ticket page.

Once it is done, you can find your new collection either on the BibSoup collections page or on your own BibSoup user account page – for example, the page for the user named “test”. Also, of course, you could go straight to the URL of your collection – collections appear at http://bibsoup.net/username/collection.

There you go! You should now have a reference collection based on your Wikipedia search terms. Check out our example.

Bringing the Open German National Bibliography to a BibServer (18 June 2012, http://openbiblio.net/2012/06/18/bringing-the-open-german-national-bibliography-to-a-bibserver/)

This blog post is written by Etienne Posthumus and Adrian Pohl.

We are happy that the German National Library recently released the German National Bibliography as Linked Open Data (see the announcement). At the #bibliohack this week we worked on getting the data into a BibServer instance. Here, we want to share our experiences in trying to re-use this dataset.

Parsing large turtle files: problem and solution

The raw data file is 1.1 GB in a compressed format – unzipped, it is a 6.8 GB Turtle file.
Working with this file is unwieldy: it cannot be read into memory or converted with tools like rapper (which only works for Turtle files up to 2 GB, see this mail thread). Thus, it would be nice if the German National Library could either provide one big N-Triples file, which is better suited to streaming processing, or a number of smaller Turtle files.

Our solution to get the file into a workable form is a small, Turtle-syntax-aware Python script that splits the file into smaller pieces. You can’t use the standard UNIX split command, as each snippet of the split file also needs the prefix information at the top, and we do not want to split an entry in the middle, losing triples.
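The following is a minimal sketch of that idea – a simplification rather than the exact script we used. It collects the @prefix declarations and writes them at the top of every chunk, and it only starts a new chunk after a line that closes a statement block with a full stop, so no entry is split in the middle. The input filename and chunk size are illustrative.

    # Minimal sketch: prefix-aware splitting of a large Turtle file.
    # Assumes all @prefix lines come first and that each record's final
    # statement sits on a line ending in " ." (typical of this serialisation).
    RECORDS_PER_CHUNK = 100000

    prefixes = []
    chunk, records_in_chunk, part = [], 0, 0

    def write_chunk(lines, part_number):
        with open("dnb-part-%05d.ttl" % part_number, "w") as out:
            out.writelines(prefixes)
            out.writelines(lines)

    with open("dnb-titel.ttl") as source:   # hypothetical input filename
        for line in source:
            if line.startswith("@prefix"):
                prefixes.append(line)
                continue
            chunk.append(line)
            if line.rstrip().endswith(" ."):
                records_in_chunk += 1
                if records_in_chunk >= RECORDS_PER_CHUNK:
                    write_chunk(chunk, part)
                    part += 1
                    chunk, records_in_chunk = [], 0
    if chunk:
        write_chunk(chunk, part)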

See a sample converted N-Triples file from a Turtle snippet.

Converting the N-Triples to BibJSON

After this, we started working on parsing an example N-Triples file to convert the data to BibJSON. We haven’t gotten that far, though. See https://gist.github.com/2928984#file_ntriple2bibjson.py for the resulting code (work in progress).
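The overall shape of the conversion is nevertheless simple: group the triples by subject and map selected predicates onto BibJSON fields. The sketch below is illustrative only – it is not the gist above, it handles literals very crudely, and it covers just a few Dublin Core terms:

    # Illustrative sketch: map a handful of Dublin Core predicates from an
    # N-Triples file onto BibJSON-style record fields, grouped by subject.
    import json
    from collections import defaultdict

    DC = "http://purl.org/dc/terms/"
    FIELD_MAP = {DC + "title": "title",
                 DC + "issued": "year",
                 DC + "publisher": "publisher"}

    def value_of(obj):
        # Crude object handling: strip <> from URIs, quotes and language
        # tags from literals; real N-Triples parsing needs more care.
        if obj.startswith("<"):
            return obj.strip("<>")
        if obj.startswith('"'):
            return obj[1:].rsplit('"', 1)[0]
        return obj

    records = defaultdict(dict)
    with open("sample.nt") as source:       # the sample file mentioned above
        for line in source:
            parts = line.rstrip().rstrip(".").rstrip().split(None, 2)
            if len(parts) != 3:
                continue
            subject, predicate, obj = parts
            field = FIELD_MAP.get(predicate.strip("<>"))
            if field:
                records[subject][field] = value_of(obj)

    print(json.dumps({"records": list(records.values())}, indent=2))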

Problems

We noted problems with some properties, which we would like to document here as feedback for the German National Library.

Heterogeneous use of dcterms:extent

The dcterms:extent property is used in many different ways, so we are considering omitting it from the conversion to BibJSON. Some example values of this property: “Mikrofiches”, “21 cm”, “CD-ROMs”, “Videokassetten”, “XVII, 330 S.”. It would probably be more appropriate to use dcterms:format for most of these and to limit dcterms:extent to pagination information and duration.
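If we do keep the property, a rough heuristic along these lines could separate pagination-style values from the rest; this is purely illustrative and not part of our conversion code:

    # Purely illustrative heuristic for the example dcterms:extent values above.
    import re

    PAGINATION = re.compile(r"\d+\s*(S\.|Seiten|p\.|pages)", re.IGNORECASE)

    def classify_extent(value):
        if PAGINATION.search(value):
            return "pagination"   # e.g. "XVII, 330 S."
        if re.search(r"\d+\s*cm", value):
            return "dimensions"   # e.g. "21 cm"
        return "format"           # e.g. "Mikrofiches", "CD-ROMs"

    for v in ["Mikrofiches", "21 cm", "CD-ROMs", "Videokassetten", "XVII, 330 S."]:
        print(v, "->", classify_extent(v))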

URIs that don’t resolve

We stumbled over some URIs that don’t resolve, whether you request RDF or HTML via the Accept header. Examples: http://d-nb.info/019673442, http://d-nb.info/019675585, http://d-nb.info/011077166

Also, DDC URIs that are connected to a resource with dcterms:subject don’t resolve, e.g. http://d-nb.info/ddc-sg/070.
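A quick way to check this kind of problem is to request both representations explicitly and look at the status codes – a small sketch using the requests library, over the URIs listed above:

    # Sketch: check whether the listed d-nb.info URIs resolve when asking
    # for RDF and for HTML via HTTP content negotiation.
    import requests

    uris = ["http://d-nb.info/019673442",
            "http://d-nb.info/019675585",
            "http://d-nb.info/011077166",
            "http://d-nb.info/ddc-sg/070"]

    for uri in uris:
        for accept in ("application/rdf+xml", "text/html"):
            response = requests.get(uri, headers={"Accept": accept},
                                    allow_redirects=True, timeout=10)
            print(uri, accept, response.status_code)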

Footnote

At a previous BibServer hack day, we loaded the British National Bibliography data into BibServer. This was a similar problem, but as the data was in RDF/XML we could directly use the built-in Python XML streaming parser to convert the RDF data into BibJSON.
See: https://gist.github.com/1731588 for the source.

BiblioHack: Day 2, part 2 (14 June 2012, http://openbiblio.net/2012/06/14/bibliohack-day-2-part-2/)

Pens down! Or, rather, key-strokes cease!

BiblioHack has drawn to a close and the results of two days’ hard labour are in:

A Bibliographic Toolkit

Utilising BibServer

Peter Murray-Rust reported back on what was planned, what was done, and the overlap between the two! The priority was cleaning up the process for setting up BibServers and getting them running on different architectures. (PubCrawler was going to be run on BibServer, but it is not currently working.) Yesterday’s big news was that Nature has released 30 million references or thereabouts – this furthers the cause of scholarly literature, in that we, in principle, can index records rather than only corporate organisations being able / permitted to do so. National Bibliographies have been put on BibSoup – UK (‘BL’), Germany, Spain and Sweden – with the technical problem of character encodings raising its head (UTF-8 solves this where used). Also, BibSoup is useful for TEXTUS, so the overall ‘toolkit’ approach is reinforced!

Open Access Index

Emanuil Tolev presented on ACat – Academic Catalogue. The first part of an index is having things to access – so gathering about 55,000 journals was a good start! Using Elasticsearch within these journals will give lists of contents, which will then provide lists of articles (via facet view); other services will then determine licensing / open access information (URL checks assisted in this process). The ongoing plan is to use this tool to ascertain licensing information for every single record in the world. (Link to ACat to follow.)

Annotation Tools

Tom Oinn talked about the ideas that have come out of discussions and hacking around annotators and TEXTUS. Reading lists and citation management are a key part of what TEXTUS is intended to assist with, so the plan is for any annotation to be allowed to carry a citation – whether a personal opinion or a related record. Personalised lists will come out of this, and TEXTUS should become a reference management tool in its own right. Keep your eye on TEXTUS for the practical applications of these ideas!

Note: more detailed write-ups will appear courtesy of others, do watch the OKFN blog for this and all things open…

Postscript: OKFN blog post here

Huge thanks to all those who participated in the event – your ideas and enthusiasm have made this so much fun to be involved with.

Also thanks to those who helped run the event, visible or behind-the-scenes, particularly Sam Leon.

Here’s to the next one :-)

BiblioHack: Day 2, part 1 (14 June 2012, http://openbiblio.net/2012/06/14/day-2-part-1/)

After easing into the day with breakfast and coffee, each of the three sub-groups gave an overview of its mini-project’s aims and fed back on the evening’s progress:

  • Peter Murray-Rust revisited the overarching theme of ‘A Bibliographic Toolkit’ and the BibServer sub-group’s specific work on adding datasets and easily deploying BibServer; Adrian Pohl followed up to explain that he would be developing a National Libraries BibServer.
  • Tom Oinn explained the Annotation Tools sub-group’s work on developing annotation tools – i.e. TEXTUS – looking at adding fragments of text, with your own comments and metadata linked to them, which then form BibSoup collections. Collating personalised references is enhanced with existing search functionality, and reading lists with annotations can refer to other texts within TEXTUS.
  • Mark MacGillivray presented the third group’s work on an Open Access Index. This began with listing all the journals that can be found in the whole world, with the aim of identifying the licence of each article. They have been scraping collections (e.g. PubMed) and gathering journals – at the time of speaking they had over 50,000! The aim is to enable a crowd-sourced list of every journal in the world which, using PubCrawler, should provide every single article in the world.

With just 5 hours left before stopping to gather thoughts, write-up and feedback to the rest of the group, it will be very interesting to see the result…

BiblioHack: Day 1 (14 June 2012, http://openbiblio.net/2012/06/14/bibliohack-day-1/)

The first day of BiblioHack was a day of combinations and sub-divisions!

The event attendees started the day all together, both hackers and workshop / seminar attendees, and Sam introduced the purpose of the day as follows:

  • coders – to build tools and share ideas about things that will make our shared cultural heritage and knowledge commons more accessible and useful;
  • non-coders – to get a crash course in what openness means for galleries, libraries, archives and museums, why it’s important and how you can begin opening up your data;
  • everyone – to get a better idea about what other people working in your domain do and engender a better understanding between librarians, academics, curators, artists and technologists, in order to foster the creation of better, cooler tools that respond to the needs of our communities.

The hackers began the day with an overview of what a hackathon is for and how it can be run, as presented by Mahendra Mahey, and followed with lightning talks as follows:

  • Talk 1 Peter Murray Rust & Ross Mounce – Content and Data Mining and a PDF extractor
  • Talk 2 Mike Jones – the m-biblio project
  • Talk 4 Ian Stuart – ORI/RJB (formerly OA-RJ)
  • Talk 5 Etienne Posthumus – Making a BibServer Parser
  • Talk 6 Emanuil Tolev – IDFind – identifying identifiers (“Feedback and real user needs won’t gather themselves”)
  • Talk 7 Mark MacGillivray – BibServer – what the project has been doing recently, how that ties into the open access index idea.
  • Talk 8 Tom Oinn – TEXTUS
  • Talk 9 Simone Fonda – Pundit – collaborative semantic annotations of texts (Semantic Web-related tool)
  • Talk 10 Ian Stuart – The basics of Linked Data

We decided we wanted to work as a community, using our different skills towards one overarching goal, rather than breaking into smaller groups with separate agendas. We formed the central idea of an ‘open bibliographic tool-kit’ and people identified three main areas to hack around, playing to their skills and interests:

  • Utilising BibServer – adding datasets and using PubCrawler
  • Creating an Open Access Index
  • Developing annotation tools

At this point we all broke for lunch, and the workshoppers and hackers mingled together. As hoped, conversations sprang up between people from the two different groups, and it was great to see suggestions arising from shared ideas, and the applications of one group being explained against the theories of the other.

We re-grouped and the workshop continued until 16.00 – see here for Tim Hodson’s excellent write-up of the event and the talks given – when the hackers were joined by some who had attended the workshop. Each group gave a quick update on its status, to try to persuade the new additions to join their particular work-flow, and each group grew in number. After more hushed discussions and typing, the day finished with a talk from Tara Taubman about her background in the legalities of online security and IP, and we went for dinner. Hacking continued afterwards and we celebrated a hard day’s work down the pub, looking forward to what was to come.

Day 2 to follow…

BiblioHack Meet-up (13 June 2012, http://openbiblio.net/2012/06/13/bibliohack-meet-up/)

I’ve been quiet on this blog lately, but it’s in the same way a duck looks still when swimming: things may look peaceful but there is much activity going on beneath the surface! The Open Biblio crowd have been busy on the discussion List (link to follow) and the BiblioHack organisers have been preparing for this week’s events, which kicked off with a Meet-up last night.

The pre-BiblioHack Meet-up was designed to be an informal opportunity for those involved in the events to put names to faces and start up discussions; it was also open to anyone who wanted to come along to find out more about open data and the OKFN’s Working Groups including Open GLAM, and projects such as DM2E as well as Open Biblio.

With no formal agenda, we started up conversations as the mood took us – this covered legalities of openness in relation to IP, licensing and open access, annotation, cat-sitting and the Blues. In a nod to the more ‘usual’ OKFN #OpenData meet-ups, we went around the room to introduce ourselves (trying to explain our interests in only 3 words was challenging…) which prompted some people to cross the room in a purposeful fashion to intercept someone they hadn’t spoken to by that point. I really enjoyed meeting the people with whom I’d be spending the next two days, so thanks to all those who came along, for their interesting ideas and suggestions, and huge thanks to Sam Leon for arranging the tasty food and drinks at C4CC and for facilitating the evening.

Pubcrawler: finding research publications (13 June 2012, http://openbiblio.net/2012/06/13/pubcrawler-finding-research-publications/)

This is a guest post from Sam Adams. (We have been using Pubcrawler in the Open Biblio 2 project to create reference collections of journal articles, and hope to continue this work further; this is a brief introduction to the software. Code is currently available at http://bitbucket.org/sea36/pubcrawler)

Pubcrawler collects bibliographic metadata (author, title, reference, DOI) by indexing journals’ websites in a similar manner to the way in which search engines explore the web to build their indexes. Where possible (which depends on the particular publication) it identifies any supplementary resources associated with a paper, and whether the paper is open access (i.e. readable without a subscription or any other charge) – though it cannot determine the license / conditions of such access.

Pubcrawler was originally developed by Nick Day as part of the CrystalEye project to aggregate published crystallographic structures from the supplementary data to articles on journals’ websites. Since then Pubcrawler has been extended to collect bibliographic metadata and support a wider range of journals than just those containing crystallography. Some of the activities Pubcrawler can currently support are:

  • Providing core bibliographic metadata
  • Identifying collections of open access articles
  • Identifying freely accessible supplementary information, which is often a rich source of scientific data

When pointed at a publisher’s homepage Pubcrawler will generate a list of the journals on the site and then crawl the issues’ tables of contents, recording the bibliographic metadata for the articles that it discovers. Pubcrawler uses a combination of two approaches to crawling a journal: starting at the current issue it can follow links to previous issues, walking the journal’s publication history, and if a journal’s website contains a list of issues it will also use that as a source of pages to crawl. When necessary, such as to identify supplementary resources, Pubcrawler can follow links to individual articles’ splash pages.

Pubcrawler does not index any content that is restricted by a journal’s paywall – it has been designed not to follow such links, and as added protection it is run over a commercial broadband connection, rather than from inside a university network, to ensure that it does not receive any kind of privileged access.

While Pubcrawler’s general workflow is the same for any publication, custom parsers are required to extract the metadata and the correct links from each website. Generally publishers use common templates for their journals’ web pages, so a parser only needs to be developed once per publisher; however, in some instances, such as where older issues have not been updated to match the current template, a parser may need to support a variety of styles.
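To illustrate what “a parser per publisher” means in practice, here is a heavily reduced sketch – not Pubcrawler’s actual code, and with hypothetical selectors and publisher name – of the kind of class that turns one issue’s table-of-contents page into bibliographic records:

    # Reduced sketch of a per-publisher table-of-contents parser.
    # The XPath expressions are hypothetical and would need adjusting
    # to each publisher's real HTML templates.
    import lxml.html

    class ExamplePublisherTocParser(object):
        ARTICLE_XPATH = "//div[@class='article']"

        def parse_issue(self, html, issue_url):
            doc = lxml.html.fromstring(html)
            doc.make_links_absolute(issue_url)
            for node in doc.xpath(self.ARTICLE_XPATH):
                title = node.findtext(".//h3")
                authors = [a.text_content().strip()
                           for a in node.xpath(".//span[@class='author']")]
                doi_links = node.xpath(".//a[contains(@href, 'doi.org')]/@href")
                yield {"title": title,
                       "author": [{"name": name} for name in authors],
                       "identifier": [{"type": "doi",
                                       "id": link.split("doi.org/")[-1]}
                                      for link in doi_links]}

The generic machinery – fetching pages, walking previous-issue links, recording the results – stays the same for every publisher; only small classes like this one need to change.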

Pubcrawler currently has parsers (in varying states of completeness) for a number of publishers (biased by its history of indexing published crystallographic structures):

  • The American Chemical Society (ACS)
  • Elsevier
  • The International Union of Crystallography (IUCr)
  • Nature
  • The Royal Society of Chemistry (RSC)
  • Springer
  • Wiley

And to date it has indexed over 10 million bibliographic records.

There are many other publishers who could be supported by Pubcrawler; they just require parsers to be created for them. Pubcrawler requires two types of maintenance: the general support to keep it running, administer servers, etc., that any software requires, and occasional updates to the parsers as journals’ websites change their formatting.

Open source development – how we are doing (29 May 2012, http://openbiblio.net/2012/05/29/open-source-development-how-we-are-doing/)

Whilst at Open Source Junction earlier this year, I talked to Sander van der Waal and Rowan Wilson about the problems of doing open source development. Sander and Rowan work at OSS Watch, whose aim is to make sure that open source software development delivers its potential to UK higher education and research; so, I thought it would be good to get their feedback on how our project is doing, and whether there is anything we are getting wrong or could improve on.

It struck me that, as other JISC projects such as ours are required to make their output similarly publicly available, this discussion may be of benefit to others; after all, not everyone knows what open source software is, let alone the complexities that can arise from trying to create such software. Whilst we cannot avoid all such complexities, we can at least detail what we have found helpful to date, and how OSS Watch view our efforts.

I provided Sander and Rowan with a review of our project, and Rowan gave some feedback confirming that overall we are doing a good job, although we lack a listing of the other open source software our project relies on, and its licences. Whilst such data can be discerned from the dependencies of the project, this is not clear enough; I will add a written list of dependencies to the README.

The response we received is provided below, followed by the overview I initially provided, which gives a brief overview of how we managed our open source development efforts:

==== Rowan Wilson, OSS Watch, responds:

Your work on this project is extremely impressive. You have the systems in place that we recommend for open development and creation of community around software, and you are using them. As an outsider I am able to quickly see that your project is active and the mailing list and roadmap present information about ways in which I could participate.

One thing I could not find, although this may be my fault, is a list of third party software within the distribution. This may well be because there is none, but it’s something I would generally be keen to see for the purposes of auditing licence compatibility.

Overall though I commend you on how tangible and visible the development work on this project is, and on the focus on user-base expansion that is evident on the mailing list.

==== Mark MacGillivray wrote:

Background – May 2011, OKF / AIM bibserver project

Open Knowledge Foundation contracted with American Institute of
Mathematics under the direction of Jim Pitman in the dept. of Maths
and Stats at UC Berkeley. The purpose of the project was to create an
open source software repository named BibServer, and to develop a
software tool that could be deployed by anyone requiring an easy way
to put and share bibliographic records online.

A repository was created at http://github.com/okfn/bibserver, and it
performs the usual logging of commits and other activities expected of
a modern DVCS system. This work was completed in September 2011, and the repository has been available since the start of that project with a GNU Affero GPL v3 licence attached.

October 2011 – JISC Open Biblio 2 project

The JISC Open Biblio 2 project chose to build on the open source
software tool named BibServer. As there was no support from AIM for
maintaining the BibServer repository, the project took on maintenance
of the repository and all further development work, with no change to
previous licence conditions.

We made this choice as we perceive open source licensing as a benefit
rather than a threat; it fit very well with the requirements of JISC
and with the desires of the developers involved in the project. At
worst, an owner may change the licence attached to some software, but
even in such a situation we could continue our work by forking from
the last available open source version (presuming that licence
conditions cannot be altered retrospectively).

The code continues to display the licence under which it is available,
and remains publicly downloadable at http://github.com/okfn/bibserver.
Should this hosting resource become publicly unavailable, an
alternative public host would be sought.

Development work and discussion has been managed publicly, via a
combination of the project website at
http://openbiblio.net/p/jiscopenbib2, the issue tracker at
http://github.com/okfn/bibserver/issues, a project wiki at
http://wiki.okfn.org/Projects/openbibliography, and via a mailing list
at openbiblio-dev@lists.okfn.org

February 2012 – JISC Open Biblio 2 offers bibsoup.net beta service

In February the JISC Open Biblio 2 project announced a beta service
available online for free public use at http://bibsoup.net. The
website runs an instance of BibServer, and highlights that the code is
open source and available (linking to the repository) to anyone who
wishes to use it.

Current status

We believe that we have made sensible decisions in choosing open
source software for our project, and have made all efforts to promote
the fact that the code is freely and publicly available.

We have found the open source development paradigm to be highly
beneficial – it has enabled us to publicly share all the work we have
done on the project, increasing engagement with potential users and
also with collaborators; we have also been able to take advantage of
other open source software during the project, incorporating it into
our work to enable faster development and improved outcomes.

We continue to develop code for the benefit of people wishing to
publicly put and share their bibliographies online, and all our
outputs will continue to be publicly available beyond the end of the
current project.
