Open Bibliography and Open Bibliographic Data » projectPlan
http://openbiblio.net
Open Bibliographic Data Working Group of the Open Knowledge Foundation
Tue, 08 May 2018 15:46:25 +0000

Open source development – how we are doing
http://openbiblio.net/2012/05/29/open-source-development-how-we-are-doing/
Tue, 29 May 2012 11:24:17 +0000

Whilst at Open Source Junction earlier this year, I talked to Sander van der Waal and Rowan Wilson about the problems of doing open source development. Sander and Rowan work at OSS Watch, and their aim is to make sure that open source software development delivers its potential to UK HEI and research; so I thought it would be good to get their feedback on how our project is doing, and whether there is anything we are getting wrong or could improve on.

It struck me that, as other JISC projects such as ours are required to make their output similarly publicly available, this discussion may be of benefit to others; after all, not everyone knows what open source software is, let alone the complexities that can arise from trying to create such software. Whilst we cannot help others avoid all such complexities, we can at least detail what we have found helpful to date, and how OSS Watch view our efforts.

I provided Sander and Rowan with a review of our project, and Rowan provided some feedback confirming that overall we are doing a good job, although we lack a listing of the other open source software our project relies on and the licences under which it is released. Whilst such information can be discerned from the project’s dependencies, this is not clear enough; I will add a written list of dependencies to the README.
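
A listing of this kind can be partly generated rather than written entirely by hand. The following is a minimal sketch, assuming a Python project whose dependencies are installed in the current environment; the function name list_dependency_licences is hypothetical, and the licence strings are simply whatever each package declares about itself, so they would still need checking before going into a README.

```python
from importlib import metadata


def list_dependency_licences():
    """Return sorted (name, version, licence) tuples for installed packages.

    Assumption: the project's dependencies are installed in the current
    Python environment. Licence text is taken as declared by each package.
    """
    rows = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "unknown"
        licence = dist.metadata.get("License") or "not declared"
        rows.append((name, dist.version or "unknown", licence))
    # De-duplicate, then sort case-insensitively by package name.
    return sorted(set(rows), key=lambda row: row[0].lower())


if __name__ == "__main__":
    for name, version, licence in list_dependency_licences():
        # Keep only the first line: some packages embed the full licence text.
        print(f"{name} {version}: {licence.splitlines()[0]}")
```

The output is only a starting point for a licence compatibility audit: packages that declare nothing, or that declare a licence only through classifiers, need a manual look.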

Rowan’s response is given below, followed by the overview I initially sent, which briefly describes how we have managed our open source development efforts:

==== Rowan Wilson, OSS Watch, responds:

Your work on this project is extremely impressive. You have the systems in place that we recommend for open development and creation of community around software, and you are using them. As an outsider I am able to quickly see that your project is active and the mailing list and roadmap present information about ways in which I could participate.

One thing I could not find, although this may be my fault, is a list of third party software within the distribution. This may well be because there is none, but it’s something I would generally be keen to see for the purposes of auditing licence compatibility.

Overall though I commend you on how tangible and visible the development work on this project is, and on the focus on user-base expansion that is evident on the mailing list.

==== Mark MacGillivray wrote:

Background – May 2011, OKF / AIM bibserver project

Open Knowledge Foundation contracted with American Institute of
Mathematics under the direction of Jim Pitman in the dept. of Maths
and Stats at UC Berkeley. The purpose of the project was to create an
open source software repository named BibServer, and to develop a
software tool that could be deployed by anyone requiring an easy way
to put and share bibliographic records online.

A repository was created at http://github.com/okfn/bibserver, and it
performs the usual logging of commits and other activities expected of
a modern DVCS. This work was completed in September 2011, and the
repository has been available since the start of that project with a
GNU Affero GPL v3 licence attached.

October 2011 – JISC Open Biblio 2 project

The JISC Open Biblio 2 project chose to build on the open source
software tool named BibServer. As there was no support from AIM for
maintaining the BibServer repository, the project took on maintenance
of the repository and all further development work, with no change to
previous licence conditions.

We made this choice as we perceive open source licensing as a benefit
rather than a threat; it fit very well with the requirements of JISC
and with the desires of the developers involved in the project. At
worst, an owner may change the licence attached to some software, but
even in such a situation we could continue our work by forking from
the last available open source version (presuming that licence
conditions cannot be altered retrospectively).

The code continues to display the licence under which it is available,
and remains publicly downloadable at http://github.com/okfn/bibserver.
Should this hosting resource become publicly unavailable, an
alternative public host would be sought.

Development work and discussion have been managed publicly, via a
combination of the project website at
http://openbiblio.net/p/jiscopenbib2, the issue tracker at
http://github.com/okfn/bibserver/issues, a project wiki at
http://wiki.okfn.org/Projects/openbibliography, and a mailing list
at openbiblio-dev@lists.okfn.org.

February 2012 – JISC Open Biblio 2 offers bibsoup.net beta service

In February the JISC Open Biblio 2 project announced a beta service
available online for free public use at http://bibsoup.net. The
website runs an instance of BibServer, and highlights that the code is
open source and available (linking to the repository) to anyone who
wishes to use it.

Current status

We believe that we have made sensible decisions in choosing open
source software for our project, and have made all efforts to promote
the fact that the code is freely and publicly available.

We have found the open source development paradigm to be highly
beneficial – it has enabled us to publicly share all the work we have
done on the project, increasing engagement with potential users and
also with collaborators; we have also been able to take advantage of
other open source software during the project, incorporating it into
our work to enable faster development and improved outcomes.

We continue to develop code for the benefit of people wishing to
publicly put and share their bibliographies online, and all our
outputs will continue to be publicly available beyond the end of the
current project.

JSON-LD / BibJSON
http://openbiblio.net/2012/02/21/json-ld-bibjson/
Tue, 21 Feb 2012 18:00:02 +0000

There have been requests on our mailing list recently to consider the various options for supporting validation of BibJSON and for supporting namespacing. These two options require some further consideration.

Validation

Efforts so far around BibJSON have focussed on building a useful JSON representation of bibliographic metadata, with some typical key/value pairs that are common in or extended from bibtex. This started off simply, but we have seen increasing complexity to accommodate further functionality requests. There was some work on a JSON schema to validate against but, given the aim of being as flexible as possible and with very few required keys, validating a BibJSON document against it would have very little effect.

Validating a document as properly formatted JSON is, of course, a good idea; but there are plenty of ways to do this already – just try to parse it with any of the JSON libraries available for your programming language of choice.
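
As a minimal sketch of that well-formedness check, here it is using the json module from the Python standard library. The helper name is_well_formed_json and the two sample records are invented for illustration; this tests only that the text parses as JSON, not that it conforms to any schema.

```python
import json


def is_well_formed_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical records using bibtex-style keys, for illustration only.
good = '{"title": "Open bibliography", "year": "2012"}'
bad = '{"title": "Open bibliography", "year": }'

print(is_well_formed_json(good))  # True
print(is_well_formed_json(bad))   # False
```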

But to reach the stage of actually supporting validation against a pre-defined schema, we must pre-define a schema – and that means becoming inflexible (or doing so little validation that it is essentially pointless).

An alternative to validation against a schema would be adoption of namespaces.

Namespaces

We do already have a namespace concept in BibJSON – it is just a key in the metadata, under which can be listed namespaces and a suitable prefix for them. However, this model is not widely known (because we made it up). To overcome this, we should adopt the JSON-LD method of using @context parameters. This way, it would be possible to specify the namespace in which your record keys are defined, and to share namespace information with other people / machines.
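
To make the idea concrete, here is a small sketch of a record that declares a namespace through a JSON-LD style @context, together with a helper that expands prefixed keys to full IRIs. The record, the choice of the Dublin Core terms namespace and the function expand_key are assumptions made for illustration; the expansion shown is a deliberate simplification of what a real JSON-LD processor does.

```python
import json

# Hypothetical BibJSON-style record. The "@context" entry follows the JSON-LD
# convention of mapping a prefix to a namespace IRI; the record itself is
# invented for illustration.
record = {
    "@context": {"dcterms": "http://purl.org/dc/terms/"},
    "title": "Open bibliography",
    "dcterms:issued": "2012-02-21",
}


def expand_key(key, context):
    """Expand a 'prefix:name' key to a full IRI using the context.

    Simplified: real JSON-LD expansion handles many more cases.
    Keys without a known prefix are returned unchanged.
    """
    prefix, sep, name = key.partition(":")
    if sep and prefix in context:
        return context[prefix] + name
    return key


expanded = {
    expand_key(key, record["@context"]): value
    for key, value in record.items()
    if key != "@context"
}
print(json.dumps(expanded, indent=2))
```

In this sketch the plain "title" key is left untouched, while "dcterms:issued" becomes a full IRI that another person or machine can interpret without knowing our conventions.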

What is the point?

Using namespaces and having a schema only become sensible when there is a concerted effort to share data with others. For internal use, they could be valuable for consistency, but the code we write internally adheres by definition to our own level of consistency anyway.

Therefore, it is not a function of BibJSON to perform validation – BibJSON is just JSON. Rather, it is the function of a community to make agreements and to conform to those agreements as required.

Where such a function must be supported, it should be done via mechanisms already available and maintained for that purpose – there is no point attempting to maintain our own; it is not our key strength or goal.

Recommendation

Change the BibJSON use of namespaces to conform to the method specified in JSON-LD; wherever consistency is required, the parties concerned should agree to share data via JSON and within a particular @context.

The fundamental basic keys in BibJSON – the default context – should remain as they are, and should not require contextualisation.

If contextualisation of the fundamental keys of BibJSON is required, then those keys should be contextualised into a schema by whoever has such a requirement.

Ramifications

  • drop the “namespace” key in BibJSON
  • continue using BibJSON as normal, but:
  • reference JSON-LD for use of @context and other more complex LD functions as required
  • wherever validation is required, perform it based on the use of namespaced keys (beyond the scope of BibJSON)

JISC OpenBibliography: Projected Timeline, Workplan & Overall Project Methodology
http://openbiblio.net/2010/08/31/jisc-openbibliography-projected-timeline-workplan-overall-project-methodology/
Tue, 31 Aug 2010 15:40:40 +0000

The JISC OpenBiblio project is scheduled to run from 1st July 2010 to 31st March 2011. During that time, the project will run two-week iterative development cycles, each including the following (for links to trac, code repository, wiki etc. see the project resources page):

  • Weekly meetings
  • Technical lead reports on development since last meeting; incomplete functionality is moved into the next development cycle or abandoned (if at the end of a cycle)
  • Advocacy lead reports on development since last meeting; team should be updated about recent advocacy successes and about events soon to take place.
  • Team discuss and decide technical developments for next development cycle
  • A project blog post should be written each time a success occurs, and referenced to a deliverable listed under the work packages in the JISC project bid.
  • A project blog post should be written describing obstacles causing delay to any functionality aims or advocacy successes.
  • Team members should identify topics raised via the mailing lists that need further consideration. This is to manage how mailing list discussions become documented parts of the project; anything from the mailing list that becomes significant to the project should be documented in a blog post / comment / trac task as appropriate.
  • Team members keep notes in the meeting minutes document.
  • Technical lead (or others) updates trac as necessary to keep note of technical aims, successes and failures. The trac should be viewed as a resource for the technical lead to report to the team, and as a source of information for writing up the project report, but not as a project management tool
  • OKF project wiki and JISC expo spreadsheet should be updated as required to reflect any changes in location of key documents

Rather than aiming to develop a specific product, the aim is to develop as much useful output as possible before the project deadline, ideally meeting or exceeding the deliverables defined under the work packages in the JISC project bid.

This approach suits the nature of the project – a significant amount of advocacy is required to convince data publishers of the benefits of open access to bibliographic information, and this work in itself should not be overlooked. Achievements in attaining open agreements will, in turn, lead to further development opportunities. Overall success can be measured against three strands:

  1. Publicity / advocacy successes – e.g. a good response at a conference to a discussion of the project goals.
  2. Agreements to provide open data – when data providers actually commit to allowing access to their datasets; this is a specific achievement over and above those in point 1.
  3. Technical developments – with access to open data sets, develop examples of how they can be put to valuable use for the community; this should feed back into point 1, leading to more of point 2, and so on.

JISC OpenBibliography: Project Team Relationships and End User Engagement.
http://openbiblio.net/2010/08/31/jisc-openbibliography-project-team-relationships-and-end-user-engagement/
Tue, 31 Aug 2010 15:18:45 +0000

The JISC OpenBibliography project team:

  • Peter Murray-Rust of the University of Cambridge Unilever Centre, Department of Chemistry, is a reader in Molecular Informatics with nearly 200 publications. Peter will focus on the project direction and the co-ordination of the project partners, and will contribute major software enhancements to his JUMBO library for converting CIF to RDF.
  • Dr. Rufus Pollock will contribute to most areas of work, focusing on the project direction, management and dissemination. He will also help develop the metadata design, store architecture and disambiguation work. He is a co-founder and board member of the Open Knowledge Foundation with extensive experience of the legal, social and technical aspects of open information, and of bibliographic data in particular. He has also worked extensively with bibliographic metadata, including the full Cambridge University Library catalogue, and on developing databases, processing bibliographic formats (including MARC), and matching entities from different datasets.
  • Ben O’Steen has 13 years’ IT development experience and has most recently worked at the Oxford University Library Service as the software architect for the Bodleian Library’s DAMS (Digital Asset Management System). He has extensive experience working with RDF, bibliographic and related metadata standards, and distributed system design. He was part of the winning team of Repository Challenge 08 with an entry that provided an RDF Linked Data view on two of the leading open source repository systems. O’Steen will contribute to most areas, including metadata design and realisation, triple storage and SPARQL endpoints, and user-facing interfaces and query systems.

JISC OpenBibliography: IPR statement
http://openbiblio.net/2010/07/15/jisc-openbibliography-ipr-statement/
Thu, 15 Jul 2010 10:24:38 +0000

All sourced data will fall under a license compatible with the criteria laid out at http://www.opendefinition.org/, which will ensure that the data created and hosted by this project is fully reusable, both by the community that JISC seeks to support and by the wider community.

Project documentation will be published under a CC-BY attribution license; project data created by the team will be published under the PDDL; and the source code created for the project will be published under the BSD license.

Organisational terms:

The OKF uses Open Definition compliant licenses for its content and data: for example, CC-BY for content, and the Open Data Commons Public Domain Dedication and License (PDDL) or the Open Data Commons Data(base) Attribution License for data.

The University of Cambridge asserts its rights to IP created by employees in the course of their employment.

All software is distributed under the Artistic Licence (BSD style).

The IUCr uses CC-BY for its Open Access material and will use the services of the OKF to advise on the best ways of Opening data and services.

JISC OpenBibliography: Risk Analysis and Success Plan
http://openbiblio.net/2010/07/15/jisc-openbibliography-risk-analysis-and-success-plan/
Thu, 15 Jul 2010 10:10:10 +0000

Key Risk:

Collections are unavailable or intractable:

This was quoted as one of the key risks in the project plan. However, from initial conversations with publishers and other sources, the likelihood of the project having too little data to work with is rapidly diminishing.

Success Plan:

Success: The initial search, query and other compute-intensive services become over-subscribed from real demand.

Managed by: The service is hosted on Amazon EC2 and is designed to be scalable. If there is money left in the budget, the service could be transferred to a more heavy-duty VM. Otherwise, part of the design is that anyone can set up and run the service, as all the tools and data are open, so we could recommend to heavy users that they run a mirror instance locally.

Success: Bibliographic metadata from this project begins to be used in production library management systems.

Managed by: Whilst we cannot affect the cataloguing processes by which the records are entered into a given institution’s system, we maintain URLs and provenance for all the records we provide. This enables systems which reuse the data to track and show the provenance of a given record, provided they maintain a link to the source. We would also recommend that institutions or organisations that reuse the data state openly that they do so, thereby increasing the profile of the project and of JISC, its funder.

Risk Assessment:

Risk | Probability (1-5) | Severity (1-5) | Score (P x S) | Action to Prevent/Manage Risk

Staffing
Staff retention | 3 | 5 | 15 | Ensure staff are satisfied and challenged and have the chance to give feedback by means of regular one-to-ones. Apply open management to ensure sharing of expertise, thus enabling cover.
Key academic staff leave | 1 | 2 | 2 | There is sufficient in-depth coverage from expertise available in the university; recruit replacement.

Technical
Technical problems | 1 | 5 | 5 | Similar problems already solved; well-known experts on team.
Difficulty in integrating tools in services and workflows | 1 | 4 | 4 | Use iterative development so as to deliver at least a partial solution as opposed to nothing at all.
OKF service not supplied | 2 | 4 | 8 | Move to other available platforms such as Talis connect commons, 4store, Sesame.
Hardware failure resulting in loss of data | 2 | 4 | 8 | Use standard approaches to data and service backup, including automated backup and off-site replication.

External suppliers
Collections are unavailable or intractable | 2 | 5 | 10 | For catalogues use other Open offerings (several are available, many are members of the OKF’s working group on bibliographic information).
Open Citations is not funded | 1 | 1 | 1 | Work with other citation experts.

Legal
Data protection infringement | 1 | 5 | 5 | Close consultation with University legal services such as UMIP; establish clear project staff guidelines w.r.t. commercial partners.
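
The score in the risk assessment is simply probability multiplied by severity. As a quick check of that arithmetic, the sketch below recomputes the scores from the probability and severity values given above and ranks the risks from highest to lowest; the variable and function names are illustrative only.

```python
# (risk, probability 1-5, severity 1-5), copied from the risk assessment.
risks = [
    ("Staff retention", 3, 5),
    ("Key academic staff leave", 1, 2),
    ("Technical problems", 1, 5),
    ("Difficulty in integrating tools in services and workflows", 1, 4),
    ("OKF service not supplied", 2, 4),
    ("Hardware failure resulting in loss of data", 2, 4),
    ("Collections are unavailable or intractable", 2, 5),
    ("Open Citations is not funded", 1, 1),
    ("Data protection infringement", 1, 5),
]


def score(probability, severity):
    """Risk score as defined in the assessment: probability times severity."""
    return probability * severity


# Highest-scoring risks first.
ranked = sorted(risks, key=lambda r: score(r[1], r[2]), reverse=True)
for name, probability, severity in ranked:
    print(f"{score(probability, severity):>2}  {name}")
```

Ranked this way, staff retention (15) and unavailable or intractable collections (10) come out as the two highest-scoring risks.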

JISC OpenBibliography: Wider Benefits to Sector & Achievements for Host Institution
http://openbiblio.net/2010/07/15/jisc-openbibliography-wider-benefits-to-sector-achievements-for-host-institution/
Thu, 15 Jul 2010 09:15:48 +0000

  • Bibliographic data is useful: a number of organisations such as CERN and the Library of Congress have recognised that providing open access to bibliographic records and controlled vocabularies is a natural and necessary step to begin to identify errors and to avoid erroneous or divergent duplication, thereby improving metadata accuracy. A key point from Karen Coyle is “The change that libraries will need to make in response [to user demand] must include the transformation of the library’s public catalog from a stand-alone database of bibliographic records to a highly hyperlinked data set that can interact with information resources on the World Wide Web.”
  • Bibliographic data is, in general, not open or linked: this limits its usefulness to the academic community. This project will deliver bibliographic material that is truly open (as in http://opendefinition.org where the team has particular expertise). Many attempts to create LOD suffer because there are no useful resources to link to. OpenBibliography will expose Author names, Institutions and Geographical Locations with semantic targets in the LOD ecosystem (e.g. Geonames, Wikipedia); the project will put significant effort into disambiguation so that OpenBibliography can become an important node in the LOD graph.
  • Processes to make it open or linked are not familiar to libraries and publishers: Much modern bibliographic data is created implicitly or explicitly by the scholarly publication process but exposed poorly or not at all. Working with cooperating publishers, we can rapidly transform their output into complete, open, semantic bibliography. By providing a clear working model for bibliographic metadata as semantic, referenceable links, with a reusable workflow to gather, add provenance to, refine and disambiguate existing metadata, we enable members of the JISC community to apply the same model and techniques, using the open source code and services we will provide, both to use data from and to contribute to the aforementioned ‘highly hyperlinked data set’.

JISC OpenBibliography: Aims, Objectives and Final Outputs
http://openbiblio.net/2010/07/15/jisc-openbiblography-aims-objectives-and-final-outputs/
Thu, 15 Jul 2010 09:07:01 +0000

This project will publish a substantial corpus of bibliographic metadata as Linked Open Data, using existing semantic web tools, standards (RDF, SPARQL), linked data patterns and accepted Open ontologies (FoaF, Bibo, DC, etc).

The data will be from two distinct sources: traditional library catalogues (Cambridge University Library and the British Library) and ToCs from a scientific publisher, the IUCr. None of the material is currently available as LOD; furthermore, the outputs can be guaranteed to be open (unlike many existing data efforts, linked or otherwise).

Key strategies are

  • transformation of current publishers’ model to create Open Bibliography as part of their future business, and
  • the immediate and continuing engagement of the scholarly community.

Deliverables include a maintained and growing bibliography on the IUCr site and engagements with other like-minded publishers such as PLoS, as well as the code for the software used to create the Linked Data versions of the aforementioned sources.