Minutes: 28th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Date: February, 5th 2013, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Karen Coyle
  • Tom Johnson
  • Tom Morris
  • On the Etherpad:
    • Peter Murray-Rust
    • Mark MacGillivray

Agenda

  • As there were two new participants in the meeting (who had already engaged in discussions on the mailing list, though), everybody introduced themselves. The “new” participants were:
    • Tom Morris: “Tom Morris is the top external data contributor to Freebase and has contributed more than 1.6 million facts. He’s been a member of the Freebase community for several years. When not hacking on Freebase, Tom is an independent software engineering and product management consultant.” (taken from here, shortened and updated)
    • Tom Johnson: “Thomas Johnson is Digital Applications Librarian at Oregon State University Libraries, where he works on digital curation, scholarly publication, and related metadata and software issues.”

Bibframe and data licensing

  • Adrian started a discussion on the bibframe list, see here.
  • Karen: It isn’t clear to me how BIBFRAME will be documented, and whether that documentation will be sufficient to process data. Note that RDA (the cataloging rules) is not freely available, therefore if BIBFRAME does develop for RDA there may be conflicts relating to text such as term definitions.
    • This addresses licensing of the Bibframe spec, not the bibliographic data, but may be a problem in the future if Bibframe re-uses content from the RDA spec.
  • Tom Morris: Licensing policy seems to be orthogonal to modelling process
  • Conclusion: We’ll wait as a working group and not push the LoC further towards open data.
  • Tom Morris: We should think about lobbying for making the process more open.
  • Tom Morris: German National Library and other early experimenters with Bibframe should put their code up on GitHub to bring the development forward

Bibliographic Extension for schema.org (schemabibex)

  • See minutes of last meeting for background information.
  • The work is moving forward to create more schema.org properties for bibliographic data — but so far not including journal articles
  • Library view point predominates at schemabibex group, scientists’ view point isn’t represented
  • Karen: Somebody from the scientific community should join schemabibex or start a separate effort. <– Maybe people from scholarlyhtml?

NISO Bibliographic Meeting

  • http://www.niso.org/topics/tl/BibliographicRoadmap/
  • NISO has a grant to hold a meeting of "interested parties" relating to bibliographic data.
  • Goes back to an effort by Karen Coyle and another person to include producers of bibliographic data other than libraries (publishers, scientists etc.) in the development of future standards for bibliographic data (like Bibframe).
  • See also the thread on the openbiblio list. tfmorris: As much of the information as possible should be published online.
  • Meeting will be held in March or April in Washington D.C.
  • Interested parties can participate in the initial meeting but there's no/little funding. (See this email for the proposed dates of the meeting.)
  • "We are planning to have a live-stream of the event, presuming there is sufficient bandwidth at the meeting site."

BiblioHackfests

  • Peter Murray-Rust wrote before the meeting: "I'd like to run a hackfest (in AU) later this month and make Bib an important aspect. Can we pull together a "hacking kit" for such an event (e.g. examples of BibJSON, some converters, a simple BibSoup, etc.)"
    • Mark MacGillivray responded: "yes: I will write a blog post that explains bibsoup a bit more, and we could use a google spreadsheet for simple collection of records."

BibJSON

  • Tom Morris had two questions regarding BibJSON, and Mark provided some answers on the Etherpad.
  • Q: What is being done to promote adoption?
    • Mark says: "I and others continue to use bibjson and promote it on our projects. it is now being used by the open citations project and there will be updates to bibjson.org soon with further recommendations – mostly around how to specify provenance in a bibjson record. Also we have agreed with crossref for them to output bibjson – it needs some fixes to be correct, but is just about there."
  • Q: What tool support is available? (Mendeley, Zotero, converters, etc)
    • Mark says: "The translators are currently unavailable – they will soon be put up at a separate url for translating files to bibjson which can then be used in bibsoup. Mendeley, Zotero etc can all output bib collections in formats that we can already convert, so there is support in that sense. Separating out the translators will also make it easier for people to implement their own."
  • Tom Morris: There's PR value in having BibJSON listed at https://github.com/zotero/translators
  • Ways of promoting BibJSON:
    • Articles: Tom Johnson published an article on BibJSON application in code4lib journal: http://journal.code4lib.org/articles/7949
    • Talks: e.g. at code4lib (Tom Johnson will be there and might give a lightning talk mentioning BibJSON)
    • Adoption: CrossRef would be a great addition. Need more services like Mendeley, Zotero, Open Library, BibSonomy etc. to support BibJSON (input/output)
  • Tom Johnson asks: What is the motivation to provide BibJSON output?

Open Library

  • Speaking about BibJSON adoption, we came to talk about what will happen to the Open Library. Karen gave a short summary of the future plans for Open Library:
    • Open Library currently has no assigned staff resources. Open Library is being integrated into the whole Internet Archive system and may cease using the current infogami platform. It isn't clear if the same UI will be available, nor if there will be any further development in terms of features such as APIs.
    • No batches of records (LC books records or Amazon records) have been loaded since mid-2012.
    • Tom Morris is primarily interested in the data and the process to reconcile it etc. but he also emphasizes the value of the brand and the community.
    • Karen: infogami is interesting as a flexible development platform that sits on a triple store: http://infogami.org/
    • Tom Johnson: What can we do regarding Open Library?
      • Karen: Set up a mirror?
      • Make records for free ebooks available as MARC so that libraries can integrate these into their catalogue. <– Tom Morris would help with that.

Public Domain Books/authors


A revamp of bibserver and bibsoup

Since our work last year on the JISC Open Bibliography 2 project, I have been thinking about the approach we took to building a tool that people might use; some of that approach, I think, was wrong. So, I have recently been working on some changes and pushed a new version of bibserver to the repository, in a branch called bibwiki.

Also today, the service running at http://bibsoup.net has been rebooted to run the new branch. One of the downsides of this is that user accounts and data that existed on the old system are no longer available. There were some issues with the old system anyway (it was giving errors for a few of the more recent attempts to upload large datasets), so I decided to wipe the slate clean and start again from scratch. However, if you had any particular collections in there that you need to have recovered, please get in touch via the openbiblio-dev mailing list and I will recover them for you.

Now, on to the details of what has changed, and why. Let’s start with the why.

Why change it?

One of the original requirements of bibserver was that it would present a personally curated collection of bibliographic records; this extended not only to the curation of the collection, but to the curation of records within that collection. Unfortunately, this made every collection an island – a private island, with guards round the edges; not so good for building open knowledge or community. Also, we put too much emphasis on legacy data and formats; whilst there is of course value in old standards like bibtex, and in historical records, giving up the flexibility of the present for the sake of the past is the opposite of progress. Instead we should take the best bits of what we had and improve on them, then get our historical content into newer, more useful forms.

Because of these issues, it seems sensible therefore to try a more connected, more open, more modern approach. So, what I have done is to remove the concept of “ownership” of a record and to remove the ties to legacy data formats or sources. Instead what we now have is a tool into which we can dump bibJSON data, and via which we can build personally curated collections of shared bibliographic records.

So what has changed?

You can only upload bibJSON

Whilst the conversion tools we wrote to process data from formats such as bibtex or RIS into bibJSON are useful and will be utilised elsewhere, they are not part of the core functionality of bibserver. They are a way to get from the past into the present, and once you are here, you should forget about the past and get on with the future. So your upload is one-off, and cares not from whence it came.

You can edit records, but so can anybody else

Does what it says on the tin. For now, editing is only via clunky edit of the JSON itself, but this can have a nice UI added later.

You can tag any record with anything, but so can anybody else

Anyone can tag a record with a useful term; anyone can remove a tag.

You can still build your own collection

You can still create your own collection and curate it as you see fit, and other people will not be able to change what records are in that collection; but the records themselves are still editable by anyone. Seems scary? Well, yes. But get used to it. It works for wikipedia. (Which is why I called the new branch bibwiki.)
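The split described here (a collection belongs to one user, while the records in it belong to everyone) can be sketched roughly as follows. This is a hypothetical illustration only, not bibserver's actual data model; the class names, field names and the `edit_record` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A shared bibliographic record: anyone may edit or tag it."""
    id: str
    data: dict
    tags: set = field(default_factory=set)


@dataclass
class Collection:
    """A personally curated list of record ids, changeable only by its owner."""
    owner: str
    name: str
    record_ids: list = field(default_factory=list)

    def add(self, user: str, record: Record) -> None:
        # Only the owner decides which records belong to the collection.
        if user != self.owner:
            raise PermissionError(f"{user} does not own collection {self.name!r}")
        if record.id not in self.record_ids:
            self.record_ids.append(record.id)


def edit_record(record: Record, changes: dict) -> None:
    # Deliberately no ownership check: records are editable by anybody, wiki-style.
    record.data.update(changes)
```

Membership of a collection is guarded, but the record content is not, which is the trade-off the paragraph above describes.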

You can’t visualise facets anymore

You used to be able to make a little bubble picture out of the facet filters down the left hand side. Now you can’t. It was a bit incongruously located, so this functionality is being hived off into a more specifically useful form.

You can search for any record and add it to your collection

Anything that is on the bibserver instance can be found by anyone using the search box, then you can add it to one of your collections. However, searching for everything has limited functionality and does not offer filters. This is because one of the constraints of scaling up to large datasets is that filtering is expensive; so now, you have simple search across everything, then nice complex filtered search on the things you care about. Best of both worlds with minimal compromise.

Simplistic record deduping

Where a record appears to have the same title-and-authors string on import as another record already in the database, it will try to squish them together. The important point here though is that the functionality now exists to deduplicate things via various methods, and there is no longer a constraint to maintain unique copies of things, so we can get on and build those methods.
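As a rough idea of what matching on a title-and-authors string could look like, here is a minimal sketch. It is not the code in the bibwiki branch: the normalisation rules, the function names and the merge policy (keep existing values, only fill in missing fields) are assumptions chosen for the example.

```python
import re
import unicodedata


def dedup_key(record: dict) -> str:
    """Build a normalised title-and-authors string for a BibJSON-like record."""
    names = [a.get("name", "") for a in record.get("author", [])]
    raw = " ".join([record.get("title", "")] + sorted(names))
    # Strip accents, lowercase, and collapse every non-alphanumeric run to one space.
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_text.lower()).strip()


def merge_on_import(existing: dict, incoming: list) -> dict:
    """Squash incoming records into `existing`, a dict keyed by dedup_key."""
    for record in incoming:
        key = dedup_key(record)
        if key in existing:
            # Assumed policy: keep stored values, only fill in fields that are missing.
            for field_name, value in record.items():
                existing[key].setdefault(field_name, value)
        else:
            existing[key] = dict(record)
    return existing
```

An exact-string key like this will miss many real duplicates (abbreviated names, subtitles), which is why the crowd-sourced and query-based methods mentioned later remain useful.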

Exciting. So, what next?

Rework the parsers into a stand-alone service

The parsers from bibtex, RIS, etc should be built out as a simple service that we can run where you hit the webpage, give it your file (or file URL), and it pings you when it has done the conversion with a link for you to get your bibJSON from. This should work with parser plugins sort-of functionality, so that we can run it with the parsers we have, and other people can run it with their own if they wish. Then we can boot up a translation service at http://translate.bibsoup.net.

This is the most important next step, as without it not many people will be able to upload records.
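The parser-plugin idea could be sketched as a small registry that maps a format name to a parsing function. Everything here is hypothetical (the `PARSERS` registry, the `register` decorator, the `translate` function), and the RIS reader handles only a handful of tags; it illustrates the plugin shape and is not the existing bibserver parsers.

```python
import re
from typing import Callable, Dict, List

# Registry of parser plugins: format name -> function returning BibJSON-like dicts.
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}

# An RIS line is a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def register(fmt: str):
    """Decorator that registers a parser plugin under a format name."""
    def wrap(func):
        PARSERS[fmt] = func
        return func
    return wrap


@register("ris")
def parse_ris(text: str) -> List[dict]:
    """Tiny RIS reader: understands only the TY, TI/T1, AU, PY and ER tags."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue
        tag, value = match.groups()
        if tag == "TY":
            # RIS type code kept as-is; a real converter would map it to a BibJSON type.
            current = {"type": value.lower()}
        elif tag in ("TI", "T1"):
            current["title"] = value
        elif tag == "AU":
            current.setdefault("author", []).append({"name": value})
        elif tag == "PY":
            current["year"] = value[:4]
        elif tag == "ER":
            records.append(current)
            current = {}
    return records


def translate(fmt: str, text: str) -> List[dict]:
    """Dispatch to whichever plugin is registered for the given format."""
    if fmt not in PARSERS:
        raise ValueError(f"no parser registered for format {fmt!r}")
    return PARSERS[fmt](text)
```

Someone running their own instance would add a format by decorating another function with `@register("...")`, without touching the dispatch code.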

Upload some bibliographic metadata

There are numerous sources of biblio metadata we have collected over the years, and some of these will be uploaded into bibsoup for people to use.

Also, there is potential to run specific instances of bibsoup for people who need them – although, overall, it is probably more sensible to keep them all together and distinguish via collections.

Bugfix

This is basically a beta 2 implementation. Please go and use the new system at http://bibsoup.net, and get back to the mailing list with the usual issues.

Build up some deduplication maybe with pybossa

Now that we can edit records and find similar ones, we can also do interesting things like enable users to tag records that are about the same thing. We can also run queries to find similar records and expose that data perhaps through a tool like pybossa, to get crowd-sourced deduplication on the go.

Rewrite the tests

All the tests that were in the original branch have yet to be copied over. A lot of them will become redundant. So if you like tests (and we should have them), then get involved with porting them over / writing new ones.

Update the docs

The documentation needs to be updated; a lot of it still refers to the old branch, although a fair bit of it is still relevant.

Decide how to manage the code and bibsoup in the future

What I have done here are some fairly large changes to our original aims; it is possible that not everybody will like this. However, the great thing about code repositories is that we have versioning, so anyone can use any version of the software. My changes are still in a branch, so we can either merge these into the main, or fork them off to a separate project if necessary. Unless reasons against merging into main are given, that will be the course taken once the parsers have been hived off.


Minutes: 27th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Date: January, 8th 2013, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Peter Murray-Rust
  • Richard Wallis

Agenda

Schemabib Extension group update

  • Links:
  • W3C community and business group, started by Richard Wallis (OCLC) in September 2012
  • Conference meeting once a month
  • Idea: Get consensus across the bibliographic community about how to extend schema.org.
  • Lightweight approach, should not compete with MARC
  • Most people interested in bibliodata come from the library community. Richard tried to extend the group to other people (publishers, scholars etc.).
  • Background: OCLC is publishing Linked Data in worldcat.org using the schema.org vocabulary, and found that schema.org was missing properties
  • In the end: Publish extension proposal to the public-vocabs list
  • Peter comments on schema.org: schema.org is going to work because it's built by people who know how the web works
  • Currently discussion about the concept of work and instances; FRBR comes up but such a model wouldn’t make it into schema.org
  • Richard: It makes sense to publish schema.org alongside BibFrame or RDA.
  • Peter: Talking to Mark MacGillivray might make sense to find out how schema.org bibdata can relate to BibJSON and the accompanying tools.

Bibframe draft data model

GOKb (Global Open Knowledgebase)

Adrian heard about this project but could find only a little information about it on the web:

“Kuali OLE, one of the largest academic library software collaborations in the United States, and JISC, the UK’s expert on digital technologies for education and research, announce a collaboration that will make data about e-resources—such as publication and licensing information—more easily available.

Together, Kuali OLE and JISC will develop an international open data repository that will give academic libraries a broader view of subscribed resources.
The effort, known as the Global Open Knowledgebase (GOKb) project, is funded in part by a $499,000 grant from The Andrew W. Mellon Foundation. North Carolina State University will serve as lead institution for the project.

GOKb will be an open, community-based, international data repository that will provide libraries with publication information about electronic resources. This information will support libraries in providing efficient and effective services to their users and ensure that critical electronic collections are available to their students and researchers.” from http://gokb.org/post/25021222983/gobkpressrelease

“GOKb is … focused on global-level metadata about e-resources with the goal of supporting management of those e-resources across the resource lifecycle. GOKb does not aspire to replace current vendor-provided KB products. But it does aspire to make good data available to everybody, including existing KBs, and to provide an open and low-barrier way for libraries to access this data. Our goal is that GOKb data permeates the KB ecosystem so that all library systems, whether ILS, ERM, KB or discovery, will have better quality data about electronic collections than they do today.” From http://kualiole.tumblr.com/post/32942331929/bib-data-is-now-more-open-what-about-knowledge-base

  • The participants didn’t know much more about this initiative. Adrian will try to find out more for upcoming meetings.

Other

  • Peter briefly informed about some interesting developments:
    • Open citations: http://opencitations.wordpress.com/ (David Shotton, Oxford, UK)
    • Hargreaves report: UK government says it’s legal to mine content. See Peter’s post at http://blogs.ch.cam.ac.uk/pmr/2012/12/21/opencontentmining-massive-step-forward-come-and-join-us-in-the-uk/
    • Pubcrawler
    • Crossref biblio/citation data


Minutes: 26th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Date: November, 6th 2012, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Karen Coyle
  • Joris Pekel
  • Jim Pitman

Agenda

ORCID launched

“ORCID makes its code available under an open source license, and will post an annual public data file under a CC0 waiver for free download.” (Source: http://about.orcid.org/about/what-is-orcid.)

Open Data

  • ORCID provides annual CC0 dump.

Open API

  • To try the open API point your queries to pub.orcid.org ! (Documentation says something else)
  • Query biographies example:
    • curl -H 'Accept: application/orcid+xml' 'http://pub.orcid.org/search/orcid-bio?q=pohl'
    • Retrieve bio example: curl -H 'Accept: application/orcid+json' 'http://pub.orcid.org/0000-0001-9083-7442/orcid-bio'
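The same two calls can be prepared from Python with the standard library. This sketch only builds the requests and does not send them; the host, paths and Accept headers are the ones given in the examples above and may well have changed since, and the function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://pub.orcid.org"  # public API host named in the minutes


def bio_search_request(query: str) -> Request:
    """Build (but do not send) a search over ORCID biographies, asking for XML."""
    url = f"{BASE_URL}/search/orcid-bio?{urlencode({'q': query})}"
    return Request(url, headers={"Accept": "application/orcid+xml"})


def bio_request(orcid_id: str) -> Request:
    """Build (but do not send) a request for a single ORCID bio, asking for JSON."""
    return Request(f"{BASE_URL}/{orcid_id}/orcid-bio",
                   headers={"Accept": "application/orcid+json"})
```

Passing either object to `urllib.request.urlopen` would perform the actual HTTP request.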

Open source

Linked Open Data

(Much information was taken from this twitter conversation.)

  • Karen: How can this be integrated with BibServer?
  • Jim: Could OKF pick up and post periodic dumps of ORCID data? And support a BibServer over those dumps?

HathiTrust Lawsuit

See Karen’s blog post on the topic: http://kcoyle.blogspot.de/2012/10/copyright-victories-part-ii.html.

  • Judge supports digitization for indexing as a fair use.
  • No decision on orphan works
  • Support for “just in case” digitization to serve sight impaired users
  • Support for digitization for preservation

OKFN labs for cultural activities

  • Background: Restructuring of OKF
  • Projects and tools are now pulled into OKFN labs, which will mainly focus on government and financial data: http://okfnlabs.org/
  • Rather than “orphan” the other projects, there is now another lab in development for those, including Bibserver.
  • Example projects/code and blog posts that would find their place at this “open culture lab”:
  • Joris, Sam and Etienne Posthumus working on this. Please propose projects to Joris and Sam and they can help.
  • Suggest: organize “code days” for bibliographic data

W3C working group on biblio extension to schema.org

Journal Article Tag Suite (JATS) Standard

MISC

  • May merge some developer lists into one, which are now scattered. openbiblio-dev could be included in this.
  • We talked for a short time about ResourceSync effort to provide standard for syncing web resources: http://www.niso.org/workrooms/resourcesync/

To Dos

  • Adrian will try to find time for a separate post on ORCID


Metadata for over 20 Million Cultural Objects released into the Public Domain by Europeana

Europeana today announced that its dataset comprising descriptions of more than 20 Million cultural objects is from now on openly licensed with Creative Commons’ public domain waiver CC0. From the announcement:


Opportunities for apps developers, designers and other digital innovators will be boosted today as the digital portal Europeana opens up its dataset of over 20 million cultural objects for free re-use.

The massive dataset is the descriptive information about Europe’s digitised treasures. For the first time, the metadata is released under the Creative Commons CC0 Public Domain Dedication, meaning that anyone can use the data for any purpose – creative, educational, commercial – with no restrictions. This release, which is by far the largest one-time dedication of cultural data to the public domain using CC0 offers a new boost to the digital economy, providing electronic entrepreneurs with opportunities to create innovative apps and games for tablets and smartphones and to create new web services and portals.

Europeana’s move to CC0 is a step change in open data access. Releasing data from across the memory organisations of every EU country sets an important new international precedent, a decisive move away from the world of closed and controlled data.

Thanks to all the people who made this possible! See also Jonathan Gray’s post at the Guardian’s Datablog.

Update 30 September 2012: Actually, it is not true to call this release “by far the largest one-time dedication of cultural data to the public domain using CC0”. In December 2011 two German library networks released their catalog b3kat under CC0 which by then held 22 million descriptions of bibliographic resources. See this post for more information.


Minutes: 25th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Date: September, 4th 2012, 15:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Peter Murray-Rust
  • Naomi Lillie

NB: Karen Coyle sent apologies due to attendance at the Dublin Core conference

Agenda

As there was just PeterMR and me attending this call, we abandoned any formal agenda and had a very pleasant chat discussing PeterMR’s engagements and the upcoming OKFestival.

PeterMR has been presenting various Bibliographic tools (including BibSoup) at a number of events lately, including VIVO12, and will do so at the upcoming Digital Science 2012 in Oxford. We discussed support for the existing tools we have in the Open Knowledge Foundation, in terms of person-resource and funding, and the importance of BibServer as an underlying tool for much of the work to be done in and around Open Bibliography and Access.

OKFest is less than 2 weeks away now and there is so much potential here for collaboration and idea generation… We agreed we are very excited and looking forward to meeting the pillars of Open society as well as those brand-new to this world which will only grow in influence and importance. Now is the time to embrace Open!

There were no particular actions, but it was helpful to consider how we can make a difference in the world of bibliography, for OKFN and GLAM institutions in general (i.e. galleries, libraries, archives and museums).

To join the Open Bibliography community sign up here – you may also be interested in the Open Access Working Group which is closely aligned in its outlook and aims.


Final report: JISC Open Bibliography 2

Following on from the success of the first JISC Open Bibliography project we have now completed a further year of development and advocacy as part of the JISC Discovery programme.

Our stated aims at the beginning of the second year of development were to show our community (namely all those interested in furthering the cause of Open via bibliographic data, including: coders; academics; those with interest in supporting Galleries, Libraries, Archives and Museums; etc) what we are missing if we do not commit to Open Bibliography, and to show that Open Bibliography is a fundamental requirement of a community committed to discovery and dissemination of ideas. We intended to do this by demonstrating the value of carefully managed metadata collections of particular interest to individuals and small groups, thus realising the potential of the open access to large collections of metadata we now enjoy.

We have been successful overall in achieving our aims, and we present here a summary of our output to date (it may be useful to refer to this guide to terms).

Outputs

BibServer and FacetView

The BibServer open source software package enables individuals and small groups to present their bibliographic collections easily online. BibServer utilises elasticsearch in the background to index supplied records, and these are presented via the frontend using the FacetView javascript library. This use of javascript at the front end allows easy embedding of result displays on any web page.

BibSoup and more demonstrations

Our own version of BibServer is up and running at http://bibsoup.net, where we have seen over 100 users sharing more than 14000 records across over 60 collections. Some particularly interesting example collections include:

Additionally, we have created some niche instances of BibServer for solving specific problems – for example, check out http://malaria.bibsoup.net; here we have used BibServer to analyse and display collections specific to malaria researchers, as a demonstration of the extent of open access materials in the field. Further analysis allowed us to show where best to look for relevant materials that could be expected to be openly available, and to begin work on the concept of an Open Access Index for research.

Another example is the German National Bibliography, as provided by the German National Library, which is in progress (as explained by Adrian Pohl and Etienne Posthumus here). We have built, and are building, similar collections for all other national bibliographies that we receive.

BibJSON

At http://bibjson.org we have produced a simple convention for presenting bibliographic records in JSON. This has seen good uptake so far, with additional use in the JISC TEXTUS project and in Total Impact, amongst others.
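To give a feel for the convention, here is a small sketch that builds one record and wraps it in a collection. The values are made up, and the key names (`author` as a list of objects with `name`, `identifier` as a list of `type`/`id` pairs, a collection with `metadata` and `records`) reflect a reading of the convention rather than an authoritative statement of it; bibjson.org remains the reference.

```python
import json

# A hypothetical record with invented values, shaped along BibJSON lines.
record = {
    "type": "article",
    "title": "An example article title",
    "author": [{"name": "Doe, Jane"}, {"name": "Roe, Richard"}],
    "year": "2012",
    "journal": {"name": "Journal of Examples"},
    "identifier": [{"type": "doi", "id": "10.1234/example"}],
    "link": [{"url": "http://example.org/article"}],
}

# A collection wraps a list of records together with some descriptive metadata.
collection = {
    "metadata": {"collection": "example", "records": 1},
    "records": [record],
}

print(json.dumps(collection, indent=2))
```

Because a record is plain JSON, any extra keys a project needs can sit alongside these without breaking consumers that ignore them.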

Pubcrawler

Pubcrawler collects bibliographic metadata, via parsers created for particular sites, and we have used it to create collections of articles. The full post provides more information.

datahub collections

We have continued to collect useful bibliographic collections throughout the year, and these along with all others discovered by the community can be found on the datahub in the bibliographic group.

Open Access / Bibliography advocacy videos and presentations

As part of a Sprint in January we recorded videos of the work we were doing and the roles we play in this project and wider biblio promotion; we also made a how-to for using BibServer, including feedback from a new user:

Setting up a Bibserver and Faceted Browsing (Mark MacGillivray) from Bibsoup Project on Vimeo.

Peter and Tom Murray-Rust’s video, made into a prezi, has proven useful in explaining the basics of the need for Open Bibliography and Open Access:

Community activities

The Open Biblio community have gathered for a number of different reasons over the duration of this project: the project team met in Cambridge and Edinburgh to plan work in Sprints; Edinburgh also played host to a couple of Meet-ups for the wider open community, as did London; and London hosted BiblioHack – a hackathon / workshop for established enthusiasts as well as new faces, both with and without technical know-how.

These events – particularly BiblioHack – attracted people from all over the UK and Europe, and we were pleased that the work we are doing is gaining attention from similar projects world-wide.

Further collaborations

Lessons

Over the course of this project we have learnt that open source development provides great flexibility and power to do what we need to do, and open access in general frees us from many difficult constraints. There is now a lot of useful information available online for how to do open source and open access. Whilst licensing remains an issue, it becomes clear that making everything publicly and freely available to the fullest extent possible is the simplest solution, causing no further complications down the line. See the open definition as well as our principles for more information.

We discovered during the BibJSON spec development that it must be clear whether a specification is centrally controlled, or more of a communal agreement on use. There are advantages and disadvantages to each method, however they are not compatible – although one may become the other. We took the communal agreement approach, as we found that in the early stages there was more value in exposing the spec to people as widely and openly as possible than in maintaining close control. Moving to a close control format requires specific and ongoing commitment.

Community building remains tricky and somewhat serendipitous. Just as word-of-mouth can enhance reputation, failure of certain communities can detrimentally impact other parts of the project. Again, the best solution is to ensure everything is as open as possible from the outset, thereby reducing the impact of any one particular failure.

Opportunities and Possibilities

Over the two years, the concept of open bibliography has gone from requiring justification to being an expectation; the value of making this metadata openly available to the public is now obvious, and getting such access is no longer so difficult; where access is not yet available, many groups are now moving toward making it available. And of course, there are now plenty of tools to make good use of available metadata.

Future opportunities now lie in the more general field of Open Scholarship, where a default of Open Bibliography can be leveraged to great effect. For example, recent Open Access mandates by many UK funding councils (eg Finch Report) could be backed up by investigative checks on the accessibility of research outputs, supporting provision of an open access corpus of scholarly material.

We intend now to continue work in this wider context, and we will soon publicise our more specific ideas; we would appreciate contact with other groups interested in working further in this area.

Further information

For the original project overview, see http://openbiblio.net/p/jiscopenbib2; also, a full chronological listing of all our project posts is available at http://openbiblio.net/tag/jiscopenbib2/. The work package descriptions are available at http://openbiblio.net/p/jiscopenbib2/work-packages/, and links to posts relevant to each work package over the course of the project follow:

  • WP1 Participation with Discovery programme
  • WP2 Collaborate with partners to develop social and technical interoperability
  • WP3 Open Bibliography advocacy
  • WP4 Community support
  • WP5 Data acquisition
  • WP6 Software development
  • WP7 Beta deployment
  • WP8 Disruptive innovation
  • WP9 Project management (NB all posts about the project are relevant to this WP)
  • WP10 Preparation for service delivery

All software developed during this project is available on open source licence. All the data that was released during this project fell under OKD compliant licenses such as PDDL or CC0, depending on that chosen by the publisher. The content of our site is licensed under a Creative Commons Attribution 3.0 License (all jurisdictions).

The project team would like to thank supporting staff at the Open Knowledge Foundation and Cambridge University Library, the OKF Open Bibliography working group and Open Access working group, Neil Wilson and the team at the British Library, and Andy McGregor and the rest of the team at JISC.

Posted in BibServer, JISC OpenBib | 3 Comments

Importing Spanish National Library to BibServer

The Spanish National Library (Biblioteca Nacional de España or BNE) has released their library catalogue as Linked Open Data on the Datahub.

Initially this entry contained only the SPARQL endpoints and not downloads of the full datasets. After some enquiries from Naomi Lillie the entry was updated with links to some more information and bulk downloads at: http://www.bne.es/es/Catalogos/DatosEnlazados/DescargaFicheros/

This library dataset is particularly interesting as it is not a ‘straightforward’ dump of bibliographic records. This is best explained by Karen Coyle in her blogpost.

For a BibServer import, the implication is that we have to distinguish the types of record read by the importing script and take the relevant action before building the BibJSON entry. Fortunately the data dump was already provided as N-Triples, so we did not have to pre-process the large data file (4.9GB) in the same manner as we did with the German National Library dataset.
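As a rough illustration of that record-type distinction, here is a minimal sketch (not the import script linked in this post) that streams an N-Triples file line by line, groups triples by subject and reports each subject's rdf:type values. It assumes that all triples for one subject are contiguous in the dump, uses a simplified regular expression rather than a full N-Triples parser, and the function names are our own inventions for this example.

```python
import re
from itertools import groupby

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified N-Triples pattern (an assumption: it does not cover every
# corner of the N-Triples grammar, only the common line shapes).
TRIPLE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)'                                      # subject
    r'\s+<([^>]*)>'                                             # predicate
    r'\s+(<[^>]*>|_:\S+|".*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'   # object
    r'\s*\.\s*$'
)


def strip_term(term):
    """Remove <...> from IRIs and quotes/language tags from literals."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    if term.startswith('"'):
        # Escape sequences inside the literal are left undecoded.
        return term[1:term.rindex('"')]
    return term  # blank node label, kept as-is


def read_triples(lines):
    """Yield (subject, predicate, object) tuples, skipping blanks and comments."""
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match:
            yield strip_term(match.group(1)), match.group(2), strip_term(match.group(3))


def read_records(lines):
    """Group contiguous triples by subject.

    Yields (subject, list of rdf:type values, {predicate: [values]}).
    Only one subject's triples are held in memory at a time.
    """
    for subject, triples in groupby(read_triples(lines), key=lambda t: t[0]):
        props = {}
        for _, predicate, obj in triples:
            props.setdefault(predicate, []).append(obj)
        yield subject, props.get(RDF_TYPE, []), props
```

Because an open file is itself an iterable of lines, `read_records(open(path, encoding="utf-8"))` would process a multi-gigabyte dump without loading it whole; the importer can then branch on the type list to decide which action to take for each record.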

The Python script to perform the reading of the datafile can be viewed at https://gist.github.com/3225004

A complicating matter from a data wrangler’s point of view is that the field names are based on IFLA Standards, which are numeric codes and not ‘guessable’ English terms like Dublin Core fields, for example. This is more correct from an international and data quality point of view, but does make the initial mapping more time-consuming.

So when mapping a data item like https://gist.github.com/3225004#file_sample.nt we need to dereference each field name and map it to the relevant BibJSON entry.
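To make the mapping step concrete, here is a minimal sketch of a lookup table from numeric property codes to BibJSON keys. It is not the code from the gist: the namespace and the codes P0001 to P0005 are placeholders invented for this example, and the real codes and their meanings have to be looked up in the IFLA element sets. The BibJSON key names ("title", "author", "identifier", "link") follow the common BibJSON conventions as we understand them.

```python
# Placeholder namespace and codes (assumptions for illustration only);
# the actual property URIs must be dereferenced and checked one by one.
NS = "http://example.org/elements/"

# Single-valued fields: property URI -> BibJSON key.
FIELD_MAP = {
    NS + "P0001": "title",
    NS + "P0002": "publisher",
    NS + "P0003": "year",
}
AUTHOR = NS + "P0004"  # hypothetical code for a creator name
ISBN = NS + "P0005"    # hypothetical code for an ISBN


def to_bibjson(subject, props):
    """Build a BibJSON-style dict from {predicate: [values]}.

    Returns the record plus a sorted list of predicates that have no
    mapping yet, so the table can be extended as codes are identified.
    """
    record = {"link": [{"url": subject}]}
    unmapped = []
    for predicate, values in props.items():
        if predicate in FIELD_MAP:
            # Keep the first value only; a fuller mapping might keep all.
            record[FIELD_MAP[predicate]] = values[0]
        elif predicate == AUTHOR:
            record["author"] = [{"name": value} for value in values]
        elif predicate == ISBN:
            record["identifier"] = [{"type": "isbn", "id": value} for value in values]
        else:
            unmapped.append(predicate)
    return record, sorted(unmapped)
```

Returning the unmapped predicates alongside the record is a deliberate choice here: with numeric codes it is easy to miss a field, and the leftover list shows which codes still need to be looked up.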

As we identify more Linked Open Data National Bibliographies, these experiments will be continued under the http://nb.bibsoup.net/ BibServer instance.

Posted in BibServer, Data, JISC OpenBib, OKFN Openbiblio | Leave a comment

Minutes: 24th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Date: August, 7th 2012, 15:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Jim Pitman
  • Karen Coyle
  • Naomi Lillie

Agenda

JISC Open Biblio 2 project coming to close

  • Blog-post write-up of project being finished this week, Mark MacGillivray reporting back to JISC in late September
  • Further funding being explored mainly in terms of related work

ISBNdb http://isbndb.com/

  • Similar to BibJSON
  • Uses other sources, has no explicit license / restrictions
  • API will give 500 returns a day
  • Jim’s example: http://isbndb.com/d/person/pitman_jim/books.html
    • author identity is not working very well – this example contains a book that isn’t Jim’s
  • There is no record without an ISBN – seems to be no information from pre-1970
  • Claims to have 7 million books but only 2 million authors – FAQs state that records are gleaned from different libraries so duplication is likely
  • Open Library is possibly a better source

Karen’s most recent blog: http://kcoyle.blogspot.co.uk/2012/07/fair-use-deja-vu.html

  • “The argument that Google has made from the beginning of its book scanning project is that copying for the purpose of providing keyword access to full texts is fair use”
    • HathiTrust has been in court to defend the storing and searching of metadata

Actions:

Posted in JISC OpenBib, minutes, OKFN Openbiblio | Leave a comment

Nature’s data platform strongly expanded

Nature has greatly expanded its Linked Open Data platform, which was launched in April 2012. From today’s press release:

Logo of the journal Nature used in its first issue on Nov. 4, 1869

“As part of its wider commitment to open science, Nature Publishing Group’s (NPG) Linked Data Platform now hosts more than 270 million Resource Description Framework (RDF) statements. It has been expanded more than ten times, in a growing number of datasets. These datasets have been created under the Creative Commons Zero (CC0) waiver, which permits maximal use/reuse of this data. The data is now being updated in real-time and new triples are being dynamically added to the datasets as articles are published on nature.com.

Available at http://data.nature.com, the platform now contains bibliographic metadata for all NPG titles, including Scientific American back to 1845, and NPG’s academic journals published on behalf of our society partners. NPG’s Linked Data Platform now includes citation metadata for all published article references. The NPG subject ontology is also significantly expanded.

The new release expands the platform to include additional RDF statements of bibliographic, citation, data citation and ontology metadata, which are organised into 12 datasets – an increase from the 8 datasets previously available. Full snapshots of this data release are now available for download, either by individual dataset or as a complete package, for registered users at http://developers.nature.com.”

This is exciting. The commitment to real-time updates in particular is a great move, and it shows how seriously Linked Open Data is now being taken, both in general and in the realm of bibliographic data. Also, Nature now uses the Data Hub and has registered the data there, separated into several datasets.

Posted in Data, News, Semantic Web | Leave a comment