German National Library publishes 11.5 Million MARC records from national bibliography

In January 2012 the German National Library (DNB) had already started publishing the national bibliography as linked data under a CC0 license. Today, the DNB announced that it is now also publishing the national bibliography up to the year 2011 as MARC data. The full announcement reads as follows (my quick translation):

“All of the German National Library’s title data offered under a Creative Commons Zero (CC0) license for free use are now available free of charge as MARC 21 records. In total, these are more than 11.5 million title records.

Currently, title data up to bibliography year 2011 is offered under a Creative Commons Zero license (CC0). Free registration is required to use the data. Title data of the current and the previous year are subject to a fee. The CC0 data package will be expanded by one bibliography year in the first quarter of each year.

It is planned to provide free access under CC0 conditions to all data in all formats in mid-2015. The German National Library thus takes into account the growing need for freely available metadata.”

As the MARC data contains much more information than the linked data (because not all MARC fields are currently mapped to RDF), this is good news for anybody who wants all the information available in the national bibliography. As DNB still makes money by selling the national bibliography to libraries and other interested parties, it won’t release the bibliographic data of the most recent years into the public domain. It’s good to see that there are already plans to switch to a fully free model in 2015.

See also Lars Svensson: Licensing Library and Authority Data Under CC0: The DNB Experience (pdf).

Posted in Data

Discovery silos vs. the open web

Bibliographic data that is not openly available on the web is harmful. In this post I’d like to point to a recent incident that demonstrates this: a correspondence between the board of the Orbis Cascade Alliance (“a consortium of 37 academic libraries in Oregon, Washington, and Idaho serving faculty and the equivalent of more than 258,000 full time students” (source)), Ex Libris, and EBSCO. The dispute concerns the provision of metadata describing content provided by EBSCO to Ex Libris’ discovery tool Primo. Thanks to the Orbis Cascade Alliance, the conversation is documented on the web. (I wish more institutions would transparently document their negotiations with vendors as well as the resulting contracts…)

1. What is a discovery tool, anyway?

But first, for those who aren’t familiar with “next-generation discovery tools”, here is a short explanation of what these services are all about:

Such a discovery tool provides a single interface that enables discovery of (almost) any resource a library provides access to: resources from its physical and electronic collections, electronic resources it has licensed and, furthermore, resources from openly available collections. Discovery tools are based upon a unified, customized index that comprises the library’s catalog data and metadata (+ sometimes full text) from publishers and bibliographic databases. In order to pre-index content metadata and/or full text, providers of discovery tools enter into agreements with publishers and aggregators. Libraries spend a considerable amount of money on purchasing a discovery service, and these services are very popular. As of today, Marshall Breeding’s lib-web-cats directory (library web sites and catalogs) records more than 1,250 libraries in total using one of the four leading discovery systems: Serials Solutions’ Summon, EBSCO Discovery Service (EDS), Ex Libris’ Primo and OCLC’s WorldCat Local.

2. An overview of the “EBSCO and Ex Libris slapfight”

So, what has been going on between Orbis Cascade Alliance, EBSCO and Ex Libris? In short (thanks to the summaries provided in this thread entitled “EBSCO and Ex Libris slapfight”):

EBSCO offers both content and a discovery tool, EDS. Ex Libris would like to include at least the metadata for this content in its Primo discovery layer, so that users at libraries who subscribe to the EBSCO products can find it using the library’s Primo instance. EBSCO won’t provide any data to Ex Libris, only access to the EDS API, so that its content is best (or only) accessed via EBSCO’s own discovery tool EDS.

Here’s a more detailed overview of what happened. (You may skip this if the summary above is enough for you and continue with section 3.)

May 2, 2013, Letter from Orbis Cascade Alliance to Ex Libris and EBSCO

The board of the Orbis Cascade Alliance writes to Ex Libris and EBSCO, expressing disappointment over the companies’ “failure to make EBSCO academic library content seamlessly and fully available via Ex Libris discovery services”. The Orbis Cascade Alliance estimates its payments to both companies over the coming five years at 30 million dollars and says that, if this issue is not resolved, it “will be required to reconsider the shape and scope of future business with EBSCO and Ex Libris”.

May 6, 2013, Ex Libris response to Alliance Board

Ex Libris agrees that this problem is unacceptable and blames EBSCO for no longer providing metadata to Ex Libris. After agreeing in 2009 to provide Ex Libris with “comprehensive metadata, including subject headings, for several of the key EBSCO databases”, EBSCO changed its policy in 2010 when it launched its own discovery service, EDS. From then on, according to Ex Libris, EBSCO “made EDS Discovery a requirement for users who wanted to continue this type of access. They decided to no longer enable their content for indexing in Primo and instead required that Primo users access the content only via an API.”

Ex Libris calls for an agreement with EBSCO “that would provide Primo customers with the content that EBSCO itself receives from external information providers – the content you and other libraries subscribe to, for which you should have access from your discovery platform of choice.” Ex Libris states that it has “in place many such agreements with other content providers”.

May 8, 2013, EBSCO response to Alliance Board

EBSCO mentions that – while there is no agreement with Ex Libris on providing data for Ex Libris’ discovery service – they have established such agreements with several other discovery service providers including OCLC and Serials Solutions. The existing agreements clarify the use of the EDS (EBSCO Discovery Service) API to make EBSCO content available via a discovery service. EBSCO’s view is that “an API solution is superior to a solution that relies strictly on metadata for several reasons, including the fact that we do not have the rights to provide (to Ex Libris or any third party) all of the content to that we feel is necessary for a quality user experience.”

In fact, libraries don’t have the right to make the content they have already licensed discoverable via Primo. The reasons EBSCO gives are that (a) Primo’s relevancy ranking would “not take advantage of the value added elements of their products” and (b) users wouldn’t have an incentive to use the original databases, as they would think all the content is available via Primo. As a result, the “user experience” would suffer. Citing the optimization of the user experience, EBSCO tries to retain the greatest possible exclusive control over the content and metadata it provides.

May 9, 2013, Alliance Board response to EBSCO and Ex Libris

The Orbis Cascade Alliance responds to the companies’ letters:

“While these letters illustrate the nature of this continuing impasse, they do nothing to address a remarkable and unacceptable disservice to your customers. (…) Ultimately we face a business decision. The Orbis Cascade Alliance is now actively investigating options and will make decisions that may move us away from your products in order to better serve our faculty, students, and researchers. Again, we urge EBSCO and Ex Libris to quickly resolve this issue.”

May 14, 2013, Ex Libris Open Letter to the Library Community

But obviously, EBSCO and Ex Libris are far from “resolving this issue”. In the next step, Ex Libris responds to EBSCO’s response with a “point-by-point analysis” to refute the claims made by EBSCO.

I had some problems with the terminology, so here are a few words about usage: Ex Libris distinguishes between “index-based search”, i.e. search over one central index as a “true next generation discovery service” does it, and “API-based search”. This is a bit confusing, as APIs are often backed by an index, so that, strictly speaking, “API-based search” and “index-based search” don’t necessarily exclude each other. But it does make a difference whether a service like Primo, which is based on one index, has to use external APIs, so that the service actually becomes a “metasearch tool” instead of a “true next generation discovery service”. (Talking about terminology, as I have the feeling not everybody uses it the same way I do: I distinguish between content and (meta)data. In short and in this context, content is the thing scholars produce and read, while metadata is the data that describes the content.)

In short, Ex Libris says EBSCO chooses “not to share the content that they do have the rights to share”, although this is content the respective library has already paid for. Ex Libris accordingly accuses EBSCO of wishing “to control the ranking of its content, which is possible through the API to EDS they require”.

Interestingly, right after Ex Libris states that EBSCO wants to control the ranking of its content, the letter reads:

“EBSCO clearly believes that end-users … prefer to search database silos. This runs counter to what both end-users and libraries wish to achieve with a library-based discovery service.”

As discovery services are silos themselves – only bigger than the old silos – this is actually also an argument against Primo, EDS, Summon etc.

At the end of this letter, Ex Libris presents itself as the libraries’ comrade-in-arms fighting for their interests:

“We stand with you and continue to believe that together we can bring change such that EBSCO databases, whose rights EBSCO determines, are available to any EBSCO customer through the discovery service of choice.”

Indeed, libraries and Ex Libris have overlapping interests. But obviously Ex Libris is primarily pursuing its own interest, trying to get its own discovery service populated with the relevant data. Libraries should demand more than the ability to choose one of several commercial products. Rather, anybody interested and equipped with the necessary resources and technical capabilities should be able to get the metadata for academic publications that are accessible (with or without a toll) on the web in order to build their own discovery indexes.

There are already a lot of open bibliographic data sets out there. But mostly this data comes from libraries and related institutions; you won’t find much data from the publishers’ side. What we need is more and more publishers publishing their metadata, citation data etc. openly on the web, ideally in the way Nature Publishing Group is doing it.

3. Rejecting silos, encouraging open bibliographic data

The conflict between EBSCO and Ex Libris is just another indicator of how important it is to move away from closed content and discovery silos to web-integrated, openly available bibliographic data. At the very least, this would make it easier to get hold of the metadata for content provided via the web. Although bibliographic data for the majority of resources published in the past as print-only wouldn’t be covered, this constitutes a future-proof way of providing bibliographic data on the web.

Rurik Greenall (who wrote about NTNU’s LOD activities on openbiblio.net some time ago and from whom I took the “future-proof” terminology) recently summed it up nicely in his alternative “for the hard of understanding” version of his talk “Making future-proof library content for the Web” at this year’s ELAG conference. Libraries and publishers (and anybody else using the web as a publication platform) should acknowledge how the web works and make their content and data available using persistent HTTP-URIs as identifiers, serving content and metadata using standards like HTML, PDF/A, TIFF+XMP, JPEG+XMP, JSON-LD and RDFa.
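As a rough illustration of what this means for metadata, here is a minimal Python sketch that describes a book with schema.org terms, identifies it by an HTTP URI and serializes it as JSON-LD for embedding in an HTML page. The example.org URIs, the title and the author are hypothetical placeholders, and embedding via a `script` element is only one of several ways to publish JSON-LD.

```python
import json

# Hypothetical record: the URIs, title and author are made-up placeholders,
# not real identifiers.
record = {
    "@context": "http://schema.org/",
    "@id": "http://example.org/resource/book/123",  # persistent HTTP URI for the book
    "@type": "Book",
    "name": "An Example Title",
    "author": {
        "@id": "http://example.org/resource/person/456",  # the author gets a URI too
        "@type": "Person",
        "name": "Jane Example",
    },
    "datePublished": "2013",
}


def to_html_script(rec):
    """Wrap a JSON-LD record in a script element so it can sit inside an HTML page."""
    return '<script type="application/ld+json">\n' + json.dumps(rec, indent=2) + "\n</script>"
```

Calling `to_html_script(record)` yields a snippet that a publisher could place in the HTML page served at the record's own URI, so that humans and machines get the description from the same address.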

In regard to discovery tools, Rurik provides the following conclusion of his talk packed as a rhetorical question:

“If you pay money to a content provider that is also a metadata provider and then buy a search index from them what motivation do they have to present their content in a findable way on the web?”

Accordingly, Rurik ends his talk by stating that librarians should ask themselves:

  • Do we deliver content to the web in the described way? (i.e. using persistent HTTP-URIs as identifiers and open standards like HTML, PDF/A, TIFF+XMP, JSON-LD etc.)
  • Do we subscribe to a service that does the exact opposite?

Unfortunately, many librarians – especially on the management level – are not aware of the importance of applying web standards and publishing open data. A lot of persuasion has to be done until this thinking becomes part of a broader mindset and non-open forms of publishing metadata and providing discovery tools won’t pay off anymore.

4. Guiding the way?

Carl Grant also published a blog post on the topic last week that is worth reading. I agree with him when he says “we need to define the guidelines under which we’ll buy products and services dealing with content, content enhancements, and discovery services.” The International Group of Ex Libris Users (Igelu) yesterday took a first step towards such guidelines by proposing a clause that libraries should add to their contracts with content providers. Unfortunately, this proposal doesn’t go very far, as it would only enable the indexing of “citation metadata (including without limitations subject headings and keywords), abstract and full-text, all as available” by “Discovery Service Providers”. Nowhere is it made clear who falls under this concept of “Discovery Service Provider”. For example, it isn’t clear at all whether a library consortium wanting to index rich metadata its members have subscribed to would also be regarded as a “Discovery Service Provider”.

If you advocate open bibliographic data, you should object to the notion that bibliographic data be made available only to the exclusive club of “Discovery Service Providers”. Instead, anybody interested in providing a service, running some analytics or doing whatever else with that data should be able to collect it. It’s up to the advocates of open bibliographic data to participate in the development of guidelines for licensing content and discovery services.

Posted in vendors

Minutes: 28th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Date: February 5th, 2013, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Karen Coyle
  • Tom Johnson
  • Tom Morris
  • On the Etherpad:
    • Peter Murray-Rust
    • Mark McGillivray

Agenda

  • As two new participants (who had already engaged in discussions on the mailing list) attended the meeting, everybody introduced themselves. The “new” participants were:
    • Tom Morris: “Tom Morris is the top external data contributor to Freebase and has contributed more than 1.6 million facts. He’s been a member of the Freebase community for several years. When not hacking on Freebase, Tom is an independent software engineering and product management consultant.” (taken from here, shortened and updated)
    • Tom Johnson: “Thomas Johnson is Digital Applications Librarian at Oregon State University Libraries, where he works on digital curation, scholarly publication, and related metadata and software issues.”

Bibframe and data licensing

  • Adrian started a discussion on the bibframe list, see here.
  • Karen: It isn’t clear to me how BIBFRAME will be documented, and whether that documentation will be sufficient to process data. Note that RDA (the cataloging rules) is not freely available, therefore if BIBFRAME does develop for RDA there may be conflicts relating to text such as term definitions.
    • This addresses the licensing of the Bibframe spec, not of the bibliographic data, but it may become a problem in the future if Bibframe re-uses content from the RDA spec.
  • Tom Morris: Licensing policy seems to be orthogonal to modelling process
  • Conclusion: We’ll wait as a working group and not push the LoC further towards open data.
  • Tom Morris: We should think about lobbying for making the process more open.
  • Tom Morris: The German National Library and other early experimenters with Bibframe should put their code up on GitHub to move the development forward

Bibliographic Extension for schema.org (schemabibex)

  • See minutes of last meeting for background information.
  • The work is moving forward to create more schema.org properties for bibliographic data — but so far not including journal articles
  • Library view point predominates at schemabibex group, scientists’ view point isn’t represented
  • Karen: Somebody from the scientific community should join schemabibex or start a separate effort. <– Maybe people from scholarlyhtml?

NISO Bibliographic Meeting

  • http://www.niso.org/topics/tl/BibliographicRoadmap/
  • NISO has a grant to hold a meeting of "interested parties" relating to bibliographic data.
  • Goes back to an effort by Karen Coyle and another person to include producers of bibliographic data other than libraries (publishers, scientists etc.) in the development of future standards for bibliographic data (like Bibframe).
  • See also the thread on the openbiblio list. tfmorris: As much of the information as possible should be published online.
  • Meeting will be held in March or April in Washington D.C.
  • Interested parties can participate in the initial meeting, but there's no/little funding. (See this email for the proposed dates of the meeting.)
  • "We are planning to have a live-stream of the event, presuming there is sufficient bandwidth at the meeting site."

BiblioHackfests

  • Peter Murray-Rust wrote before the meeting: "I'd like to run a hackfest (in AU) later this month and make Bib an important aspect. Can we pull together a "hacking kit" for such an event (e.g. examples of BibJSON, some converters, a simple BibSoup, etc.)?"
    • Mark McGillivray responded: "yes: I will write a blog post that explains bibsoup a bit more, and we could use a google spreadsheet for simple collection of records."

BibJSON

  • Tom Morris had two questions regarding BibJSON, which Mark answered on the etherpad.
  • Q: What is being done to promote adoption?
    • MM says: "I and others continue to use bibjson and promote it on our projects. It is now being used by the open citations project and there will be updates to bibjson.org soon with further recommendations – mostly around how to specify provenance in a bibjson record. Also we have agreed with CrossRef for them to output bibjson – it needs some fixes to be correct, but is just about there."
  • Q: What tool support is available? (Mendeley, Zotero, converters, etc.)
    • Mark says: "The translators are currently unavailable – they will soon be put up at a separate url for translating files to bibjson which can then be used in bibsoup. Mendeley, Zotero etc can all output bib collections in formats that we can already convert, so there is support in that sense. Separating out the translators will also make it easier for people to implement their own."
  • Tom Morris: There's PR value in having BibJSON listed on https://github.com/zotero/translators
  • Ways of promoting BibJSON:
    • Articles: Tom Johnson published an article on BibJSON application in code4lib journal: http://journal.code4lib.org/articles/7949
    • Talks: e.g. at code4lib (Tom Johnson will be there and might give a lightning talk mentioning BibJSON)
    • Adoption: CrossRef would be a great addition. Need more services like Mendeley, Zotero, Open Library, BibSonomy etc. to support BibJSON (input/output)
  • Tom Johnson asks: What is the motivation to provide BibJSON output?

Open Library

  • Speaking about BibJSON adoption, we came to talk about what will happen to the Open Library. Karen gave a short summary of the future plans for Open Library:
    • Open Library currently has no assigned staff resources. Open Library is being integrated into the whole Internet Archive system and may cease using the current infogami platform. It isn't clear if the same UI will be available, nor if there will be any further development in terms of features such as APIs.
    • No batches of records (LC books records or Amazon records) have been loaded since mid-2012.
    • Tom Morris is primarily interested in the data and the process to reconcile it etc. but he also emphasizes the value of the brand and the community.
    • Karen: infogami is interesting as a flexible development platform that sits on a triple store: http://infogami.org/
    • Tom Johnson: What can we do regarding Open Library?
      • Karen: Set up a mirror?
      • Make records for free ebooks available as MARC so that libraries can integrate these into their catalogue. <– Tom Morris would help with that.

Public Domain Books/authors

Posted in minutes, OKFN Openbiblio

A revamp of bibserver and bibsoup

Since our work last year on the JISC Open Bibliography 2 project, I have been thinking about the approach we took to building a tool that people might use; some of that approach, I think, was wrong. So, I have recently been working on some changes and pushed a new version of bibserver to the repository, in a branch called bibwiki.

Also today, the service running at http://bibsoup.net has been rebooted to run the new branch. One downside of this is that user accounts and data on the old system are no longer available. The old system had some issues anyway (it was giving errors for a few of the more recent attempts to upload large datasets), so I decided to wipe the slate clean and start again from scratch. However, if you had any particular collections in there that you need recovered, please get in touch via the openbiblio-dev mailing list and I will recover them for you.

Now, on to the details of what has changed, and why. Let’s start with the why.

Why change it?

One of the original requirements of bibserver was that it would present a personally curated collection of bibliographic records; this extended not only to the curation of the collection, but to the curation of records within that collection. Unfortunately, this made every collection an island – a private island, with guards round the edges; not so good for building open knowledge or community. Also, we put too much emphasis on legacy data and formats; whilst there is of course value in old standards like bibtex, and in historical records, giving up the flexibility of the present for the sake of the past is the opposite of progress. Instead we should take the best bits of what we had and improve on them, then get our historical content into newer, more useful forms.

Because of these issues, it seems sensible therefore to try a more connected, more open, more modern approach. So, what I have done is to remove the concept of “ownership” of a record and to remove the ties to legacy data formats or sources. Instead what we now have is a tool into which we can dump bibJSON data, and via which we can build personally curated collections of shared bibliographic records.

So what has changed?

You can only upload bibJSON

Whilst the conversion tools we wrote to process data from formats such as bibtex or RIS into bibJSON are useful and will be utilised elsewhere, they are not part of the core functionality of bibserver. They are a way to get from the past into the present, and once you are here, you should forget about the past and get on with the future. So your upload is one-off, and cares not from whence it came.
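A minimal Python sketch of what "bibJSON only" could mean at upload time. It assumes the collection layout with a `records` list as we understand it from bibjson.org, and as a convenience it also accepts a bare list of records; the function name `load_bibjson` is hypothetical, and this is not bibserver's actual validation code.

```python
import json


def load_bibjson(text):
    """Parse an upload and return its records, rejecting input that does not
    look like BibJSON. Hypothetical helper, not bibserver's own check.

    Accepts a collection object with a "records" list or, as an assumption
    of this sketch, a bare JSON list of record objects.
    """
    data = json.loads(text)  # raises ValueError (JSONDecodeError) for e.g. raw BibTeX
    records = data.get("records") if isinstance(data, dict) else data
    if not isinstance(records, list):
        raise ValueError("expected a list of records or an object with a 'records' list")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i} is not a JSON object")
    return records
```

Anything in BibTeX or RIS fails at the `json.loads` step, which matches the idea that conversion happens elsewhere, before the upload.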

You can edit records, but so can anybody else

Does what it says on the tin. For now, editing is only via clunky edit of the JSON itself, but this can have a nice UI added later.

You can tag any record with anything, but so can anybody else

Anyone can tag a record with a useful term; anyone can remove a tag.

You can still build your own collection

You can still create your own collection and curate it as you see fit, and other people will not be able to change which records are in that collection; but the records themselves are still editable by anyone. Seems scary? Well, yes. But get used to it. It works for Wikipedia. (Which is why I called the new branch bibwiki.)

You can’t visualise facets anymore

You used to be able to make a little bubble picture out of the facet filters down the left hand side. Now you can’t. It was a bit incongruously located, so this functionality is being hived off into a more specifically useful form.

You can search for any record and add it to your collection

Anything that is on the bibserver instance can be found by anyone using the search box, then you can add it to one of your collections. However, searching for everything has limited functionality and does not offer filters. This is because one of the constraints of scaling up to large datasets is that filtering is expensive; so now, you have simple search across everything, then nice complex filtered search on the things you care about. Best of both worlds with minimal compromise.

Simplistic record deduping

Where a record appears to have the same title-and-authors string on import as another record already in the database, the importer will try to squish them together. The important point here, though, is that the functionality to deduplicate things via various methods now exists, and there is no longer a constraint to maintain unique copies of things, so we can get on and build those methods.
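The kind of title-and-authors matching described above could look roughly like the following Python sketch. The normalization (accent stripping, lowercasing, collapsing punctuation) and the merge rule (keep existing values, only fill in missing fields) are assumptions for illustration; the post does not say how bibserver builds its key or resolves conflicts. Author entries are assumed to be BibJSON-style objects with a `name` field.

```python
import re
import unicodedata


def match_key(record):
    """Build a normalized title-and-authors string (assumed normalization)."""
    title = record.get("title", "")
    authors = " ".join(a.get("name", "") for a in record.get("author", []))
    text = unicodedata.normalize("NFKD", f"{title} {authors}")
    # Drop combining accents, lowercase, and collapse punctuation/whitespace runs.
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def merge_on_import(existing, incoming):
    """Squish each incoming record into an existing one with the same key,
    filling only fields the existing record lacks; otherwise add it as new."""
    index = {match_key(r): r for r in existing}
    for rec in incoming:
        key = match_key(rec)
        if key in index:
            target = index[key]
            for field, value in rec.items():
                target.setdefault(field, value)  # existing values win
        else:
            existing.append(rec)
            index[key] = rec
    return existing
```

Because the key is a plain string, this is the crudest possible method; the point made above is that other methods can now be plugged in alongside it.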

Exciting. So, what next?

Rework the parsers into a stand-alone service

The parsers for bibtex, RIS, etc. should be built out as a simple service: you go to the web page, give it your file (or file URL), and it notifies you when the conversion is done, with a link to get your bibJSON. The service should support parser plugins, so that we can run it with the parsers we have and other people can run it with their own if they wish. Then we can boot up a translation service at http://translate.bibsoup.net.

This is the most important next step, as without it not many people will be able to upload records.
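The plugin idea can be sketched in Python as a registry that maps a format name to a parser function. Everything here is hypothetical: the names `PARSERS`, `parser` and `convert` are made up, the web front end and the notification step are left out, and the RIS reader understands only a handful of tags. It is meant to show the shape of the design, not the code of the existing bibserver converters.

```python
from typing import Callable, Dict, List

# Hypothetical registry: format name -> function turning file text into records.
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def parser(fmt):
    """Decorator that registers a parser plugin under a format name."""
    def register(func):
        PARSERS[fmt] = func
        return func
    return register


@parser("ris")
def parse_ris(text):
    """Tiny RIS reader: handles only the TY, TI, AU, PY and ER tags."""
    records, current = [], {}
    for line in text.splitlines():
        if len(line) < 5 or line[2:5] != "  -":
            continue  # skip blank or malformed lines
        tag, value = line[:2], line[5:].strip()
        if tag == "TY":
            # Keep the RIS type code; a real converter would map it to a BibJSON type.
            current = {"type": value.lower()}
        elif tag == "TI":
            current["title"] = value
        elif tag == "AU":
            current.setdefault("author", []).append({"name": value})
        elif tag == "PY":
            current["year"] = value[:4]
        elif tag == "ER":
            records.append(current)
            current = {}
    return records


def convert(fmt, text):
    """Look up the plugin for `fmt` and wrap its output as a BibJSON collection."""
    if fmt not in PARSERS:
        raise ValueError(f"no parser registered for {fmt!r}")
    return {"records": PARSERS[fmt](text)}
```

Someone running their own instance could add a format by decorating another function with `@parser("...")`, without touching `convert`.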

Upload some bibliographic metadata

There are numerous sources of biblio metadata we have collected over the years, and some of these will be uploaded into bibsoup for people to use.

Also, there is potential to run specific instances of bibsoup for people who need them – although, overall, it is probably more sensible to keep them all together and distinguish via collections.

Bugfix

This is basically a beta 2 implementation. Please go and use the new system at http://bibsoup.net, and report any issues on the mailing list as usual.

Build up some deduplication maybe with pybossa

Now that we can edit records and find similar ones, we can also do interesting things like enable users to tag records that are about the same thing. We can also run queries to find similar records and expose that data perhaps through a tool like pybossa, to get crowd-sourced deduplication on the go.
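One cheap way to produce such candidates is pairwise title comparison. The Python sketch below uses the standard library's `difflib.SequenceMatcher`; the 0.85 threshold is an arbitrary assumption, and the step of turning the resulting pairs into pybossa tasks is not shown, since that depends on how a pybossa project is set up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def title_similarity(a, b):
    """Similarity ratio in [0, 1] between two records' titles, ignoring case."""
    return SequenceMatcher(None, a.get("title", "").lower(), b.get("title", "").lower()).ratio()


def candidate_pairs(records, threshold=0.85):
    """Return (i, j, score) for record pairs whose titles look alike,
    so that people can judge whether they describe the same thing.

    Compares every pair, so it only suits small batches; a larger system
    would first narrow the candidates with a search query.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = title_similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs
```

Leaving the final yes/no decision to people, rather than merging automatically above the threshold, is what makes the crowd-sourcing step worthwhile.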

Rewrite the tests

All the tests that were in the original branch have yet to be copied over, and a lot of them will become redundant. So if you like tests (and we should have them), get involved with porting them over or writing new ones.

Update the docs

The documentation needs to be updated; a lot of it still refers to the old branch, although a fair bit of it is still relevant.

Decide how to manage the code and bibsoup in the future

What I have done here amounts to fairly large changes to our original aims; it is possible that not everybody will like this. However, the great thing about code repositories is that we have versioning, so anyone can use any version of the software. My changes are still in a branch, so we can either merge them into main or fork them off into a separate project if necessary. Unless reasons against merging into main are given, that is the course we will take once the parsers have been hived off.

Posted in BibServer

Minutes: 27th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Date: January 8th, 2013, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Peter Murray-Rust
  • Richard Wallis

Agenda

Schemabib Extension group update

  • Links:
  • W3C community and business group, started by Richard Wallis (OCLC) in September 2012
  • Conference meeting once a month
  • Idea: Get consensus across the bibliographic community about how to extend schema.org.
  • Lightweight approach, should not compete with MARC
  • Most people interested in bibliodata come from the library community. Richard tried to extend the group to other people (publishers, scholars etc.).
  • Background: OCLC publishes Linked Data on worldcat.org using the schema.org vocabulary, but schema.org lacks some of the properties needed for bibliographic data
  • In the end: Publish extension proposal to the public-vocabs list
  • Peter comments on schema.org: schema.org is going to work because it's built by people who know how the web works
  • Currently there is discussion about the concepts of work and instance; FRBR comes up, but such a model wouldn’t make it into schema.org
  • Richard: It makes sense to publish schema.org alongside BibFrame or RDA.
  • Peter: Talking to Mark McGillivray might make sense to find out how schema.org bibdata can relate to BibJSON and the accompanying tools.

Bibframe draft data model

GOKb (Global Open Knowledgebase)

Adrian had heard about this project, but he could find only little information about it on the web:

“Kuali OLE, one of the largest academic library software collaborations in the United States, and JISC, the UK’s expert on digital technologies for education and research, announce a collaboration that will make data about e-resources—such as publication and licensing information—more easily available.

Together, Kuali OLE and JISC will develop an international open data repository that will give academic libraries a broader view of subscribed resources.
The effort, known as the Global Open Knowledgebase (GOKb) project, is funded in part by a $499,000 grant from The Andrew W. Mellon Foundation. North Carolina State University will serve as lead institution for the project.

GOKb will be an open, community-based, international data repository that will provide libraries with publication information about electronic resources. This information will support libraries in providing efficient and effective services to their users and ensure that critical electronic collections are available to their students and researchers.” from http://gokb.org/post/25021222983/gobkpressrelease

“GOKb is … focused on global-level metadata about e-resources with the goal of supporting management of those e-resources across the resource lifecycle. GOKb does not aspire to replace current vendor-provided KB products. But it does aspire to make good data available to everybody, including existing KBs, and to provide an open and low-barrier way for libraries to access this data. Our goal is that GOKb data permeates the KB ecosystem so that all library systems, whether ILS, ERM, KB or discovery, will have better quality data about electronic collections than they do today.” From http://kualiole.tumblr.com/post/32942331929/bib-data-is-now-more-open-what-about-knowledge-base

  • The participants didn’t know much more about this initiative. Adrian will try to find out more for upcoming meetings.

Other

  • Peter briefly told us about some interesting developments:
    • Open citations: http://opencitations.wordpress.com/ (David Shotton, Oxford, UK)
    • Hargreaves report: the UK government says it’s legal to mine content. See Peter’s post at http://blogs.ch.cam.ac.uk/pmr/2012/12/21/opencontentmining-massive-step-forward-come-and-join-us-in-the-uk/
    • Pubcrawler
    • Crossref biblio/citation data

Posted in minutes, OKFN Openbiblio | Leave a comment

Minutes: 26th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Date: November 6th, 2012, 16:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Adrian Pohl
  • Karen Coyle
  • Joris Pekel
  • Jim Pitman

Agenda

ORCID launched

“ORCID makes its code available under an open source license, and will post an annual public data file under a CC0 waiver for free download.” (Source: http://about.orcid.org/about/what-is-orcid.)

Open Data

  • ORCID provides an annual CC0 dump.

Open API

  • To try the open API, point your queries to pub.orcid.org (the documentation says something else).
  • Query biographies example:
    • curl -H 'Accept: application/orcid+xml' 'http://pub.orcid.org/search/orcid-bio?q=pohl'
  • Retrieve bio example:
    • curl -H 'Accept: application/orcid+json' 'http://pub.orcid.org/0000-0001-9083-7442/orcid-bio'
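For anyone who prefers scripting over curl, here is a minimal Python sketch of the same two requests, using only the standard library. Before building the URL it also verifies the final check character of the ORCID iD, which ORCID computes with ISO 7064 MOD 11-2. The URL patterns are copied from the curl examples as they stood in November 2012; ORCID's public API has since been versioned, so treat them as illustrative. The function names are our own.

```python
import urllib.parse
import urllib.request

# URL patterns as used in the curl examples from November 2012. ORCID's
# public API has since been versioned, so these may no longer respond.
BASE_URL = "http://pub.orcid.org"


def is_valid_orcid(orcid_id: str) -> bool:
    """Verify the last character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid_id.replace("-", "")
    if len(chars) != 16 or not all(c in "0123456789" for c in chars[:15]):
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


def search_request(query: str) -> urllib.request.Request:
    """Same request as the 'Query biographies' curl example (XML)."""
    url = f"{BASE_URL}/search/orcid-bio?" + urllib.parse.urlencode({"q": query})
    return urllib.request.Request(url, headers={"Accept": "application/orcid+xml"})


def bio_request(orcid_id: str) -> urllib.request.Request:
    """Same request as the 'Retrieve bio' curl example (JSON)."""
    if not is_valid_orcid(orcid_id):
        raise ValueError(f"not a valid ORCID iD: {orcid_id}")
    url = f"{BASE_URL}/{orcid_id}/orcid-bio"
    return urllib.request.Request(url, headers={"Accept": "application/orcid+json"})


if __name__ == "__main__":
    req = bio_request("0000-0001-9083-7442")
    print(req.full_url, req.get_header("Accept"))
    # Sending it would be: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to check the URLs and headers without touching the network.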

Open source

Linked Open Data

(Much of this information was taken from this Twitter conversation.)

  • Karen: How can this be integrated with BibServer?
  • Jim: Could OKF pick up and post periodic dumps of ORCID data? And support a BibServer over those dumps?

HathiTrust Lawsuit

See Karen’s blog post on the topic: http://kcoyle.blogspot.de/2012/10/copyright-victories-part-ii.html.

  • Judge supports digitization for indexing as a fair use.
  • No decision on orphan works
  • Support for “just in case” digitization to serve sight-impaired users
  • Support for digitization for preservation

OKFN labs for cultural activities

  • Background: Restructuring of OKF
  • Projects and tools are now pulled into OKFN labs, which will mainly focus on government and financial data: http://okfnlabs.org/
  • Rather than “orphan” the other projects, there is now another lab in development for those, including Bibserver.
  • Example projects/code and blog posts that would find their place at this “open culture lab”:
  • Joris, Sam and Etienne Posthumus are working on this. Please propose projects to Joris and Sam and they can help.
  • Suggestion: organize “code days” for bibliographic data

W3C working group on biblio extension to schema.org

Journal Article Tag Suite (JATS) Standard

MISC

  • Some developer lists, which are currently scattered, may be merged into one; openbiblio-dev could be included in this.
  • We talked briefly about the ResourceSync effort to provide a standard for syncing web resources: http://www.niso.org/workrooms/resourcesync/
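ResourceSync builds on the Sitemap format: a resource list is essentially a sitemap whose rs:md element declares its capability. The sketch below is based on our reading of the specification as later published (it was still a draft at the time of this meeting) and extracts the capability plus the URL and last-modified pairs. The sample document and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sitemap namespace (which ResourceSync builds on) and ResourceSync terms.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Made-up resource list for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/records/1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/records/2</loc>
  </url>
</urlset>"""


def read_resource_list(xml_text):
    """Return (capability, [(loc, lastmod or None), ...]) from a resource list."""
    root = ET.fromstring(xml_text)
    md = root.find("rs:md", NS)
    capability = md.get("capability") if md is not None else None
    resources = [
        (url.findtext("sm:loc", namespaces=NS),
         url.findtext("sm:lastmod", namespaces=NS))
        for url in root.findall("sm:url", NS)
    ]
    return capability, resources


if __name__ == "__main__":
    print(read_resource_list(SAMPLE))
```

A syncing client would compare the lastmod values with its own copies to decide which records to fetch again.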

To Dos

  • Adrian will try to find time for a separate post on ORCID

Posted in minutes, OKFN Openbiblio | Leave a comment

Metadata for over 20 Million Cultural Objects released into the Public Domain by Europeana

Europeana today announced that its dataset comprising descriptions of more than 20 Million cultural objects is from now on openly licensed with Creative Commons’ public domain waiver CC0. From the announcement:


Opportunities for apps developers, designers and other digital innovators will be boosted today as the digital portal Europeana opens up its dataset of over 20 million cultural objects for free re-use.

The massive dataset is the descriptive information about Europe’s digitised treasures. For the first time, the metadata is released under the Creative Commons CC0 Public Domain Dedication, meaning that anyone can use the data for any purpose – creative, educational, commercial – with no restrictions. This release, which is by far the largest one-time dedication of cultural data to the public domain using CC0, offers a new boost to the digital economy, providing electronic entrepreneurs with opportunities to create innovative apps and games for tablets and smartphones and to create new web services and portals.

Europeana’s move to CC0 is a step change in open data access. Releasing data from across the memory organisations of every EU country sets an important new international precedent, a decisive move away from the world of closed and controlled data.

Thanks to all the people who made this possible! See also Jonathan Gray’s post at the Guardian’s Datablog.

Update 30 September 2012: Actually, it is not accurate to call this release “by far the largest one-time dedication of cultural data to the public domain using CC0”. In December 2011 two German library networks released their catalog B3Kat under CC0, which at that time held 22 million descriptions of bibliographic resources. See this post for more information.

Posted in Data, LOD-LAM | Leave a comment

Minutes: 25th Virtual Meeting of the OKFN Working Group for Open Bibliographic Data

Date: September 4th, 2012, 15:00 GMT

Channels: Meeting was held via Skype and Etherpad

Participants

  • Peter Murray-Rust
  • Naomi Lillie

NB: Karen Coyle sent her apologies as she was attending the Dublin Core conference.

Agenda

As only PeterMR and I attended this call, we abandoned any formal agenda and had a very pleasant chat about PeterMR’s engagements and the upcoming OKFestival.

PeterMR has been presenting various bibliographic tools (including BibSoup) at a number of recent events, including VIVO12, and will do so at the upcoming Digital Science 2012 in Oxford. We discussed support for the existing tools we have in the Open Knowledge Foundation, in terms of staff time and funding, and the importance of BibServer as an underlying tool for much of the work to be done in and around Open Bibliography and Access.

OKFest is less than two weeks away now and there is so much potential here for collaboration and idea generation. We agreed that we are very excited and looking forward to meeting the pillars of Open society, as well as those brand-new to this world, which will only grow in influence and importance. Now is the time to embrace Open!

There were no particular actions, but it was helpful to consider how we can make a difference in the world of bibliography, for OKFN and for GLAM institutions in general (i.e. galleries, libraries, archives and museums).

To join the Open Bibliography community sign up here – you may also be interested in the Open Access Working Group which is closely aligned in its outlook and aims.

Posted in BibServer, event, events, minutes, OKFN Openbiblio | Leave a comment

Final report: JISC Open Bibliography 2

Following on from the success of the first JISC Open Bibliography project we have now completed a further year of development and advocacy as part of the JISC Discovery programme.

Our stated aims at the beginning of the second year of development were to show our community (namely all those interested in furthering the cause of Open via bibliographic data, including: coders; academics; those with interest in supporting Galleries, Libraries, Archives and Museums; etc.) what we are missing if we do not commit to Open Bibliography, and to show that Open Bibliography is a fundamental requirement of a community committed to discovery and dissemination of ideas. We intended to do this by demonstrating the value of carefully managed metadata collections of particular interest to individuals and small groups, thus realising the potential of the open access to large collections of metadata that we now enjoy.

We have been successful overall in achieving our aims, and we present here a summary of our output to date (it may be useful to refer to this guide to terms).

Outputs

BibServer and FacetView

The BibServer open source software package enables individuals and small groups to present their bibliographic collections easily online. BibServer utilises elasticsearch in the background to index supplied records, and these are presented via the frontend using the FacetView JavaScript library. This use of JavaScript at the front end allows easy embedding of result displays on any web page.
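In BibServer the facet counting is done by elasticsearch and FacetView renders the results. To illustrate what a faceted browse computes, here is a plain-Python sketch over an in-memory list of records. It is not BibServer or FacetView code, and the field names (year, keyword) and sample records are assumptions for illustration.

```python
from collections import Counter


def _values(record, field):
    """Return a field's values as a list (fields may be single or multi-valued)."""
    value = record.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def facet_counts(records, field, selected=None):
    """Count values of `field` among records matching the selected facets.

    `selected` maps field -> value, like facet values a user has clicked.
    Results are ordered by descending count (ties keep first-seen order).
    """
    selected = selected or {}
    counts = Counter()
    for record in records:
        if all(value in _values(record, key) for key, value in selected.items()):
            counts.update(_values(record, field))
    return counts.most_common()


records = [
    {"title": "A", "year": "2011", "keyword": ["malaria", "open access"]},
    {"title": "B", "year": "2012", "keyword": ["malaria"]},
    {"title": "C", "year": "2012", "keyword": ["genomics"]},
]
print(facet_counts(records, "year"))                       # [('2012', 2), ('2011', 1)]
print(facet_counts(records, "keyword", {"year": "2012"}))  # [('malaria', 1), ('genomics', 1)]
```

Delegating this to a search index, as BibServer does, means counts stay fast even for large collections, whereas this sketch rescans every record on each click.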

BibSoup and more demonstrations

Our own version of BibServer is up and running at http://bibsoup.net, where we have seen over 100 users sharing more than 14,000 records across over 60 collections. Some particularly interesting example collections include:

Additionally, we have created some niche instances of BibServer for solving specific problems – for example, check out http://malaria.bibsoup.net; here we have used BibServer to analyse and display collections specific to malaria researchers, as a demonstration of the extent of open access materials in the field. Further analysis allowed us to show where best to look for relevant materials that could be expected to be openly available, and to begin work on the concept of an Open Access Index for research.
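The analysis itself is not described in this report. As an assumption-laden illustration only, here is one simple way to rank journals by how often their articles turn out to be openly available. The open_access flag is a hypothetical field that some separate accessibility check would have to fill in; journal.name follows the BibJSON convention, and the journal names are invented.

```python
from collections import defaultdict


def open_share_by_journal(records):
    """Fraction of records per journal flagged as openly available.

    `open_access` is a hypothetical boolean field, assumed to be set by a
    separate accessibility check; it is not part of BibJSON itself.
    """
    totals = defaultdict(int)
    open_counts = defaultdict(int)
    for record in records:
        journal = record.get("journal", {}).get("name", "unknown")
        totals[journal] += 1
        if record.get("open_access"):
            open_counts[journal] += 1
    shares = {journal: open_counts[journal] / totals[journal] for journal in totals}
    # Highest share first; ties broken alphabetically by journal name.
    return sorted(shares.items(), key=lambda item: (-item[1], item[0]))


sample = [
    {"title": "A", "journal": {"name": "Journal X"}, "open_access": True},
    {"title": "B", "journal": {"name": "Journal X"}, "open_access": False},
    {"title": "C", "journal": {"name": "Journal Y"}, "open_access": True},
]
print(open_share_by_journal(sample))  # [('Journal Y', 1.0), ('Journal X', 0.5)]
```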

Another example is the German National Bibliography, as provided by the German National Library, which is in progress (as explained by Adrian Pohl and Etienne Posthumus here). We have built, and are building, similar collections for all other national bibliographies that we receive.

BibJSON

At http://bibjson.org we have produced a simple convention for presenting bibliographic records in JSON. This has seen good uptake so far, with additional use in the JISC TEXTUS project and in Total Impact, amongst others.
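To give a flavour of the convention, here is a small Python helper that assembles a record of the kind BibJSON describes: a title string, a list of author objects, identifiers as type/id pairs and links as url objects. The helper name and the sample values (including the DOI) are placeholders; bibjson.org remains the authoritative description of the fields.

```python
import json


def make_bibjson_record(title, authors, year, journal=None, doi=None, url=None):
    """Assemble a BibJSON-style dict; optional parts are added only if given."""
    record = {
        "type": "article",
        "title": title,
        "author": [{"name": name} for name in authors],
        "year": str(year),
    }
    if journal:
        record["journal"] = {"name": journal}
    if doi:
        record["identifier"] = [{"type": "doi", "id": doi}]
    if url:
        record["link"] = [{"url": url}]
    return record


record = make_bibjson_record(
    "An example article", ["Doe, Jane", "Roe, Richard"], 2012,
    journal="Journal of Examples", doi="10.1234/example",  # placeholder values
)
print(json.dumps(record, indent=2))
```

Because records are plain JSON, they can be indexed directly by tools such as BibServer or exchanged between projects without a special parser.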

Pubcrawler

Pubcrawler collects bibliographic metadata, via parsers created for particular sites, and we have used it to create collections of articles. The full post provides more information.
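How Pubcrawler's site-specific parsers work internally is not covered here. As a hypothetical illustration of one common technique, the sketch below reads the Highwire Press-style citation_* meta tags that many publisher pages embed for Google Scholar, and turns them into a BibJSON-style record. The class and function names are our own.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect the contents of <meta name="citation_*" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.startswith("citation_") and content is not None:
            # Some tags (e.g. citation_author) repeat, so keep a list.
            self.meta.setdefault(name, []).append(content)


def parse_article_page(html):
    """Turn citation_* meta tags into a BibJSON-style record."""
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    record = {
        "title": meta.get("citation_title", [""])[0],
        "author": [{"name": name} for name in meta.get("citation_author", [])],
    }
    if "citation_journal_title" in meta:
        record["journal"] = {"name": meta["citation_journal_title"][0]}
    if "citation_doi" in meta:
        record["identifier"] = [{"type": "doi", "id": meta["citation_doi"][0]}]
    return record


page = (
    '<head><meta name="citation_title" content="An example article">'
    '<meta name="citation_author" content="Doe, Jane">'
    '<meta name="citation_author" content="Roe, Richard"/>'
    '<meta name="citation_doi" content="10.1234/example"></head>'
)
print(parse_article_page(page))
```

Sites that do not embed such tags would still need a dedicated parser, which is presumably why parsers are written per site.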

datahub collections

We have continued to collect useful bibliographic collections throughout the year, and these along with all others discovered by the community can be found on the datahub in the bibliographic group.

Open Access / Bibliography advocacy videos and presentations

As part of a Sprint in January we recorded videos of the work we were doing and the roles we play in this project and wider biblio promotion; we also made a how-to for using BibServer, including feedback from a new user:

Setting up a Bibserver and Faceted Browsing (Mark MacGillivray) from Bibsoup Project on Vimeo.

Peter and Tom Murray-Rust’s video, made into a prezi, has proven useful in explaining the basics of the need for Open Bibliography and Open Access:

Community activities

The Open Biblio community have gathered for a number of different reasons over the duration of this project: the project team met in Cambridge and Edinburgh to plan work in Sprints; Edinburgh also played host to a couple of Meet-ups for the wider open community, as did London; and London hosted BiblioHack – a hackathon / workshop for established enthusiasts as well as new faces, both with and without technical know-how.

These events – particularly BiblioHack – attracted people from all over the UK and Europe, and we were pleased that the work we are doing is gaining attention from similar projects world-wide.

Further collaborations

Lessons

Over the course of this project we have learnt that open source development provides great flexibility and power to do what we need to do, and open access in general frees us from many difficult constraints. There is now a lot of useful information available online on how to do open source and open access. Whilst licensing remains an issue, it has become clear that making everything publicly and freely available to the fullest extent possible is the simplest solution, causing no further complications down the line. See the open definition as well as our principles for more information.

We discovered during the BibJSON spec development that it must be clear whether a specification is centrally controlled or more of a communal agreement on use. There are advantages and disadvantages to each method; however, they are not compatible, although one may evolve into the other. We took the communal agreement approach, as we found that in the early stages there was more value in exposing the spec to people as widely and openly as possible than in maintaining close control. Moving to a closely controlled format requires specific and ongoing commitment.

Community building remains tricky and somewhat serendipitous. Just as word-of-mouth can enhance reputation, failure of certain communities can detrimentally impact other parts of the project. Again, the best solution is to ensure everything is as open as possible from the outset, thereby reducing the impact of any one particular failure.

Opportunities and Possibilities

Over the two years, the concept of open bibliography has gone from requiring justification to being an expectation; the value of making this metadata openly available to the public is now obvious, and getting such access is no longer so difficult; where access is not yet available, many groups are now moving toward making it available. And of course, there are now plenty of tools to make good use of available metadata.

Future opportunities now lie in the more general field of Open Scholarship, where a default of Open Bibliography can be leveraged to great effect. For example, recent Open Access mandates by many UK funding councils (e.g. following the Finch Report) could be backed up by investigative checks on the accessibility of research outputs, supporting provision of an open access corpus of scholarly material.

We intend now to continue work in this wider context, and we will soon publicise our more specific ideas; we would appreciate contact with other groups interested in working further in this area.

Further information

For the original project overview, see http://openbiblio.net/p/jiscopenbib2; also, a full chronological listing of all our project posts is available at http://openbiblio.net/tag/jiscopenbib2/. The work package descriptions are available at http://openbiblio.net/p/jiscopenbib2/work-packages/, and links to posts relevant to each work package over the course of the project follow:

  • WP1 Participation with Discovery programme
  • WP2 Collaborate with partners to develop social and technical interoperability
  • WP3 Open Bibliography advocacy
  • WP4 Community support
  • WP5 Data acquisition
  • WP6 Software development
  • WP7 Beta deployment
  • WP8 Disruptive innovation
  • WP9 Project management (NB all posts about the project are relevant to this WP)
  • WP10 Preparation for service delivery

All software developed during this project is available under an open source licence. All the data released during this project fell under OKD-compliant licences such as PDDL or CC0, depending on the publisher’s choice. The content of our site is licensed under a Creative Commons Attribution 3.0 License (all jurisdictions).

The project team would like to thank supporting staff at the Open Knowledge Foundation and Cambridge University Library, the OKF Open Bibliography working group and Open Access working group, Neil Wilson and the team at the British Library, and Andy McGregor and the rest of the team at JISC.

Posted in BibServer, JISC OpenBib | 3 Comments

Importing Spanish National Library to BibServer

The Spanish National Library (Biblioteca Nacional de España or BNE) has released their library catalogue as Linked Open Data on the Datahub.

Initially this entry only contained the SPARQL endpoints and not downloads of the full datasets. After some enquiries from Naomi Lillie, the entry was updated with links to more information and to bulk downloads at: http://www.bne.es/es/Catalogos/DatosEnlazados/DescargaFicheros/

This library dataset is particularly interesting as it is not a ‘straightforward’ dump of bibliographic records. This is best explained by Karen Coyle in her blog post.

For a BibServer import, this means that the importing script has to distinguish the type of each record it reads and take the relevant action before building the BibJSON entry. Fortunately, the data dump was already provided as N-Triples, so we did not have to pre-process the large data file (4.9GB) in the same manner as we did with the German National Library dataset.
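The script we actually used is in the gist referenced in this post. Independently of it, here is a simplified sketch of the two steps just described: reading N-Triples line by line and grouping the triples by subject, then picking out subjects by rdf:type so that each kind of record can be handled differently. The regular expression covers only the common cases (IRIs, blank nodes, plain, language-tagged and typed literals) and does not decode escape sequences. The function names and test IRIs are our own.

```python
import re
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified N-Triples line: subject (IRI or blank node), predicate (IRI),
# object (IRI, blank node, or literal with optional @lang or ^^<datatype>).
TRIPLE = re.compile(
    r'^(<[^>]*>|_:\S+)\s+<([^>]*)>\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[\w-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)
LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def _term(raw):
    """Strip N-Triples syntax from a term; escape sequences are NOT decoded."""
    if raw.startswith("<"):
        return raw[1:-1]
    if raw.startswith('"'):
        return LITERAL.match(raw).group(1)
    return raw  # blank node label, e.g. _:b0


def group_by_subject(lines):
    """Build {subject: {predicate: [values]}} from N-Triples lines."""
    resources = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            continue  # a real importer should log lines it cannot parse
        subject, predicate, obj = match.groups()
        resources[_term(subject)][predicate].append(_term(obj))
    return {s: dict(props) for s, props in resources.items()}


def subjects_of_type(resources, type_iri):
    """List subjects carrying the given rdf:type, e.g. to route each kind of
    record to its own BibJSON-building code."""
    return sorted(s for s, props in resources.items()
                  if type_iri in props.get(RDF_TYPE, []))
```

An open file object can be passed straight to group_by_subject, since it iterates line by line. Note that this sketch keeps every resource in memory, which would need rethinking for a 4.9GB file (for example by sorting the dump by subject first and processing one subject at a time).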

The Python script to perform the reading of the datafile can be viewed at https://gist.github.com/3225004

A complicating matter from a data wrangler’s point of view is that the field names are based on IFLA standards and are numeric codes rather than ‘guessable’ English terms like, for example, Dublin Core fields. This is more correct from an international and data quality point of view, but it does make the initial mapping more time-consuming.

So, when mapping a data item like https://gist.github.com/3225004#file_sample.nt, we need to dereference each field name and map it to the relevant BibJSON entry.
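The dereferencing step can be kept separate from the mapping itself. In the sketch below, lookup_label stands in for whatever fetches an element’s human-readable label (for example by dereferencing its IRI); it is passed in so that the mapping logic can be shown without network access. The keywords in LABEL_RULES and the example IRIs are hypothetical, not verified IFLA identifiers or labels.

```python
# Hypothetical rules: keyword in an element's (lower-cased) label -> BibJSON key.
LABEL_RULES = [
    ("title proper", "title"),
    ("publisher", "publisher"),
    ("date of publication", "year"),
]


def build_mapping(predicates, lookup_label):
    """Map predicate IRIs to BibJSON keys via their human-readable labels.

    `lookup_label` is any callable returning the label for an IRI (or None);
    predicates whose labels match no rule are simply left unmapped.
    """
    mapping = {}
    for iri in predicates:
        label = (lookup_label(iri) or "").lower()
        for keyword, key in LABEL_RULES:
            if keyword in label:
                mapping[iri] = key
                break
    return mapping


def to_bibjson(properties, mapping):
    """Convert {predicate: [values]} into a flat BibJSON-style record,
    keeping the first value for each mapped key."""
    record = {}
    for iri, values in properties.items():
        key = mapping.get(iri)
        if key is not None and values and key not in record:
            record[key] = values[0]
    return record


# Stand-in for labels obtained by dereferencing (hypothetical IRIs and labels).
labels = {
    "http://example.org/P1": "has title proper",
    "http://example.org/P2": "has name of publisher",
    "http://example.org/P3": "has note",
}
mapping = build_mapping(labels, labels.get)
print(to_bibjson({"http://example.org/P1": ["Example title"],
                  "http://example.org/P2": ["Example publisher"]}, mapping))
```

Building the mapping once from the labels, rather than hard-coding numeric codes, means each element only has to be dereferenced a single time per import.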

As we identify more Linked Open Data National Bibliographies, these experiments will be continued under the http://nb.bibsoup.net/ BibServer instance.

Posted in BibServer, Data, JISC OpenBib, OKFN Openbiblio | Leave a comment