Open Bibliography and Open Bibliographic Data » vendors

Discovery silos vs. the open web

Adrian Pohl — Sun, 23 Jun 2013 12:17:35 +0000

Bibliographic data that is not openly available on the web is harmful. In this post I’d like to point to a recent incident that demonstrates this: a correspondence between the board of the Orbis Cascade Alliance (“a consortium of 37 academic libraries in Oregon, Washington, and Idaho serving faculty and the equivalent of more than 258,000 full time students” (source)), Ex Libris, and EBSCO. The dispute concerns the provision of metadata describing content provided by EBSCO to Ex Libris’ discovery tool Primo. Thanks to the Orbis Cascade Alliance, the conversation is documented on the web. (I wish more institutions would transparently document their negotiations with vendors as well as the resulting contracts…)

1. What is a discovery tool, anyway?

But first, for those who aren’t familiar with “next-generation discovery tools”, here is a short explanation of what these services are all about:

Such a discovery tool provides a single interface that enables discovery of (almost) any resource a library provides access to. These are resources from its physical and electronic collections as well as electronic resources it has licensed and, furthermore, resources from openly available collections. Discovery tools are based upon a unified customized index that comprises the library’s catalog data and metadata (and sometimes full text) from publishers and bibliographic databases. In order to pre-index content metadata and/or the full text, providers of discovery tools enter into agreements with publishers and aggregators. Libraries spend a considerable amount of money on purchasing a discovery service. These services are very popular. As of today Marshall Breeding’s lib-web-cats directory (library web sites and catalogs) lists a total of more than 1250 libraries using one of the four leading discovery systems: Serials Solutions’ Summon, EBSCO Discovery Service (EDS), Ex Libris’ Primo and OCLC’s WorldCat Local.

2. An overview of the “EBSCO and Ex Libris slapfight”

So, what has been going on between Orbis Cascade Alliance, EBSCO and Ex Libris? In short (thanks to the summaries provided in this thread entitled “EBSCO and Ex Libris slapfight”):

EBSCO offers both content and a discovery tool, EDS. Ex Libris would like to include at least the metadata for this content in its Primo discovery layer, so that users at libraries that subscribe to the EBSCO products can find it using the library’s Primo instance. EBSCO won’t provide any data to Ex Libris, only access to the EDS API, so that its content is best/only accessed via EBSCO’s own discovery tool EDS.

Here’s a more detailed overview of what happened. (You may skip this and continue at section 3 if the summary above is enough for you.)

May 2, 2013, Letter from Orbis Cascade Alliance to Ex Libris and EBSCO

The board of the Orbis Cascade Alliance writes to Ex Libris and EBSCO, expressing disappointment over the companies’ “failure to make EBSCO academic library content seamlessly and fully available via Ex Libris discovery services”. The Orbis Cascade Alliance estimates its payments to both companies for the coming five years at 30 million dollars and says that – if this issue is not resolved – it “will be required to reconsider the shape and scope of future business with EBSCO and Ex Libris”.

May 6, 2013, Ex Libris response to Alliance Board

Ex Libris agrees that this problem is unacceptable and blames EBSCO for no longer providing metadata to Ex Libris. After EBSCO agreed in 2009 to provide Ex Libris with “comprehensive metadata, including subject headings, for several of the key EBSCO databases”, it changed its policy in 2010 when it launched its own discovery service, EDS. From then on, according to Ex Libris, they “made EDS Discovery a requirement for users who wanted to continue this type of access. They decided to no longer enable their content for indexing in Primo and instead required that Primo users access the content only via an API.”

Ex Libris calls for an agreement with EBSCO “that would provide Primo customers with the content that EBSCO itself receives from external information providers – the content you and other libraries subscribe to, for which you should have access from your discovery platform of choice.” Ex Libris states that it has “in place many such agreements with other content providers”.

May 8, 2013, EBSCO response to Alliance Board

EBSCO mentions that – while there is no agreement with Ex Libris on providing data for Ex Libris’ discovery service – they have established such agreements with several other discovery service providers including OCLC and Serials Solutions. The existing agreements clarify the use of the EDS (EBSCO Discovery Service) API to make EBSCO content available via a discovery service. EBSCO’s view is that “an API solution is superior to a solution that relies strictly on metadata for several reasons, including the fact that we do not have the rights to provide (to Ex Libris or any third party) all of the content to that we feel is necessary for a quality user experience.”

In fact, libraries don’t have the right to make the content they already licensed discoverable via Primo. The reasons named by EBSCO are that (a) Primo’s relevancy ranking will “not take advantage of the value added elements of their products” and (b) users wouldn’t have an incentive to use the original databases as they would think all the content is available via Primo. As a result, the “user experience” would suffer. Citing optimization of the user experience as the reason, EBSCO tries to keep the greatest possible exclusive control over the content and metadata it provides.

May 9, 2013, Alliance Board response to EBSCO and Ex Libris

The Orbis Cascade Alliance responds to the companies’ letters:

“While these letters illustrate the nature of this continuing impasse, they do nothing to address a remarkable and unacceptable disservice to your customers. (…) Ultimately we face a business decision. The Orbis Cascade Alliance is now actively investigating options and will make decisions that may move us away from your products in order to better serve our faculty, students, and researchers. Again, we urge EBSCO and Ex Libris to quickly resolve this issue.”

May 14, 2013, Ex Libris Open Letter to the Library Community

But obviously, EBSCO and Ex Libris are far from “resolving this issue”. In the next step, Ex Libris responds to EBSCO’s response with a “point-by-point analysis” to refute the claims made by EBSCO.

I had some problems with the terminology, so here are a few words about language usage: Ex Libris distinguishes between “index-based search”, i.e. search over one central index as a “true next generation discovery service” does it, and “API-based search”. This is a bit confusing as APIs are often based on an index so that – strictly speaking – “API-based search” and “index-based search” don’t necessarily exclude each other. But it makes a difference if a service like Primo, which is based on one index, has to use external APIs so that the service actually becomes a “metasearch tool” instead of a “true next generation discovery service”. (Talking about terminology – as I have the feeling that not everyone uses it the way I do: I distinguish between content and (meta)data. In short and in this context, content is the thing scholars produce and read while metadata is the data that describes the content.)
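To make the distinction more tangible, here is a minimal sketch in Python. It is not how Primo or EDS actually work; the records, the toy relevance score and all function names are assumptions made up for illustration. It only shows why a single pre-built index gives the discovery service control over ranking, while querying external APIs at search time leaves it with separate result lists to stitch together.

```python
from collections import defaultdict

# Hypothetical toy records; titles and source names are invented for illustration.
CATALOG = [
    {"id": "cat-1", "title": "Open bibliographic data in libraries"},
    {"id": "cat-2", "title": "Cataloguing rules"},
]
VENDOR = [
    {"id": "ven-1", "title": "Discovery services and open data"},
    {"id": "ven-2", "title": "Open data"},
]


def tokenize(text):
    return text.lower().split()


def build_index(*sources):
    """Index-based search: pre-index all sources into ONE inverted index."""
    index = defaultdict(set)
    records = {}
    for source in sources:
        for record in source:
            records[record["id"]] = record
            for token in tokenize(record["title"]):
                index[token].add(record["id"])
    return index, records


def score(record, query_tokens):
    """Toy relevance: fraction of the title's tokens that match the query."""
    tokens = tokenize(record["title"])
    return sum(t in query_tokens for t in tokens) / len(tokens)


def index_search(index, records, query):
    """One index, one ranking function applied to every record."""
    query_tokens = set(tokenize(query))
    hits = set()
    for token in query_tokens:
        hits |= index.get(token, set())
    return sorted(hits, key=lambda rid: (-score(records[rid], query_tokens), rid))


def remote_api(source):
    """Stand-in for an external API: it returns hits in the provider's own order."""
    def search(query):
        query_tokens = set(tokenize(query))
        return [r["id"] for r in source if query_tokens & set(tokenize(r["title"]))]
    return search


def metasearch(apis, query):
    """API-based search: ask each API at query time and concatenate the lists."""
    results = []
    for api in apis:
        results.extend(api(query))
    return results


index, records = build_index(CATALOG, VENDOR)
unified = index_search(index, records, "open data")
federated = metasearch([remote_api(CATALOG), remote_api(VENDOR)], "open data")
```

In the unified case the vendor record "ven-2" can be ranked above the catalog record because both are scored by the same function. In the federated case the results simply arrive source by source, each in whatever order the provider chose.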

In short, however, Ex Libris says that EBSCO chooses “not to share the content that they do have the rights to share” although this is content that the respective library has already paid for. Ex Libris accordingly accuses EBSCO of wishing “to control the ranking of its content, which is possible through the API to EDS they require”.

Interestingly, after Ex Libris states that EBSCO wants to control the ranking of its content, the letter reads:

“EBSCO clearly believes that end-users … prefer to search database silos. This runs counter to what both end-users and libraries wish to achieve with a library-based discovery service.”

As discovery services are silos themselves – only bigger than the old silos – this is actually also an argument against Primo, EDS, Summon etc.

At the end of this letter, Ex Libris proclaims itself the libraries’ comrade-in-arms fighting for their interests:

“We stand with you and continue to believe that together we can bring change such that EBSCO databases, whose rights EBSCO determines, are available to any EBSCO customer through the discovery service of choice.”

Indeed, the interests of libraries and Ex Libris overlap. But obviously Ex Libris is primarily following its own interest in trying to get its own discovery service populated with the relevant data. Libraries should demand more than the ability to choose one of several commercial products. Rather, anybody interested and equipped with the necessary resources and technical capabilities should be able to get the metadata for academic publications that are accessible (with or without a toll) on the web in order to build their own discovery indexes.

There are already a lot of open bibliographic data sets out there. But this data mostly comes from libraries and related institutions; you won’t find much data from the publishers’ side. What we need is more and more publishers publishing their metadata, citation data etc. openly on the web, ideally in the way Nature Publishing Group is doing it.

3. Rejecting silos, encouraging open bibliographic data

The conflict between EBSCO and Ex Libris is just another indicator of how important it is to move away from closed content and discovery silos to web-integrated, openly available bibliographic data. At the very least, this would make it easier to get hold of the metadata for content provided via the web. Although bibliographic data for the majority of resources published in the past as print-only wouldn’t be covered, this constitutes a future-proof way of providing bibliographic data on the web.

Rurik Greenall (who wrote about NTNU’s LOD activities on openbiblio.net some time ago and from whom I took the “future-proof” terminology) recently summed it up nicely in the alternative “for the hard of understanding” version of his talk “Making future-proof library content for the Web” at this year’s ELAG conference. Libraries and publishers (and anybody else using the web as a publication platform) should acknowledge how the web works and make their content and data available using persistent HTTP-URIs as identifiers and serving content and metadata using standards like HTML, PDF/A, TIFF+XMP, JPEG+XMP, JSON-LD, RDFa.
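For readers who haven’t seen what such metadata can look like, here is a minimal sketch of a bibliographic description serialized as JSON-LD, built with nothing but Python’s standard library. The example.org URI is a made-up placeholder for a persistent HTTP-URI, and the choice of Dublin Core terms as vocabulary is my assumption; it is not taken from Rurik’s talk.

```python
import json


def to_jsonld(uri, title, creator, issued):
    """Describe one resource, using its persistent HTTP-URI as the identifier."""
    return {
        # The context maps the "dcterms:" prefix to the Dublin Core terms namespace.
        "@context": {"dcterms": "http://purl.org/dc/terms/"},
        # Hypothetical URI; in practice this would be a stable, resolvable address.
        "@id": uri,
        "dcterms:title": title,
        "dcterms:creator": creator,
        "dcterms:issued": issued,
    }


record = to_jsonld(
    "http://example.org/resource/123",
    "Making future-proof library content for the Web",
    "Rurik Greenall",
    "2013",
)
serialized = json.dumps(record, indent=2)
print(serialized)
```

Anyone who can fetch the URI can parse this description with an ordinary JSON parser, without a contract with the provider and without a dedicated API client.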

In regard to discovery tools, Rurik provides the following conclusion of his talk packed as a rhetorical question:

“If you pay money to a content provider that is also a metadata provider and then buy a search index from them what motivation do they have to present their content in a findable way on the web?”

Accordingly, Rurik ends his talk by stating that librarians should ask themselves:

Do we deliver content to the web in the described way? (i.e. using persistent HTTP-URIs as identifiers and open standards like HTML, PDF/A, TIFF+XMP, JSON-LD etc.)

Do we subscribe to a service that does the exact opposite?

Unfortunately, many librarians – especially on the management level – are not aware of the importance of applying web standards and publishing open data. A lot of persuasion has to be done until this thinking becomes part of a broader mindset and non-open forms of publishing metadata and providing discovery tools won’t pay off anymore.

4. Guiding the way?

Carl Grant also published a blog post on the topic last week that is worth reading. I agree with him when he says “we need to define the guidelines under which we’ll buy products and services dealing with content, content enhancements, and discovery services.” The International Group of Ex Libris Users (Igelu) yesterday took a first step towards such guidelines by proposing a clause libraries should add to their contracts with content providers. Unfortunately, this proposal doesn’t go very far as it would only enable the indexing of “citation metadata (including without limitations subject headings and keywords), abstract and full-text, all as available” by “Discovery Service Providers”. Nowhere is it made clear who falls under this concept of “Discovery Service Provider”. For example, it isn’t clear at all whether a library consortium wanting to index rich metadata that its members have subscribed to would also be regarded as a “Discovery Service Provider”.

If you advocate open bibliographic data you should object to the notion that bibliographic data be made available only to the exclusive club of “Discovery Service Providers”. Instead, anybody interested in providing a service, running some analytics or doing whatever else with that data should be able to collect it. It’s up to the advocates of open bibliographic data to participate in the development of guidelines for licensing content and discovery services.

Sweden Ends Negotiations with OCLC

Adrian Pohl — Thu, 12 Jan 2012 13:22:05 +0000

The following guest post is by Maria Kadesjö who works at the Libris-department at the National Library of Sweden.

The National Library of Sweden has ended negotiations with OCLC on participation in WorldCat, as the parties could not come to an agreement. The negotiations go back to 2006 and the key obstacle for the National Library has been the record use policy. Some time into the negotiations OCLC presented certain conditions for how records taken from WorldCat for cataloguing were to be used in Libris.

A No to WorldCat Rights and Responsibilities

The question is rooted in the “WorldCat Rights and Responsibilities for the OCLC Cooperative”, under which OCLC members have to accept certain conditions whose aim is to support the ongoing and long-term viability and utility of WorldCat (and its services). The National Library cannot accept the conditions as they are today since WorldCat is not the only arena in which the National Library wants and needs to be active. Accepting the conditions would mean that we would forever have to relate to OCLC’s policy.

Libris an open database

Libris is the Swedish union catalogue (with some 170 libraries, primarily academic libraries) and is built quite similarly to WorldCat but on a smaller scale. Member libraries catalogue their records in Libris and the records are then exported to their local library systems. The Libris cooperative is built on voluntary participation. Any Libris library should be able to take all its bibliographic records out of Libris whenever it wants and use them in another library system. The National Library makes no claim to the records and does not control how the libraries choose to use their bibliographic records.

The National Library has taken the decision to release the national bibliography and authority data as open data. The reason for this is to acknowledge the importance of open data and the importance of libraries’ control of their data when it comes to the long term sustainability and competition of the services needed by libraries and their users.

In the Agreement on Participation in the Libris joint catalogue signed by the National Library and cataloguing libraries, Point 3.3 specifies that “the content in Libris is owned by the National Library and is freely accessible in accordance with the precepts and methods reported by the National Library, both for Participating Libraries and for external partners.” This paragraph is needed so that the National Library can sign agreements for Libris with other partners (OCLC for example) but also so that the National Library can abstain from claims of ownership of bibliographic records taken from Libris. Libris can therefore be an open database, both for the libraries that use Libris for cataloguing and for others.

Not consistent with Libris principles

The consequence of signing with OCLC would be that the National Library would have to supervise how records originating in WorldCat were used. And a library that took a bibliographic record from WorldCat for cataloguing in Libris and exported it to its own system would have to accept OCLC’s terms of use.

A library that wished to leave Libris would not necessarily be able to do so, since it is not self-evident that the bibliographic records could be integrated into other systems. This would infringe on the voluntary participation that characterizes Libris. In practice, the National Library has no mandate to restrict the freedom of action of Libris’ libraries in this way, since the National Library has no possibility of influencing how the Libris libraries themselves choose to use their catalogues.

Ex Libris, Alma and Open Data

Adrian Pohl — Thu, 11 Aug 2011 13:29:54 +0000

This blog post is written by Carl Grant, chief librarian at Ex Libris and past president of Ex Libris North America, in answer to some questions that Adrian Pohl, coordinator of the OKFN Working Group on Open Bibliographic Data, posed at the beginning of July in response to Ex Libris’ announcement of an “Expert Advisory Group for Open Data”. It is cross-posted on the OKFN blog and openbiblio.net.

The Ex Libris announcement in June 2011 that we were forming an “Expert Advisory Group for Open Data” has generated much discussion and an equal number of questions. Many of the questions bring to light the ever-present tensions and dynamics that exist between the various sectors and advocates of open data and systems. It also raises ongoing questions about how, and in what timeframe, the goals of openness can be reasonably and properly achieved, particularly when companies, products and data structures with roots in proprietary environments are involved.

For those who are not part of the Ex Libris community, allow me to define some of the Ex Libris terminology involved in this discussion:

Alma. The Ex Libris next generation, cloud-based, library management service package that supports the entire suite of library operations—selection, acquisition, metadata management, digitization, and fulfillment—for the full spectrum of library materials.
Community Zone: A major component of Alma that includes the Community Catalog (bibliographic records) and the Central Knowledgebase and Global Authorities Catalog. This zone is where customers may contribute content to the Community Catalog and in so doing, agree to allow users to share, edit, copy and redistribute the contributed records that are linked to their inventory.
Community Catalog: The bibliographic record portion of the Community Zone.
Community Zone Advisory Group: A group of independent experts and Alma early adopters advising Ex Libris on policies regarding the Community Catalog.

Taking into consideration the many emails and conversations we’ve had around the topic, this original set of questions seems to be of shared interest:

These are good questions so let’s work our way through this list.

Q: What is this working group actually about? About open licensing, open formats, open vocabularies, open source software or all of them?

A: The Community Zone Advisory Group is tasked with creating high-level guidelines to govern the contribution, maintenance, use and extraction of bibliographic metadata placed in the Community Catalog of Alma. As such, this group is also advising us on suggested licenses to use. Given that each Alma customer will have a local, private catalog that need not be shared, we’ve taken the position that we want to promote the most open policies possible on data that libraries contribute to the Community Catalog. Much of the discussion centers around what approach would be best for libraries and lead to the clearest terms of use for Alma users.

We’ve said to the group, that we’ll leave it to them to determine if the group will need to exist beyond the time this original task charge is completed. We fully expect that we will have similar groups, if not this same one, advise us on other data that we plan to place in the Community Zone in the future.

Q: Does the working group only cover openness inside Ex Libris’ future metadata management system, e.g. openness between paying members in a closed ecosystem, or will it address openness of the service within a web-wide context?

A: This is a more complicated question because it is really two interlaced questions. First, it asks whether the data is open. Second, it asks whether the system is open.

We are most assuredly making the bibliographic metadata open and the answer to the next question provides more detail on how we’re approaching this.

As for the systems holding the data, we are planning on being open, but this is a place where we clearly must wait until this new system is up and running well for our paying customers before we open the Community Catalog up to others. Even then, we’ll want to closely monitor the impact on bandwidth, compute cycles and data to determine any costs associated with this use and how to best proceed and participate in making the Community Catalog open to larger communities on a web-wide context.

The goal is to work with our customers and the community to set achievable goals that move us down that path while factoring practicality into ideology. However, in the first deployment, our primary constituents will be institutions who have adopted Alma.

Q: Will Ex Libris push towards real open data for the Alma community zone? That would mean 1.) Using open (web) standards like HTTP, HTML and RDF as data model etc. 2.) Conforming to the Principles on Open Bibliographic Data by open licensing, 3.) providing APIs and the possibility to download data dumps.

A: Specifically:

Open standards and data formats are core to our design of Alma. They serve as our mechanisms of data exchange and the basis of our data structures when appropriate. Not all standards will be the basis of our internal functionality, however. For example, we’re building the infrastructure for RDF as a data exchange mechanism, but it does not fundamentally underpin our data structure, just as MARC21 binary format is not the root of our bibliographic record structure. When appropriate and possible, we are implementing these standards for data exchange based on libraries’ needs.
We are currently examining the open licenses to determine if we can utilize them given the other data we’d planned for the Community Zone. Currently, our Alma agreements include language that largely replicates the key terms of the Creative Commons PDDL license for customer-contributed records in the Community Catalog.
We will be providing APIs and plan to support downloads, but again, as we move forward, we plan to do this in a phased approach so that we can monitor infrastructure and human resource demands associated with the adoption. As noted above, our first priority will be providing service to our existing Alma users. This means that in the first release, providing Alma institutions with full access to their own data (in the form of APIs and data dumps) is where we’re focusing our attention.

In the final analysis, we feel our approach is really quite supportive of librarianship and quite open. We’re balancing the competing needs of our stakeholders, but it is important to note that Ex Libris is not artificially implementing restrictions that limit open data. Once libraries join the Alma community, there are no limits on their ability to manage their own projects or collaboration. Where we have the resources, we’re helping promote this open approach. We’ll be allocating our resources to provide best-in-class service to our customers and, at the same time, in a closely monitored and managed approach, continuing to expand access to larger communities.

We think all of this combined is a powerful statement about how proprietary and open approaches can beneficially co-exist and thus will help to move libraries forward in substantive ways.