Discovery silos vs. the open web

Bibliographic data that is not openly available on the web is harmful. In this post I’d like to point to a recent incident that demonstrates this: a correspondence between the board of the Orbis Cascade Alliance (“a consortium of 37 academic libraries in Oregon, Washington, and Idaho serving faculty and the equivalent of more than 258,000 full time students” (source)), Ex Libris, and EBSCO. The point of contention is the provision of metadata describing EBSCO’s content to Ex Libris’ discovery tool Primo. Thanks to the Orbis Cascade Alliance, the conversation is documented on the web. (I wish more institutions would transparently document their negotiations with vendors as well as the resulting contracts…)

1. What is a discovery tool, anyway?

But first, for those who aren’t familiar with “next-generation discovery tools”, here is a short explanation of what these services are all about:

Such a discovery tool provides a single interface that enables discovery of (almost) any resource a library provides access to. These are resources from its physical and electronic collections as well as electronic resources it has licensed and, furthermore, resources from openly available collections. Discovery tools are based upon a unified, customized index that comprises the library’s catalog data and metadata (and sometimes full text) from publishers and bibliographic databases. In order to pre-index content metadata and/or full text, providers of discovery tools enter into agreements with publishers and aggregators. Libraries spend quite some money on purchasing a discovery service. These services are very popular. As of today, Marshall Breeding’s lib-web-cats directory (library web sites and catalogs) lists more than 1250 libraries in total using one of the four leading discovery systems: Serials Solutions’ Summon, EBSCO Discovery Service (EDS), Ex Libris’ Primo and OCLC’s WorldCat Local.
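To make the idea of a unified index a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the principle and not how Primo, EDS, Summon or WorldCat Local are actually built. The record layout, the source names and the helper functions `build_unified_index` and `search` are assumptions made up for this example.

```python
from collections import defaultdict


def build_unified_index(*sources):
    """Merge records from several sources into one inverted index.

    Each source is a (source_name, list_of_records) pair. The record
    layout ({"id": ..., "title": ...}) is a hypothetical simplification.
    """
    records = {}
    index = defaultdict(set)
    for source_name, source_records in sources:
        for rec in source_records:
            # Prefix ids with the source name so they stay unique.
            rec_id = f"{source_name}:{rec['id']}"
            records[rec_id] = rec
            for token in rec["title"].lower().split():
                index[token].add(rec_id)
    return records, index


def search(index, query):
    """Return the ids of records whose title contains every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


# Hypothetical sample data: one catalog record, one publisher record.
catalog = [{"id": "1", "title": "Linked Data for Libraries"}]
publisher = [{"id": "a9", "title": "Open Data and Libraries"}]

records, index = build_unified_index(
    ("catalog", catalog), ("publisher", publisher)
)
print(search(index, "libraries"))
```

A single query over this index finds records regardless of which source delivered them, which is only possible if the publisher's metadata has been handed over for indexing in the first place.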

2. An overview of the “EBSCO and Ex Libris slapfight”

So, what has been going on between Orbis Cascade Alliance, EBSCO and Ex Libris? In short (thanks to the summaries provided in this thread entitled “EBSCO and Ex Libris slapfight”):

EBSCO offers both content and a discovery tool, EDS. Ex Libris would like to include at least the metadata for this content in its Primo discovery layer, so that users at libraries that subscribe to EBSCO products can find it using the library’s Primo instance. EBSCO won’t provide any data to Ex Libris, only access to the EDS API, so that its content is best (or only) accessed via EBSCO’s own discovery tool EDS.

Here’s a more detailed overview of what happened. (You may skip this if the summary above is enough for you and continue at section 3.)

May 2, 2013, Letter from Orbis Cascade Alliance to Ex Libris and EBSCO

The board of the Orbis Cascade Alliance writes to Ex Libris and EBSCO, expressing disappointment over the companies’ “failure to make EBSCO academic library content seamlessly and fully available via Ex Libris discovery services”. The Orbis Cascade Alliance estimates its payments to both companies over the coming five years at 30 million dollars and says that, if this issue is not resolved, it “will be required to reconsider the shape and scope of future business with EBSCO and Ex Libris”.

May 6, 2013, Ex Libris response to Alliance Board

Ex Libris agrees that this problem is unacceptable and blames EBSCO for no longer providing metadata to Ex Libris. After agreeing in 2009 to provide Ex Libris with “comprehensive metadata, including subject headings, for several of the key EBSCO databases”, EBSCO changed its policy in 2010 when it launched its own discovery service, EDS. From then on, according to Ex Libris, they “made EDS Discovery a requirement for users who wanted to continue this type of access. They decided to no longer enable their content for indexing in Primo and instead required that Primo users access the content only via an API.”

Ex Libris calls for an agreement with EBSCO “that would provide Primo customers with the content that EBSCO itself receives from external information providers – the content you and other libraries subscribe to, for which you should have access from your discovery platform of choice.” Ex Libris states that it has “in place many such agreements with other content providers”.

May 8, 2013, EBSCO response to Alliance Board

EBSCO mentions that – while there is no agreement with Ex Libris on providing data for Ex Libris’ discovery service – they have established such agreements with several other discovery service providers including OCLC and Serials Solutions. The existing agreements clarify the use of the EDS (EBSCO Discovery Service) API to make EBSCO content available via a discovery service. EBSCO’s view is that “an API solution is superior to a solution that relies strictly on metadata for several reasons, including the fact that we do not have the rights to provide (to Ex Libris or any third party) all of the content to that we feel is necessary for a quality user experience.”

In effect, libraries don’t have the right to make the content they have already licensed discoverable via Primo. The reasons named by EBSCO are that (a) Primo’s relevancy ranking will “not take advantage of the value added elements of their products” and (b) users wouldn’t have an incentive to use the original databases because they would think all the content is available via Primo. As a result, EBSCO argues, the “user experience” would suffer. Citing optimization of the user experience as its reason, EBSCO tries to retain the greatest possible exclusive control over the content and metadata it provides.

May 9, 2013, Alliance Board response to EBSCO and Ex Libris

The Orbis Cascade Alliance responds to the companies’ letters:

“While these letters illustrate the nature of this continuing impasse, they do nothing to address a remarkable and unacceptable disservice to your customers. (…) Ultimately we face a business decision. The Orbis Cascade Alliance is now actively investigating options and will make decisions that may move us away from your products in order to better serve our faculty, students, and researchers. Again, we urge EBSCO and Ex Libris to quickly resolve this issue.”

May 14, 2013, Ex Libris Open Letter to the Library Community

But obviously, EBSCO and Ex Libris are far from “resolving this issue”. As a next step, Ex Libris responds to EBSCO’s response with a “point-by-point analysis” to refute the claims made by EBSCO.

I had some problems with the terminology, so here are a few words about language usage: Ex Libris distinguishes between “index-based search”, i.e. search over one central index as a “true next generation discovery service” does it, and “API-based search”. This is a bit confusing, as APIs are often backed by an index, so that, strictly speaking, “API-based search” and “index-based search” don’t necessarily exclude each other. But it makes a difference if a service like Primo, which is based on one index, has to query external APIs, so that the service actually becomes a “metasearch tool” instead of a “true next generation discovery service”. (Talking about terminology, as I have the feeling that not everyone uses it the way I do: I distinguish between content and (meta)data. In short and in this context, content is the thing scholars produce and read, while metadata is the data that describes the content.)
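The practical difference between the two approaches can be sketched roughly as follows. This is a toy illustration under my own assumptions, not a description of how Primo or the EDS API work: the scoring by term frequency, the round-robin merge and all names (`index_based_search`, `api_based_search`, `local_api`, `vendor_api`) are invented for the example.

```python
def index_based_search(unified_index, query):
    """Score every record with the same function, so ranks are comparable."""
    terms = query.lower().split()
    scored = []
    for rec in unified_index:
        tokens = rec["title"].lower().split()
        score = sum(tokens.count(term) for term in terms)
        if score:
            scored.append((score, rec["title"]))
    # Highest score first; titles break ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


def api_based_search(remote_apis, query):
    """Each remote service returns its own ranked list without scores.

    The caller cannot compare results across services and can only
    interleave the lists (round-robin), as a metasearch tool would.
    """
    result_lists = [api(query) for api in remote_apis]
    longest = max((len(results) for results in result_lists), default=0)
    merged = []
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                merged.append(results[rank])
    return merged


# Hypothetical sample data.
local_records = [{"title": "Open data"}, {"title": "Open data open web"}]
vendor_records = [{"title": "Open access open data open web"}]


def local_api(query):
    # Stand-in for the library's own index.
    return index_based_search(local_records, query)


def vendor_api(query):
    # Stand-in for a vendor API that only exposes a ranked list.
    return index_based_search(vendor_records, query)


global_ranking = index_based_search(local_records + vendor_records, "open")
federated = api_based_search([local_api, vendor_api], "open")
print(global_ranking)
print(federated)
```

In this toy setup the vendor's record scores highest in the unified ranking but ends up in second place in the interleaved list, because the calling service has no scores to compare. Whoever runs the index controls the ranking, and that is exactly what the two companies are arguing about.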

In short, however, Ex Libris says that EBSCO chooses “not to share the content that they do have the rights to share”, although this is content that the respective library has already paid for. Ex Libris accordingly accuses EBSCO of wishing “to control the ranking of its content, which is possible through the API to EDS they require”.

Interestingly, after Ex Libris states that EBSCO wants to control the ranking of its content, the letter continues:

“EBSCO clearly believes that end-users … prefer to search database silos. This runs counter to what both end-users and libraries wish to achieve with a library-based discovery service.”

As discovery services are silos themselves – only bigger than the old silos – this is actually also an argument against Primo, EDS, Summon etc.

At the end of this letter, Ex Libris proclaims itself the libraries’ comrade-in-arms who is fighting for their interests:

“We stand with you and continue to believe that together we can bring change such that EBSCO databases, whose rights EBSCO determines, are available to any EBSCO customer through the discovery service of choice.”

Indeed, the interests of libraries and Ex Libris overlap. But obviously Ex Libris is primarily pursuing its own interest in getting its own discovery service populated with the relevant data. Libraries should demand more than the ability to choose one of several commercial products. Rather, anybody who is interested and has the necessary resources and technical capabilities should be able to get the metadata for academic publications that are accessible (with or without a toll) on the web in order to build their own discovery indexes.

There are already a lot of open bibliographic data sets out there. But this data mostly comes from libraries and related institutions; you won’t find much data from the publishers’ side. What we need is more and more publishers publishing their metadata, citation data etc. openly on the web, ideally in the way Nature Publishing Group is doing it.

3. Rejecting silos, encouraging open bibliographic data

The conflict between EBSCO and Ex Libris is just another indicator of how important it is to move away from closed content and discovery silos to web-integrated, openly available bibliographic data. At the very least, this would make it easier to obtain the metadata for content provided via the web. Although bibliographic data for the majority of resources published in the past as print-only wouldn’t be covered, this constitutes a future-proof way of providing bibliographic data on the web.

Rurik Greenall (who wrote about NTNU’s LOD activities on openbiblio.net some time ago and from whom I took the “future-proof” terminology) recently summed it up nicely in his alternative “for the hard of understanding” version of his talk “Making future-proof library content for the Web” at this year’s ELAG conference. Libraries and publishers (and anybody else using the web as a publication platform) should acknowledge how the web works and make their content and data available using persistent HTTP URIs as identifiers, serving content and metadata in standard formats like HTML, PDF/A, TIFF+XMP, JPEG+XMP, JSON-LD and RDFa.
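As a rough idea of what this could look like for a single article, here is a small Python sketch that builds a JSON-LD description identified by an HTTP URI. It is not taken from Rurik’s talk or from any publisher’s actual data model: the choice of the schema.org vocabulary, the `to_json_ld` helper, the `example.org` URI and the sample values are all assumptions made for illustration.

```python
import json


def to_json_ld(uri, title, authors, year):
    """Describe an article as JSON-LD, identified by an HTTP URI.

    The schema.org vocabulary is one possible choice, not a requirement.
    """
    return {
        "@context": "https://schema.org",
        "@id": uri,  # the persistent HTTP URI doubles as the identifier
        "@type": "ScholarlyArticle",
        "name": title,
        "author": [{"@type": "Person", "name": name} for name in authors],
        "datePublished": str(year),
    }


# Hypothetical article; example.org is a placeholder domain.
record = to_json_ld(
    "http://example.org/articles/123",
    "An Example Article",
    ["A. Author"],
    2013,
)
print(json.dumps(record, indent=2))
```

If a description like this is served at the URI itself (or embedded in the article’s HTML page), anybody can collect it with an ordinary HTTP client and build an index from it, without first having to negotiate a data delivery agreement.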

In regard to discovery tools, Rurik provides the following conclusion of his talk packed as a rhetorical question:

“If you pay money to a content provider that is also a metadata provider and then buy a search index from them what motivation do they have to present their content in a findable way on the web?”

Accordingly, Rurik ends his talk by stating that librarians should ask themselves:

  • Do we deliver content to the web in the described way? (i.e. using persistent HTTP URIs as identifiers and open standards like HTML, PDF/A, TIFF+XMP, JSON-LD etc.)
  • Do we subscribe to a service that does the exact opposite?

Unfortunately, many librarians – especially on the management level – are not aware of the importance of applying web standards and publishing open data. A lot of persuasion has to be done until this thinking becomes part of a broader mindset and non-open forms of publishing metadata and providing discovery tools won’t pay off anymore.

4. Guiding the way?

Carl Grant also published a blog post on the topic last week that is worth reading. I agree with him when he says “we need to define the guidelines under which we’ll buy products and services dealing with content, content enhancements, and discovery services.” The International Group of Ex Libris Users (IGeLU) took a first step towards such guidelines yesterday by proposing a clause that libraries should add to their contracts with content providers. Unfortunately, this proposal doesn’t go very far, as it would only enable the indexing of “citation metadata (including without limitations subject headings and keywords), abstract and full-text, all as available” by “Discovery Service Providers”. Nowhere is it made clear who falls under this concept of “Discovery Service Provider”. For example, it isn’t clear at all whether a library consortium wanting to index rich metadata that its members have subscribed to is also regarded as a “Discovery Service Provider”.

If you advocate open bibliographic data, you should object to the notion that bibliographic data be made available only to the exclusive club of “Discovery Service Providers”. Instead, anybody interested in providing a service, running some analytics or doing whatever else with that data should be able to collect it. It’s up to the advocates of open bibliographic data to participate in the development of guidelines for licensing content and discovery services.


5 Responses to Discovery silos vs. the open web

  1. FraEnrico says:

    Excellent overview of the problem, thanks for this. But it seems to me that the problem is always the same: the availability of publisher’s metadata. Publisher hold a great power by owning their metadata, and they will never let them available in the open. Open bibliographic data provided by libraries will never be enough, because the main contents (i.e. articles) lie in someone else’s (greedy) hands.

  2. Adrian Pohl says:

    FraEnrico wrote:

    “Publisher hold a great power by owning their metadata, and they will never let them available in the open. Open bibliographic data provided by libraries will never be enough, because the main contents (i.e. articles) lie in someone else’s (greedy) hands.”

    Exactly. That is why I argue that libraries should make it mandatory for content providers to deliver rich metadata along with the licensed content. If libraries demanded this rigorously and didn’t sign contracts missing such a clause, article-level metadata would finally become openly available and re-usable.

  3. Jörg Prante (@xbib) says:

    Yes, for a long time, it’s disappointing to buy metadata from metadata farms, only to find out how different the metadata is from the holdings in library catalogs and how hard the gaps can be filled.

    The answer is that libraries must be enabled to merge their catalogs with article reference databases without restriction.

  4. Lukas Koster says:

    One comment from the IGeLU point of view. IGeLU represents a specific group of users of bibliographic metadata, the customers of Ex Libris, the provider of the Primo Central global metadata index and the Primo Discovery Tool. In this respect their first objective is to protect the position of the institutions that pay both for the content provided by commercial publishers and other content providers on the one hand, and for discovery systems provided by commercial Discovery Service Providers on the other hand. This is why they focus on these parties. In these specific circumstances these customers do not per se benefit from Open Metadata, because they already pay for the closed metadata. Their problem right now is that some commercial content providers deny them the right to access that metadata by any means they choose.
    This is not to say that IGeLU and the institutions they represent are opposed to Open Data at all. On the contrary. But these are two different paths and battles.

  5. Pingback: Discovery vendors and pre-indexed data – what can be done? | Eds' blog (now better encoded)
