The following guest post is by Rurik Thomas Greenall, who works with linked data and resource access at the Section for Information Resources at NTNU University Library, Trondheim. His interests include programming, particularly for the semantic web, and knowledge representation; he speaks and writes on these topics.
NTNU University Library has been a publisher of open data – data released under a liberal licence – since 2009, when we first started working with RDF and linked data. Our aims were initially pursued in project-based form, but linked open data has since been formalized within the structure of the library organization, making NTNU University Library one of the few libraries with positions dedicated to the production of linked data.
The move away from the traditional
The initial ideas we had about linked data stem from a project, UBiT2010, which dealt with modernizing library services. The working group consisted of a cross-section of the library staff who were interested in technology. Some of the major findings in the project were:
- re-mix, re-use, re-invent, re-package, re-work
- users know their own needs best
- trust and openness
- library IT is nothing, there is only IT
These ideas do not seem particularly inspiring any more; however, they were quite cutting edge in 2007–2008, and they led us down a path toward distributed computing and distributed interfaces, as well as a move away from monolithic systems that provided the so-called expert interfaces that had frightened away our users in the first place.
At NTNU University Library, the data group was somewhat radicalized toward the openness movement, largely because of a protectionist agenda we perceived among our commercial partners. There seemed to us to be an intrinsic concept of keeping data closed in the business models of many of the companies we worked with, and we saw that this worked against them: those suppliers that were more open with their data provided better service and saw higher use than those that were less open. Further, openness seems to us to be in the spirit of the web, and its best feature; libraries have also traditionally been important players in regard to openness, so we happily continue this tradition.
Thus, as a state-funded institution within a social democracy, we were duty bound to adopt a line that would allow the principles of openness to be applied to their fullest extent. After consulting with, among others, Talis, we found that the only way forward was public domain and zero licences. We have therefore adopted a policy of applying the ODC-PDDL to all of our data. (We reserve some rights for non-data content, typically content documents – i.e. documents from the traditional literature holdings of the library where we own the copyright or the work has passed into the public domain – which we provide under CC-BY-NC licences.) Note that the applicability of licences such as the Norwegian licence for open data (Norsk lisens for offentlige data) in the international arenas where we operate is uncertain, and we certainly resist the conservative impulse to require attribution for data. In fact, we actively encourage others to adopt the ODC-PDDL in combination with HTTP URIs, as the URI acts as a token of provenance and trust, something intrinsically more important than rights attribution when it comes to open data.
Another outcome of this first project was that we needed far more access to our data than we had had previously. This fact led us on a journey of discovery through APIs (such as Ex Libris’ X-Server, Z39.50 and SRU), none of which really provided what we needed: real-time access to our data in a way that made sense in a web-services context.
The first linked open data baby steps
Moving away from the traditional library IT scene, we were invited in June 2009 to a seminar on linked data hosted by the National Library of Sweden, the Norwegian Library Laboratory (Bibliotekslaboratoriet) and Talis. This meeting was critical, because it brought together two core ideas: the use of standard technologies to deliver library data, and the idea of openness.
Once the pieces had fallen into place, we started working toward our goal of producing linked open data from our data sets. The typical barriers were in place: getting hold of the data, establishing who actually owned it, and getting someone to sign the papers that would allow us to release it. In the end, these pieces fell into place because the library management were brave enough to just do the job, signing the licences and letting us take the step of publishing data openly.
Our first project was TEKORD, a controlled vocabulary that provides Norwegian terms for technical subjects and Universal Decimal Classification (UDC) numbers. The data set is open RDF, but it is not currently linked to anything because of the lack of data sets containing UDC numbers. The data set was, however, immediately snapped up by the UDC Consortium and used as the basis for the Norwegian representation of the top-level terms for UDC, which was rather encouraging.
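To give a flavour of what such a vocabulary looks like as RDF, here is a minimal sketch using Python’s rdflib; the concept URI, the label and the UDC number are invented for illustration, and the real TEKORD data may well be shaped differently.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Hypothetical namespace for TEKORD concepts; the published URIs may differ.
TEKORD = Namespace("http://example.org/tekord/")

g = Graph()
g.bind("skos", SKOS)

term = TEKORD["betong"]  # invented concept for the Norwegian term "betong" (concrete)
g.add((term, RDF.type, SKOS.Concept))
g.add((term, SKOS.prefLabel, Literal("betong", lang="no")))
# The corresponding UDC class number, kept as a plain skos:notation literal.
g.add((term, SKOS.notation, Literal("666.97")))

print(g.serialize(format="turtle"))
```

Because each term carries its UDC notation, the vocabulary can be linked as soon as other data sets exposing UDC numbers appear.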
Rådata nå!
In 2010, NTNU University Library and BIBSYS (the state-owned Norwegian national library management system for academic and research libraries) received funding for a project called Rådata nå! (Norwegian for “raw data now”), which published a data set containing 1.5 million person name authorities as linked open data, aligning this data with, among others, DBpedia and VIAF. The project produced 10 million triples and was the first large data set to be released as linked open data in Norway (under the ODC-PDDL). Name authorities were chosen because they can be seen as fixed points of reference on the web, and can be used to identify individuals who are rarely represented otherwise on the web of data (the data set includes many master’s degree theses, and so includes the names of many prominent and not so prominent people).
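The alignment itself is conceptually simple: each authority becomes an HTTP URI, and owl:sameAs links tie it to the corresponding resources elsewhere. Here is a minimal sketch in Python with rdflib, using invented URIs throughout (the real Rådata nå! identifiers are not reproduced here):

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, OWL, RDF

g = Graph()
g.bind("foaf", FOAF)
g.bind("owl", OWL)

# Invented authority URI standing in for a Rådata nå! person authority.
person = URIRef("http://example.org/authority/x90123456")
g.add((person, RDF.type, FOAF.Person))
g.add((person, FOAF.name, Literal("Nordmann, Ola")))

# Alignment links to the same person in external data sets (targets invented).
g.add((person, OWL.sameAs, URIRef("http://viaf.org/viaf/123456789")))
g.add((person, OWL.sameAs, URIRef("http://dbpedia.org/resource/Ola_Nordmann")))

print(g.serialize(format="turtle"))
```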
While this data set was a huge milestone for us, it is far from the only data set we have released. Much of our strategy has been to create sustainable systems that we use ourselves, creating an available source for our business data while enriching it with other data; in this way we are open and strategically aware in equal measure.
(Actually) using linked open data
The choice of linked open data brings together means of representation and enrichment that were not possible with other technologies, and while the learning curve has not been easy, it has been rewarded in many respects. Our use of open data has given us the opportunity to build the kinds of system that we feel will inspire our users, answer the questions they have and help them find more than they were looking for. This extends beyond the concept of monolithic systems, where our aim was to get users into our systems; in fact, it is entirely about getting the data out there, ensuring that it is available without reference to a given web page. Openness is the key, and this way of thinking has improved our work immensely.
By using other people’s data, we have gained insight into how to integrate our domain (that of traditional library data: bibliographic data, controlled vocabularies, etc.) with the domains that we describe and serve. Typical examples include using terms from geographical resources in addition to the gazetteers traditionally used in bibliographic description, and linking to resources of scientific information. This adds value in the form of alignment with resources that provide additional information, which can be used when building services on top of the data.
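As a sketch of what this kind of enrichment looks like in practice (the document URI and the GeoNames identifier below are invented for illustration), a bibliographic description can point at a geographical resource rather than holding a bare place-name string:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
g.bind("dcterms", DCTERMS)

# Invented document URI and an illustrative (not real) GeoNames feature URI.
doc = URIRef("http://example.org/documents/42")
place = URIRef("http://sws.geonames.org/0000000/")

# The place is now a link into someone else's data set, from which
# coordinates, alternative names and so on can be fetched.
g.add((doc, DCTERMS.spatial, place))

print(g.serialize(format="turtle"))
```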
Another case for our use of linked open data is the multilingual situation in academia in Norway, where Norwegian and English exist side by side, as is common in other cultures. In addition, Norwegian is joined by other languages of Norway (Sámi and Kven), and written Norwegian itself has two standard varieties (Bokmål and Nynorsk). This relatively complex situation is solved easily by co-ordination with data sets that represent the same information in the different languages.
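Language-tagged literals make this concrete: one concept can carry labels in every relevant language, and an interface simply picks the tag it needs. A minimal sketch with rdflib (the concept URI is invented, and the Sámi form is illustrative):

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS

g = Graph()
g.bind("skos", SKOS)

concept = URIRef("http://example.org/concepts/library")  # invented URI
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("bibliotek", lang="nb")))   # Bokmål
g.add((concept, SKOS.prefLabel, Literal("bibliotek", lang="nn")))   # Nynorsk
g.add((concept, SKOS.prefLabel, Literal("library", lang="en")))
g.add((concept, SKOS.prefLabel, Literal("girjerádju", lang="se")))  # Northern Sámi

# An interface picks whichever language it needs:
for label in g.objects(concept, SKOS.prefLabel):
    if label.language == "nn":
        print(label)
```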
Since these early efforts, we have moved ever more towards workflows centred on linked data, providing representations of and access to scholarly journals, research databases, researcher information and academic conference authorities, as well as controlled vocabularies such as Norwegian MeSH and Norwegian scientific disciplines. These all feature in our approach to using and releasing our data, and we happily share and align our work with what other people are doing.
Today’s situation
Our current projects are the presentation of the library’s special collections and statistical modelling of usage. The library’s special collections (manuscripts and rare books) are catalogued and presented in a workflow based purely on linked open data: the documents are catalogued directly as RDF, and the systems use linked open data as their only data resource. This project integrates all aspects of the current semantic web, including reasoning and in-HTML standards like RDFa. The benefits of this approach have been enormous, allowing agile, rapid development of interface and data to meet changing needs, as well as a high level of visibility in the search engines we have targeted. We are currently running workshops with other holders of unique historical materials to see how we can achieve a co-ordinated web of data for these materials.
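When linked open data is the only data resource, the interface is essentially a consumer of a SPARQL endpoint. Here is a minimal sketch of that pattern using Python’s SPARQLWrapper; the endpoint URL and the query are invented for illustration, not the library’s actual service:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint; the library's actual endpoint is not given in this post.
endpoint = SPARQLWrapper("http://example.org/sparql")
endpoint.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?item ?title WHERE {
        ?item dcterms:title ?title .
    }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

# Each result binding drives a row, card or page in the interface.
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["item"]["value"], "-", row["title"]["value"])
```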
It is evident to us that using linked open data not only provides the access we need to our business data, it is also a way of enriching it with the data provided by others. It is also clear that in an age where “interconnected” is the norm, any other approach not only limits your success, it probably precludes it.
About NTNU University Library
NTNU University Library is the library of the Norwegian University of Science and Technology (Norges teknisk-naturvitenskapelige universitet – NTNU), based in Trondheim. It provides services to the university in subjects including the sciences and technology as well as the humanities and social sciences. As a centre for development within technical library and information science, the staff at the library participate in national and international collaborative projects and help set the agenda for current trends in their fields of specialization. The library hosts the biennial international conference emtacl: emerging technologies in academic libraries, a showcase for leading trends in technology within and for academic libraries.
Great to see more of these testimonials by libraries, though wondering what vocabs are core to the org, how and why these vocabs were chosen and, most significantly, what new vocabs are you looking at and why? #jiscExpo #jiscopenb
Up to now, the core vocabs have been rdfs, dcterms, bibo, foaf, skos and owl. We try to keep things relatively simple, using common vocabs and adding few properties and classes of our own. We have largely avoided relying on the explicitly library-domain vocabularies; however, we do some modelling using properties and classes from these (for the sake of those who are interested). If you’re interested in the reasons why we avoid the library-domain stuff, it’s simple: the models are entrenched in (often record-based) approaches that aren’t particularly interesting for us. It isn’t a problem for us to pick and choose properties from these vocabularies, but we have avoided wholehearted adoption of, for example, FRBR because, when we tried it, it gave us no real benefits beyond what we could achieve in a simpler (non-domain-specific, more trivial) way.
When selecting vocabularies, we try to follow what seems to be best practice. For example, we have recently been working with the participation ontology and role ontologies for this. We do a lot of domain-specific modelling (examples: people, theatre productions, roles), and on the whole we don’t do “simple” MARC-to-RDF conversions; we spend more time breaking up records and remodelling them into (re-)usable chunks, as in the sketch below.
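As a rough illustration of what “breaking up records” means (everything here is invented: the namespace, the identifiers and the sample record), a flat record-style description becomes several linked, re-usable resources:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, FOAF, RDF

# A flat, record-style description of the kind a MARC record flattens into.
record = {"title": "Om betong", "creator": "Nordmann, Ola", "year": "1923"}

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("foaf", FOAF)

doc = EX["documents/1"]
person = EX["people/nordmann-ola"]

# The creator becomes a first-class resource that other data can link to,
# rather than a string locked inside a single record.
g.add((doc, RDF.type, DCTERMS.BibliographicResource))
g.add((doc, DCTERMS.title, Literal(record["title"])))
g.add((doc, DCTERMS.issued, Literal(record["year"])))
g.add((doc, DCTERMS.creator, person))
g.add((person, RDF.type, FOAF.Person))
g.add((person, FOAF.name, Literal(record["creator"])))

print(g.serialize(format="turtle"))
```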
On the whole, our approach to modelling means that we provide for our own business needs, and then add in extra stuff that makes the data usable for others. This entails a lot of redundancy, but I think it means we deliver a better data product.