The following guest post is from the National Library of Spain (BNE) and the Ontology Engineering Group (OEG) at the Technical University of Madrid (UPM).
This initiative is part of the project “Linked Data at the BNE”, supported by the BNE in cooperation with the Ontology Engineering Group (OEG) at the Universidad Politécnica de Madrid (UPM). The first meeting took place in September 2010, and the collaboration agreement was signed in October 2010. A first set of data was transformed and linked in April 2011, followed by a more significant set in December 2011.
The initiative was presented in the auditorium of the BNE on 14 December 2011 by Asunción Gómez-Pérez, Professor at the UPM, and Daniel Vila-Suero, Project Lead (OEG-UPM), together with Ricardo Santos, Chief of Authorities, and Ana Manchado Mangas, Chief of Bibliographic Projects, both from the BNE. The audience also benefited from the participation of Gordon Dunsire, Chair of the IFLA Namespace Group.
The concept of Linked Data was first introduced by Tim Berners-Lee in the context of the Semantic Web. It refers to a method of publishing and linking structured data on the Web. Accordingly, the project “Linked Data at the BNE” involves transforming the BNE bibliographic and authority catalogues into RDF, and publishing and linking them by means of IFLA-backed ontologies and vocabularies, with the aim of making the data available in the so-called “Linked Open Data” cloud. The project focuses on connecting the published data to other data sets in the cloud, such as VIAF (Virtual International Authority File) or DBpedia. With this initiative, the BNE takes on the challenge of publishing bibliographic and authority data in RDF, following the Linked Data Principles and under the CC0 (Creative Commons Public Domain Dedication) open license. Spain thus joins the initiatives recently launched by national libraries in countries such as the United Kingdom and Germany.
Vocabularies and models
IFLA-backed ontologies and models, widely agreed upon by the library community, have been used to represent the resources in RDF. Datos.bne.es is one of the first international initiatives to thoroughly embrace the models developed by IFLA: the FR family of models, namely FRBR (Functional Requirements for Bibliographic Records), FRAD (Functional Requirements for Authority Data) and FRSAD (Functional Requirements for Subject Authority Data), together with ISBD (International Standard for Bibliographic Description).
FRBR has been used both as a reference model and as a data model because it provides a comprehensive and organized description of the bibliographic universe, which makes it possible to gather useful data and supports navigation. Entities, relationships and properties have been expressed in RDF using the vocabularies published by IFLA: the FR ontologies have been used to describe Persons, Corporate Bodies, Works and Expressions, and ISBD properties to describe Manifestations. All these vocabularies are now available at the Open Metadata Registry (OMR), with the status of published. Additionally, in cooperation with IFLA, their labels have been translated into Spanish. MARC 21 bibliographic and authority files have been tested and mapped to the classes and properties at the OMR. The following mappings were carried out:
- A mapping to determine, given a field tag and a certain subfield combination, to which FRBR entity it is related (Person, Corporate Body, Work, Expression). This mapping was applied to authority files.
- A mapping to establish relationships between entities.
- A mapping to determine, given a field/subfield combination, to which property it can be mapped. Authority files were mapped to FR vocabularies, whereas bibliographic files were mapped to ISBD vocabulary. A number of properties from other vocabularies were also used.
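As a rough illustration of how the first and third mappings could be expressed, the following Python sketch classifies an authority heading by its field tag and subfield combination, and maps individual subfields to properties. The specific rules (for instance, that field 100 with subfields a and t denotes a Work) and the property labels are assumptions made for the example; they are not the mappings actually used by the BNE.

```python
from typing import Iterable, Optional

# Hypothetical entity rules: (field tag, required subfield codes, FRBR entity).
# Rules requiring more subfields are listed first so that they take precedence.
ENTITY_RULES = [
    ("100", frozenset("atl"), "Expression"),
    ("100", frozenset("at"), "Work"),
    ("100", frozenset("a"), "Person"),
    ("110", frozenset("a"), "Corporate Body"),
]

# Hypothetical property rules: (field tag, subfield code) -> property label.
PROPERTY_RULES = {
    ("100", "a"): "name of person",
    ("100", "d"): "dates of person",
    ("110", "a"): "name of corporate body",
}


def classify(tag: str, subfields: Iterable[str]) -> Optional[str]:
    """Return the FRBR entity for a heading field, or None if no rule matches."""
    codes = set(subfields)
    for rule_tag, required, entity in ENTITY_RULES:
        if tag == rule_tag and required <= codes:
            return entity
    return None


def map_property(tag: str, code: str) -> Optional[str]:
    """Return the property label for a field/subfield pair, or None if unmapped."""
    return PROPERTY_RULES.get((tag, code))


print(classify("100", "ad"))   # Person
print(classify("100", "at"))   # Work
print(map_property("100", "d"))  # dates of person
```

Keeping the rules in plain data tables, rather than in code branches, is one way to let cataloguers review and adjust the correspondences without touching the transformation logic.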
The aforementioned mappings will soon be made available to the library community. In this way the BNE hopes to contribute to the discussion on mapping MARC records to RDF, and other libraries wishing to transform their MARC records into RDF will be able to reuse the mappings.
Almost 7 million records transformed under an open license
Approximately 2.4 million bibliographic records have been transformed into RDF. They comprise modern and ancient monographs, sound recordings and musical scores. In addition, 4 million authority records for persons, corporate names, uniform titles and subjects have been transformed. All of them belong to the bibliographic and authority catalogues of the BNE, which are stored in MARC 21 format. For the data transformation, the MARiMbA (MARc mappIngs and rdf generAtor) tool has been developed and used. MARiMbA is a tool for librarians whose goal is to support the entire process of generating RDF from MARC 21 records. The tool allows any vocabulary to be used (in this case ISBD and the FR family) and simplifies the process of assigning correspondences between RDFS/OWL vocabularies and MARC 21. As a result of this process, about 58 million triples have been generated in Spanish. These triples are high-quality data of considerable cultural value, and they substantially increase the presence of the Spanish language in the data cloud.
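To give an idea of what generating triples from a mapped record involves, here is a minimal Python sketch that turns a simplified record into N-Triples lines. It is not MARiMbA's implementation: the base URI, the vocabulary namespace, the property names, the record identifier and the choice to tag every literal as Spanish are all assumptions made for the example.

```python
BASE = "http://example.org/resource/"  # hypothetical base URI for resources
VOCAB = "http://example.org/vocab/"    # hypothetical stand-in for an IFLA vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape(literal):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (literal.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))


def record_to_ntriples(record_id, entity, fields, property_map):
    """Build N-Triples lines for one record.

    fields is a list of (tag, subfield code, value) tuples;
    property_map maps (tag, subfield code) to a property name.
    """
    subject = f"<{BASE}{record_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{VOCAB}{entity}> ."]
    for tag, code, value in fields:
        prop = property_map.get((tag, code))
        if prop is None:
            continue  # subfields without a mapping are skipped
        # Assumption: every literal is tagged as Spanish.
        lines.append(f'{subject} <{VOCAB}{prop}> "{escape(value)}"@es .')
    return lines


# Hypothetical mapping and record, for illustration only.
property_map = {("100", "a"): "nameOfPerson", ("100", "d"): "datesOfPerson"}
fields = [
    ("100", "a", "Cervantes Saavedra, Miguel de"),
    ("100", "d", "1547-1616"),
    ("100", "9", "unmapped value"),
]
triples = record_to_ntriples("XX0001", "Person", fields, property_map)
print("\n".join(triples))
```

Each mapped subfield yields one triple, plus one typing triple per record, which is roughly how a few million records can expand into tens of millions of triples.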
Once the data had been described with IFLA models and the bibliographic and authority catalogues had been generated in RDF, the next step was to connect these data with other existing RDF knowledge bases included in the Linking Open Data initiative. Thus, the data of the BNE are now linked with data from other international data sources through VIAF, the Virtual International Authority File.
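The linking step can be pictured as follows. This Python sketch assumes that a lookup table from local authority identifiers to VIAF identifiers is already available, and that the links are expressed with owl:sameAs; neither detail is stated in this post, and the base URI and identifiers below are placeholders.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
VIAF_BASE = "http://viaf.org/viaf/"
BASE = "http://example.org/resource/"  # hypothetical base URI for local resources


def link_to_viaf(local_ids, viaf_index):
    """Return one owl:sameAs triple for each local ID found in viaf_index."""
    triples = []
    for local_id in local_ids:
        viaf_id = viaf_index.get(local_id)
        if viaf_id is None:
            continue  # no known VIAF cluster for this authority record
        triples.append(
            f"<{BASE}{local_id}> <{OWL_SAME_AS}> <{VIAF_BASE}{viaf_id}> ."
        )
    return triples


# Placeholder identifiers, not real BNE or VIAF values.
viaf_index = {"XX0001": "12345"}
links = link_to_viaf(["XX0001", "XX0002"], viaf_index)
print("\n".join(links))
```

Records without a match are simply left unlinked, so the set of links can grow as more correspondences become known.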
The licence applied to the data is CC0 (Creative Commons Public Domain Dedication), a completely open licence aimed at promoting data reuse. With this project, the BNE adheres to the Spanish public sector's commitment to openness and data reuse, as established in Royal Decree 1495/2011 of 24 October (Real Decreto 1495/2011, de 24 de octubre) on the reuse of public sector information, and also acknowledges the proposals of the CENL (Conference of European National Librarians).
In the short term, the next steps include:
- Migration of a larger set of catalogue records.
- Improvement of the quality and granularity of both the transformed entities and the relationships between them.
- Establishment of new links to other interesting datasets.
- Development of a human-friendly visualization tool.
- SKOSification of subject headings.
From BNE: Ana Manchado, Mar Hernández Agustí, Fernando Monzón, Pilar Tejero López, Ana Manero García, Marina Jiménez Piano, Ricardo Santos Muñoz and Elena Escolano. From UPM: Asunción Gómez-Pérez, Elena Montiel-Ponsoda, Boris Villazón-Terrazas and Daniel Vila-Suero.