Open Bibliography and Open Bibliographic Data » LOD-LAM
http://openbiblio.net
Open Bibliographic Data Working Group of the Open Knowledge Foundation
Tue, 08 May 2018 15:46:25 +0000

Metadata for over 20 Million Cultural Objects released into the Public Domain by Europeana
http://openbiblio.net/2012/09/12/metadata-for-over-20-million-cultural-objects-released-by-europeana/
Wed, 12 Sep 2012 18:46:35 +0000

Europeana today announced that its dataset comprising descriptions of more than 20 Million cultural objects is from now on openly licensed with Creative Commons’ public domain waiver CC0. From the announcement:

Opportunities for apps developers, designers and other digital innovators will be boosted today as the digital portal Europeana opens up its dataset of over 20 million cultural objects for free re-use.

The massive dataset is the descriptive information about Europe’s digitised treasures. For the first time, the metadata is released under the Creative Commons CC0 Public Domain Dedication, meaning that anyone can use the data for any purpose – creative, educational, commercial – with no restrictions. This release, which is by far the largest one-time dedication of cultural data to the public domain using CC0 offers a new boost to the digital economy, providing electronic entrepreneurs with opportunities to create innovative apps and games for tablets and smartphones and to create new web services and portals.

Europeana’s move to CC0 is a step change in open data access. Releasing data from across the memory organisations of every EU country sets an important new international precedent, a decisive move away from the world of closed and controlled data.

Thanks to all the people who made this possible! See also Jonathan Gray’s post at the Guardian’s Datablog.

Update 30 September 2012: Actually, it is not true to call this release “by far the largest one-time dedication of cultural data to the public domain using CC0”. In December 2011 two German library networks released their catalog b3kat under CC0 which by then held 22 million descriptions of bibliographic resources. See this post for more information.

Linked Data in worldcat.org
http://openbiblio.net/2012/06/23/linked-data-in-worldcat-org/
Sat, 23 Jun 2012 10:57:59 +0000

This post was first published on Übertext: Blog.

Two days ago OCLC announced that linked data has been added to worldcat.org. I took a quick look at it and just want to share some notes on this.

OCLC goes open, finally

I am very happy that OCLC, by choosing the ODC-BY license, has finally managed to adopt open licensing for WorldCat. Quite a change of attitude when you recall the attempt in 2008 to sneak in a restrictive viral copyright license as part of a WorldCat record policy (for more information see the code4lib wikipage on the policy change or my German article about it). Certainly, it was not least the blogging librarians and library tech people, the open access/open data proponents and others who never stopped pushing OCLC towards openness, who made this possible. Thank you all!

Of course, this is only the beginning. For one thing, dumps of this WorldCat data aren’t available yet (see follow-up addendum here), which makes it necessary to crawl the whole of WorldCat to get hold of the data. For another, there is probably a whole lot of useful information in WorldCat that isn’t part of the linked data in worldcat.org yet.

schema.org in RDFa and microdata

What information is actually encoded as linked data in worldcat.org? And how did OCLC add RDF to worldcat.org? It used the schema.org vocabulary to add semantic markup to the HTML. This markup is added both as microdata – the native choice for the schema.org vocabulary – and as RDFa. schema.org lets people choose how to use the vocabulary; the schema.org blog recently said: “Our approach is ‘Microdata and more’. As implementations and services begin to consume RDFa 1.1, publishers with an interest in mixing schema.org with additional vocabularies, or who are using tools like Drupal 7, may find RDFa well worth exploring.”

Let’s take a look at a description of a bibliographic resource in worldcat.org, e.g. http://www.worldcat.org/title/linked-data-evolving-the-web-into-a-global-data-space/oclc/704257552. The part of the HTML source containing the semantic markup is marked as “Microdata Section” (although it also contains RDFa). As the HTML source isn’t really readable for humans, we first need to get hold of the RDF in a readable form. I prefer the Turtle syntax for looking at RDF. One can extract the RDF contained in the HTML using the RDFa distiller provided by the W3C. More precisely, you have to use the distiller that supports RDFa 1.1, as schema.org supports RDFa 1.1 and worldcat.org is therefore enriched according to the RDFa 1.1 standard.
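
To make the idea of markup embedded in HTML more concrete, here is a minimal sketch using only Python's standard library. It merely collects `property` (RDFa) and `itemprop` (microdata) attributes together with their values. It is not a conforming RDFa 1.1 processor like the W3C distiller, the class name `PropertyCollector` is made up for this illustration, and the HTML snippet is a hypothetical stand-in rather than actual worldcat.org source.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from `property` and `itemprop` attributes.

    Simplification: subjects, nesting, prefixes and datatypes are ignored.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property name still waiting for its element text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("property") or attrs.get("itemprop")
        if name is None:
            return
        # An explicit content attribute or a link target carries the value itself.
        value = attrs.get("content") or attrs.get("href") or attrs.get("resource")
        if value is not None:
            self.pairs.append((name, value))
        else:
            self._pending = name

    def handle_data(self, data):
        # Otherwise the first non-empty text after the tag is taken as the value.
        if self._pending is not None and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


# Hypothetical snippet, loosely modelled on the kind of markup described above.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">Linked data evolving the web into a global data space</span>
  <meta property="copyrightYear" content="2011">
  <a property="author" href="http://viaf.org/viaf/38278185">Author</a>
</div>
"""

collector = PropertyCollector()
collector.feed(SAMPLE_HTML)
for name, value in collector.pairs:
    print(name, "=>", value)
```

A real distiller additionally resolves the subject of each statement and expands the vocabulary prefixes, which is why its output is a complete RDF graph rather than a flat list of pairs.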


Using the distiller on the example resource, I get back a Turtle document that contains the following triples:

1:  @prefix library: <http://purl.org/library/> .  
2:  @prefix madsrdf: <http://www.loc.gov/mads/rdf/v1#> .  
3:  @prefix owl: <http://www.w3.org/2002/07/owl#> .  
4:  @prefix schema: <http://schema.org/> .  
5:  @prefix skos: <http://www.w3.org/2004/02/skos/core#> .  
6:  <http://www.worldcat.org/oclc/707877350> a schema:Book;  
7:    library:holdingsCount "1"@en;  
8:    library:oclcnum "707877350"@en;  
9:    library:placeOfPublication [ a schema:Place;  
10:        schema:name "San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) :"@en ];  
11:    schema:about [ a skos:Concept;  
12:        schema:name "Web site development."@en;  
13:        madsrdf:isIdentifiedByAuthority <http://id.loc.gov/authorities/subjects/sh98004795> ],  
14:      [ a skos:Concept;  
15:        schema:name "Semantic Web."@en;  
16:        madsrdf:isIdentifiedByAuthority <http://id.loc.gov/authorities/subjects/sh2002000569> ],  
17:      <http://dewey.info/class/025/e22/>,  
18:      <http://id.worldcat.org/fast/1112076>,  
19:      <http://id.worldcat.org/fast/1173243>;  
20:    schema:author <http://viaf.org/viaf/38278185>;  
21:    schema:bookFormat schema:EBook;  
22:    schema:contributor <http://viaf.org/viaf/171087834>;  
23:    schema:copyrightYear "2011"@en;  
24:    schema:description "1. Introduction -- The data deluge -- The rationale for linked data -- Structure enables sophisticated processing -- Hyperlinks connect distributed data -- From data islands to a global data space -- Introducing Big Lynx productions --"@en,  
25:      "The World Wide Web has enabled the creation of a global information space comprising linked documents. As the Web becomes ever more enmeshed with our daily lives, there is a growing desire for direct access to raw data not currently available on the Web or bound up in hypertext documents. Linked Data provides a publishing paradigm in which not only documents, but also data, can be a first class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards - the Web of Data. In this Synthesis lecture we provide readers with a detailed technical introduction to Linked Data. We begin by outlining the basic principles of Linked Data, including coverage of relevant aspects of Web architecture. The remainder of the text is based around two main themes - the publication and consumption of Linked Data. Drawing on a practical Linked Data scenario, we provide guidance and best practices on: architectural approaches to publishing Linked Data; choosing URIs and vocabularies to identify and describe resources; deciding what data to return in a description of a resource on the Web; methods and frameworks for automated linking of data sets; and testing and debugging approaches for Linked Data deployments. We give an overview of existing Linked Data applications and then examine the architectures that are used to consume Linked Data from the Web, alongside existing tools and frameworks that enable these. Readers can expect to gain a rich technical understanding of Linked Data fundamentals, as the basis for application development, research or further study."@en;  
26:    schema:inLanguage "en"@en;  
27:    schema:isbn "1608454312"@en,  
28:      "9781608454310"@en;  
29:    schema:name "Linked data evolving the web into a global data space"@en;  
30:    schema:publisher [ a schema:Organization;  
31:        schema:name "Morgan & Claypool"@en ];  
32:    owl:sameAs <http://dx.doi.org/10.2200/S00334ED1V01Y201102WBE001> .  

This looks quite nice to me. You can see how schema.org lets you easily convey the most relevant information, and the property names are well chosen to make it easy for humans to read the RDF (in contrast e.g. to the ISBD vocabulary, which uses numbers in the property URIs, following the library tradition :-/).

The example also shows the current shortcomings of schema.org and where the library community might put some effort into extending it, as OCLC has already been doing for this release with the experimental “library” extension vocabulary for use with Schema.org. For example, there are no separate schema.org properties for a table of contents and an abstract, so both are put into plain strings using the schema:description property.
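
Because both kinds of text end up under the same property, a consumer has to guess which is which. The following Python sketch shows one possible heuristic, assuming that tables of contents use the " -- " separator seen in the excerpt above. The function name and the threshold are made up for this illustration, and the two strings are shortened from the example record.

```python
def looks_like_table_of_contents(description, min_separators=3):
    """Guess whether a schema:description string is a table of contents.

    Assumption: contents notes separate chapter titles with " -- ".
    """
    return description.count(" -- ") >= min_separators


# Shortened versions of the two schema:description values in the excerpt.
descriptions = [
    "1. Introduction -- The data deluge -- The rationale for linked data -- "
    "Structure enables sophisticated processing --",
    "The World Wide Web has enabled the creation of a global information "
    "space comprising linked documents.",
]

labelled = [
    ("table of contents" if looks_like_table_of_contents(d) else "abstract", d)
    for d in descriptions
]
for label, text in labelled:
    print(f"{label}: {text[:40]}...")
```

A heuristic like this is fragile, which is exactly why dedicated properties in the vocabulary would be preferable.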

Links to other linked data sources

There are links to several other data sources: LoC authorities (lines 13, 16, 41, 44), dewey.info (17), the linked data FAST headings (18, 19), viaf.org (20, 22) and an owl:sameAs link to the HTTP-DOI identifier (32). As most of these services are already run by OCLC and the connections probably all existed in the data already, creating these links wasn’t hard work, which of course doesn’t make them less useful.
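
As a small illustration of how a consumer might take stock of these outgoing links, the following Python sketch groups the link targets from the Turtle excerpt by host. The URIs are copied from the excerpt; the function name `group_by_host` is made up for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Link targets copied from the Turtle excerpt above.
LINKS = [
    "http://id.loc.gov/authorities/subjects/sh98004795",
    "http://id.loc.gov/authorities/subjects/sh2002000569",
    "http://dewey.info/class/025/e22/",
    "http://id.worldcat.org/fast/1112076",
    "http://id.worldcat.org/fast/1173243",
    "http://viaf.org/viaf/38278185",
    "http://viaf.org/viaf/171087834",
    "http://dx.doi.org/10.2200/S00334ED1V01Y201102WBE001",
]


def group_by_host(uris):
    """Group URIs by the host name of the service they point to."""
    groups = defaultdict(list)
    for uri in uris:
        groups[urlparse(uri).netloc].append(uri)
    return dict(groups)


links_by_host = group_by_host(LINKS)
for host, uris in sorted(links_by_host.items()):
    print(f"{host}: {len(uris)} link(s)")
```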

Copyright information

What I found very interesting is the schema:copyrightYear property used in some descriptions in worldcat.org. I don’t know how many resources carry an indication of a copyright year or how accurate the data is, but this seems to me a useful source for projects like publicdomainworks.net.

Missing URIs

As with earlier publications of linked bibliographic data, there are some URIs missing for things we might want to link to instead of only serving the name string of the respective entity: I am talking about places and publishers. Until now, AFAIK, URIs for publishers don’t exist; hopefully someone (OCLC perhaps?) is already working on a LOD registry for publishers. For places, we have GeoNames, but it is not that trivial to generate the right links. It’s not a great surprise that a lot of work remains to be done to build the global data space.

BiblioHack: Day 1
http://openbiblio.net/2012/06/14/bibliohack-day-1/
Thu, 14 Jun 2012 10:25:46 +0000

The first day of BiblioHack was a day of combinations and sub-divisions!

The event attendees started the day all together, both hackers and workshop / seminar attendees, and Sam introduced the purpose of the day as follows: coders – to build tools and share ideas about things that will make our shared cultural heritage and knowledge commons more accessible and useful; non-coders – to get a crash course in what openness means for galleries, libraries, archives and museums, why it’s important and how you can begin opening up your data; everyone – to get a better idea about what other people working in your domain do and engender a better understanding between librarians, academics, curators, artists and technologists, in order to foster the creation of better, cooler tools that respond to the needs of our communities.

The hackers began the day with an overview of what a hackathon is for and how it can be run, as presented by Mahendra Mahey, and followed with lightning talks as follows:

  • Talk 1 Peter Murray Rust & Ross Mounce – Content and Data Mining and a PDF extractor
  • Talk 2 Mike Jones – the m-biblio project
  • Talk 4 Ian Stuart – ORI/RJB (formerly OA-RJ)
  • Talk 5 Etienne Posthumus – Making a BibServer Parser
  • Talk 6 Emanuil Tolev – IDFind – identifying identifiers (“Feedback and real user needs won’t gather themselves”)
  • Talk 7 Mark MacGillivray – BibServer – what the project has been doing recently, how that ties into the open access index idea.
  • Talk 8 Tom Oinn – TEXTUS
  • Talk 9 Simone Fonda – Pundit – collaborative semantic annotations of texts (Semantic Web-related tool)
  • Talk 10 Ian Stuart – The basics of Linked Data

We decided we wanted to work as a community, using our different skills towards one overarching goal, rather than breaking into smaller groups with separate agendas. We formed the central idea of an ‘open bibliographic tool-kit’ and people identified three main areas to hack around, playing to their skills and interests:

  • Utilising BibServer – adding datasets and using PubCrawler
  • Creating an Open Access Index
  • Developing annotation tools

At this point we all broke for lunch, and the workshoppers and hackers mingled together. As hoped, conversations sprang up between people from the two different groups, and it was great to see suggestions arising from shared ideas, with the applications of one group being explained in terms of the theories of the other.

We re-grouped and the workshop continued until 16.00 – see here for Tim Hodson’s excellent write-up of the event and talks given – when the hackers were joined by some who attended the workshop. Each group gave a quick update on status, to try to persuade the new additions to join their particular work-flow, and each group grew in number. After more hushed discussions and typing, the day finished with a talk from Tara Taubman about her background in the legalities of online security and IP, and we went for dinner. Hacking continued afterwards and we celebrated a hard day’s work down the pub, looking forward to what was to come.

Day 2 to follow…

BibJSON updates
http://openbiblio.net/2012/05/08/bibjson-updates/
Tue, 08 May 2012 14:30:19 +0000

Following recent discussion on our mailing list, BibJSON has been updated to adopt JSON-LD for all your linked data needs.

This enables us to keep the core of BibJSON pretty simple whilst also opening up potential for more complex usage where that is required.

Due to this, we no longer use the “namespace” key in BibJSON.

Other changes include usage of “_” prefix on internal keys – so wherever our own database writes info into a record, we prefix it, such as “_id”. Because of this, uploaded BibJSON records can have an “id” key that will work, as well as an “_id” uuid applied by the BibServer system.
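
The coexistence of the two keys can be sketched in a few lines of Python. This is only an illustration of the convention described above, not BibServer's actual code: the function name `store_record`, the sample record and the choice of a hex-encoded UUID4 are all assumptions.

```python
import uuid


def store_record(record):
    """Return a copy of an uploaded record with a system-assigned "_id".

    Keys without the "_" prefix, including the uploader's own "id",
    are left untouched.
    """
    stored = dict(record)
    stored["_id"] = uuid.uuid4().hex  # assumed format; any uuid would do
    return stored


# Hypothetical uploaded record that already carries its own "id".
uploaded = {"id": "my-local-key-42", "title": "An example title"}
stored = store_record(uploaded)
print(stored["id"], stored["_id"])
```

Keeping system-written keys behind a prefix means an uploader never has to rename fields to avoid clashing with the database.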

For more information, check out BibJSON.org and JSON-LD.

Community discussions 2
http://openbiblio.net/2012/04/24/community-discussions-2/
Tue, 24 Apr 2012 16:53:22 +0000

It’s been a funny few weeks, with Easter meaning that various people have been out-and-about at various times, but as always, the community never rests… Following on from Community Discussions (1), here are the latest goings-on to raise your interest and maybe your eyebrows:

  • Mark MacGillivray reported on the 29th March that the project is working with Total Impact to link their services to an instance of Bibserver

  • Multilingual matters in BibJSON arose again and, once more, JSON-LD was given backing as being useful for our purposes

  • Adrian Pohl discussed Nature’s release of 450,000 articles under a CC0 licence and followed up with this more in-depth article

  • Todd Robbins circulated Jim Pitman’s detailed article on author identity, which explores the issue of citations in relation to a non-existent publication (!) and includes recommendations for opening your own data

  • Antoine Isaac notified us of Europeana’s ‘Connecting Society to Culture Programme’, as part of the Hack4Europe! 2012 road show, including 3 hackathons in 3 different countries; it turns out that one in Berlin starts on the day our own Hackathon finishes – speaking of which…

  • Hackathon ‘show of hands’ request sent out! Contact naomi.lillie [@] okfn.org if you’d like to add your name to the list of interested people. This Hackathon is being run by Open Biblio with Open GLAM and DevCSI, on the 12th-14th June, in East London – more details to follow

  • Following the last Working Group meeting, Adrian Pohl expounded upon the role and goals of the Working Group, which is now given at http://openbiblio.net/about and http://openbiblio.net/get-involved

  • David Weinberger announced that Harvard has opened 12 million bibliographic records to the public domain under a CC0 licence – see his blog post to read more about it and for links to further information

  • Finally, Adrian announced the next Working Group meeting on 7th May, where Mark MacGillivray will be updating the community on project developments

As always, thanks to our amazing community for promoting and leading open bibliographic data in all its manifestations. To become part of the group and get your voice heard, sign up to the List here.

Europeana and Linked Open Data
http://openbiblio.net/2012/03/19/europeana-and-linked-open-data/
Mon, 19 Mar 2012 08:53:44 +0000

Europeana has recently released a new version of its Linked Data Pilot, data.europeana.eu. We now publish data for 2.4 million objects under an open metadata licence: CC0, the Creative Commons Public Domain Dedication. This post elaborates on this earlier one by Naomi.

The interest of Europeana for Linked Open Data

Europeana aims to provide the widest possible access to the European cultural heritage published on a massive scale as digital resources by hundreds of museums, libraries and archives. This includes empowering other actors to build services that contribute to such access. Making data openly available to the public and private sectors alike is thus central to Europeana’s business strategy. We are also trying to provide a better service by making available richer data than what cultural institutions very often publish: data in which millions of texts, images, videos and sounds are linked to other relevant resources such as persons, places and concepts.

Europeana has therefore been interested for a while in Linked Data, as a technology that facilitates these objectives. We entirely subscribe to the views expressed in the W3C Library Linked Data report, which shows the benefits (but also acknowledges the challenges) of Linked Data for the cultural sector.

Europeana’s first toe in the Linked Data water

Last year, we released a first Linked Data pilot at data.europeana.eu. This was a very exciting moment, a first opportunity for us to play with Linked Data.

We could deploy our prototype relatively easily and the whole experience was extremely valuable, from a technical perspective. In particular, this has been the first large-scale implementation of Europeana’s new approach to metadata, the Europeana Data Model (EDM). This model enables the representation of much richer data compared to the current format used by Europeana in its production service. First, our pilot could use EDM’s ability to represent several perspectives over a cultural object. We have used it to distinguish the original metadata our providers send us, from the data that we add ourselves. Among the Europeana data there are indeed enrichments that are created automatically and are not checked by professional data curators. For trust purposes, it is important that data consumers can see the difference.

We could also better highlight a part of Europeana’s added value as a central point for accessing digitized cultural material, in direct connection with the above mentioned enrichment. Europeana indeed employs semantic extraction tools that connect its objects with large multilingual reference resources available as Linked Data, in particular Geonames and GEMET. This new metadata allows us to deliver a better search service, especially in a European context. With the Linked Data pilot we could explicitly point at them, in the same environment they are published in. We hope this will help the entire community to better recognize the importance of these sources, and continue to provide authority resources in interoperable Linked Data format, using for example the SKOS vocabulary.

If you are interested in more lessons learnt from a technical perspective, we have published more of them in a technical paper at the Dublin Core conference last year. Among the less positive aspects, data.europeana.eu is still not part of the production system behind the main europeana.eu portal. It does not come with the guarantee of service we would like to offer for the linked data server, though the provision of data dumps is not impacted by this.

Making progress on Open Data

Another downside is that data.europeana.eu publishes data only for a subset of the objects our main portal provides access to. We started with 3.5 million objects out of a total of 20 million. These were selected after a call for volunteers, to which only a few providers responded. Additionally, we could not release our metadata under fully Open terms. This was clearly an obstacle to the re-use of our data.

After several months we have thus released a second version of data.europeana.eu. Though still a pilot, it now contains fully open metadata (CC0).

The new version concerns an even smaller subset of our collections: in February 2012, data.europeana.eu contains metadata on 2.4 million objects. But this must be considered in context. The qualitative step of fully open publication is crucial to us. And over the past year, we have started an active campaign to convince our community to open up their metadata, allowing everyone to make it work harder for the benefit of end users. The current metadata served at data.europeana.eu come from data providers who have reacted early and positively to our efforts. We trust we will be able to make metadata available for many more objects in the coming year.

In fact, we hope that this Linked Open Data pilot can contribute to our Open Data advocacy message. We believe such technology can prompt third parties to develop innovative applications and services, stimulating end users’ interest in digitized heritage. This would of course help to convince more partners to contribute metadata openly in the future. Alongside our new pilot, we have released an animation that conveys exactly this message; you can view it here.

For additional information about access to and technical details of the dataset, see data.europeana.eu and our entry on the Data Hub.

Linked Open Data as explained by Europeana
http://openbiblio.net/2012/03/01/linked-open-data-as-explained-by-europeana/
Thu, 01 Mar 2012 16:27:04 +0000

Antoine Isaac recently sent an e-mail around the List to let us know that Europeana has published its first dataset, comprising 2.4 million objects, under CC0. Furthermore, the new Data Exchange Agreement, which data suppliers are required to sign in order to publish on Europeana (and already signed by national libraries, national museums and content providers for entire countries), comes into effect on 1 July 2012, after which all metadata in Europeana will be available as Open Data to the Public Domain!

This is brilliant news in itself, but what I found particularly enchanting was the animated video that Europeana created in support of this announcement:

Linked Open Data from europeana on Vimeo.

As mentioned before, I am not of a technical background, and sometimes terms and explanations are difficult for me to grasp; however – I understand this video! I think it’s a brilliant explanation of the detail of Linked Open Data (LOD): how metadata works together, why it’s important that it’s open, how the more open data is available the more can be done with it, etc. This provides great clarity on what it is we’re seeking to do, for those of us who can’t tell a gig from a meg. The Open Biblio readership is generally more savvy with the nitty-gritty workings of LOD than I – if not in fact doing this sort of stuff already – but how can you not love the simplicity of this video?! It’s engaging, interesting, mostly jargon-free and less than four minutes long… So, if you’re an advocate of LOD without really understanding the processes behind the philosophy, get yourself a cuppa, settle down to watch this, and be informed.

Thanks Europeana, keep up the good work!

Europeana’s press release provides more information about the above dataset release and video.

Linked Data at the Biblioteca Nacional de España
http://openbiblio.net/2012/02/02/linked-data-at-the-biblioteca-nacional-de-espana/
Thu, 02 Feb 2012 08:29:23 +0000

The following guest post is from the National Library of Spain and the Ontology Engineering Group (Technical University of Madrid (UPM)).

Datos.bne.es is an initiative of the Biblioteca Nacional de España (BNE) whose aim is to enrich the Semantic Web with library data.

This initiative is part of the project “Linked Data at the BNE”, supported by the BNE in cooperation with the Ontology Engineering Group (OEG) at the Universidad Politécnica de Madrid (UPM). The first meeting took place in September 2010, and the collaboration agreement was signed in October 2010. The first set of data was transformed and linked in April 2011, and a more significant set followed in December 2011.

The initiative was presented in the auditorium of the BNE on 14th December 2011 by Asunción Gómez-Pérez, Professor at the UPM, and Daniel Vila-Suero, Project Lead (OEG-UPM), and by Ricardo Santos, Chief of Authorities, and Ana Manchado Mangas, Chief of Bibliographic Projects, both from the BNE. The audience enjoyed the invaluable participation of Gordon Dunsire, Chair of the IFLA Namespace Group.

The concept of Linked Data was first introduced by Tim Berners-Lee in the context of the Semantic Web. It refers to the method of publishing and linking structured data on the Web. Hence, the project “Linked Data at the BNE” involves the transformation of BNE bibliographic and authority catalogues into RDF as well as their publication and linkage by means of IFLA-backed ontologies and vocabularies, with the aim of making data available in the so-called cloud of “Linked Open Data”. This project focuses on connecting the published data to other data sets in the cloud, such as VIAF (Virtual International Authority File) or DBpedia.
With this initiative, the BNE takes the challenge of publishing bibliographic and authority data in RDF, following the Linked Data Principles and under the CC0 (Creative Commons Public Domain Dedication) open license. Thereby, Spain joins the initiatives that national libraries from countries such as the United Kingdom and Germany have recently launched.

Vocabularies and models

IFLA-backed ontologies and models, widely agreed upon by the library community, have been used to represent the resources in RDF. Datos.bne.es is one of the first international initiatives to thoroughly embrace the models developed by IFLA, such as the FR models FRBR (Functional Requirements for Bibliographic Records), FRAD (Functional Requirements for Authority Data), FRSAD (Functional Requirements for Subject Authority Data), and ISBD (International Standard for Bibliographic Description).

FRBR has been used as a reference model and as a data model because it provides a comprehensive and organized description of the bibliographic universe, allowing the gathering of useful data and navigation. Entities, relationships and properties have been written in RDF using the RDF vocabularies taken from IFLA; thus FR ontologies have been used to describe Persons, Corporate Bodies, Works and Expressions, and ISBD properties for Manifestations. All these vocabularies are now available at Open Metadata Registry (OMR), with the status of published. Additionally, in cooperation with IFLA, labels have been translated to Spanish.
MARC21 bibliographic and authority files have been tested and mapped to the classes and properties at OMR. The following mappings were carried out:

  • A mapping to determine, given a field tag and a certain subfield combination, to which FRBR entity it is related (Person, Corporate Body, Work, Expression). This mapping was applied to authority files.
  • A mapping to establish relationships between entities.
  • A mapping to determine, given a field/subfield combination, to which property it can be mapped. Authority files were mapped to FR vocabularies, whereas bibliographic files were mapped to ISBD vocabulary. A number of properties from other vocabularies were also used.
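
The first and third kind of mapping can be pictured as simple lookup tables. The following Python sketch assumes this representation. The tag/subfield combinations, the property labels and the function names are illustrative assumptions only and are not taken from the BNE's actual mappings.

```python
# Illustrative entries only: these tag/subfield combinations and property
# labels are assumptions for the sketch, not the BNE's published mappings.
ENTITY_MAPPING = {
    ("100", frozenset("a")): "Person",
    ("100", frozenset("at")): "Work",
    ("110", frozenset("a")): "Corporate Body",
}

PROPERTY_MAPPING = {
    ("100", "a"): "name of person",
    ("245", "a"): "title proper",
}


def classify(tag, subfields):
    """Return the FRBR entity for a field tag and the subfield codes present."""
    return ENTITY_MAPPING.get((tag, frozenset(subfields)))


def map_property(tag, code):
    """Return the target property for a single field/subfield combination."""
    return PROPERTY_MAPPING.get((tag, code))


print(classify("100", ["a"]))
print(classify("100", ["a", "t"]))
print(map_property("245", "a"))
```

Keeping the mappings as data rather than code is what would let other libraries reuse or adjust them without touching the transformation logic.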

The aforementioned mappings will soon be available to the library community. In this way the BNE would like to contribute to the discussion on mapping MARC records to RDF; in addition, other libraries willing to transform their MARC records into RDF will be able to reuse these mappings.

Almost 7 million records transformed under an open license

Approximately 2.4 million bibliographic records have been transformed into RDF. They are modern and ancient monographs, sound recordings and musical scores. In addition, 4 million authority records of persons, corporate names, uniform titles and subjects have been transformed. All of them belong to the bibliographic and authority catalogues of the BNE stored in MARC 21 format. For the data transformation, the MARiMbA (MARc mappIngs and rdf generAtor) tool has been developed and used. MARiMbA is a tool for librarians, whose goal is to support the entire process of generating RDF from MARC 21 records. This tool allows using any vocabulary (in this case ISBD and the FR family) and simplifies the process of assigning correspondences between RDFS/OWL vocabularies and MARC 21. As a result of this process, about 58 million triples have been generated in Spanish. These triples are high-quality data with an important cultural value that substantially increases the presence of the Spanish language in the data cloud.

Once the data were described with IFLA models, and the bibliographic and authority catalogues were generated in RDF, the next step was to connect these data with other existing RDF knowledge bases included in the Linking Open Data initiative. Thus, the data of the BNE are now linked with data from other international data sources through VIAF, the Virtual International Authority File.

The type of licence applied to the data is CC0 (Creative Commons Public Domain Dedication), a completely open licence aimed at promoting data reuse. With this project, the BNE adheres to the Spanish Public Sector’s Commitment to openness and data reuse, as established in Royal Decree 1495/2011 of 24 October (Real Decreto 1495/2011, de 24 de octubre) on reusing information in the public sector, and also acknowledges the proposals of the CENL (Conference of European National Librarians).

Future steps

In the short term, the next steps include:

  • Migration of a larger set of catalogue records.
  • Improvement of the quality and granularity of both the transformed entities and the relationships between them.
  • Establishment of new links to other interesting datasets.
  • Development of a human-friendly visualization tool.
  • SKOSification of subject headings.

Team

From BNE: Ana Manchado, Mar Hernández Agustí, Fernando Monzón, Pilar Tejero López, Ana Manero García, Marina Jiménez Piano, Ricardo Santos Muñoz and Elena Escolano.
From UPM: Asunción Gómez-Pérez, Elena Montiel-Ponsoda, Boris Villazón-Terrazas and Daniel Vila-Suero.

German Library Networks BVB and KOBV release 23 Million Records
http://openbiblio.net/2011/12/08/bvb-kobv-open-data/
Thu, 08 Dec 2011 13:21:46 +0000

The two German library networks BVB (BibliotheksVerbund Bayern = Library Network Bavaria) and KOBV (Kooperativer Bibliotheksverbund Berlin-Brandenburg = Cooperative Library Network Berlin-Brandenburg) have recently released 23 million records from their joint union catalogue under a CC0 license. The opening of Bavaria’s open data portal was taken as the occasion for publishing this data.

MARC records and Linked Open Data

The data was released as MARC records (MARC XML). In addition, an experimental Linked Open Data service was launched at http://lod.b3kat.de/, with full dumps of the RDF data made available for download.

The data was exported on 22 November 2011, and updates can be harvested via an OAI-PMH interface at http://bvbr.bib-bvb.de:8991/aleph-cgi/oai/oai_opendata.pl?verb=ListRecords&set=OpenData&metadataPrefix=marc21.
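Harvesting over OAI-PMH means requesting ListRecords pages and following the resumption token until none is returned. The following is a minimal sketch under stated assumptions: the function names are illustrative, the identifiers and token in `SAMPLE` are made up so that the parsing step can be shown without network access, and whether the endpoint above is still reachable is not verified here.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://bvbr.bib-bvb.de:8991/aleph-cgi/oai/oai_opendata.pl"


def list_records_url(base_url, resumption_token=None, from_date=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: it must not be
    combined with set, metadataPrefix or from.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {
            "verb": "ListRecords",
            "set": "OpenData",
            "metadataPrefix": "marc21",
        }
        if from_date:
            params["from"] = from_date  # e.g. "2011-11-22" for updates only
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Made-up response fragment standing in for a real ListRecords page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_page(SAMPLE)
print(ids, token)
print(list_records_url(BASE_URL, resumption_token=token))
```

A real harvester would fetch each URL (for example with `urllib.request.urlopen`), pass the response body to `parse_page`, and repeat with the returned token until it is `None`.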

So far, no entries for this data exist on the Data Hub. Anyone?

Is openness becoming the new normal in Germany?

With this initiative, open data is now available from four German library networks, with BVB and KOBV being the first to have released their entire union catalogue. The North Rhine-Westphalian Library Service Center (hbz) started opening up data in March 2010, and libraries from the southwestern library network (SWB) followed in autumn 2010.

Hopefully this development will continue, with libraries and related institutions releasing more open data, be it bibliographic data or other data produced by libraries.

Did you hear that loud bang? That was CENL releasing their data under CC0
http://openbiblio.net/2011/10/04/did-you-hear-that-loud-bang-that-was-cenl-releasing-their-data-under-cc0/
Tue, 04 Oct 2011 08:09:04 +0000

The Conference of European National Librarians (CENL) came up with great news last Wednesday: data from all European national libraries will be published under an open license! From the announcement:

Meeting at the Royal Library of Denmark, the Conference of European National Librarians (CENL), has voted overwhelmingly to support the open licensing of their data. CENL represents Europe’s 46 national libraries, and are responsible for the massive collection of publications that represent the accumulated knowledge of Europe.

What does that mean in practice?
It means that the datasets describing all the millions of books and texts ever published in Europe – the title, author, date, imprint, place of publication and so on, which exists in the vast library catalogues of Europe – will become increasingly accessible for anybody to re-use for whatever purpose they want.

The first outcome of the open licence agreement is that the metadata provided by national libraries to Europeana.eu, Europe’s digital library, museum and archive, via the CENL service The European Library, will have a Creative Commons Universal Public Domain Dedication, or CC0 licence. This metadata relates to millions of digitised texts and images coming into Europeana from initiatives that include Google’s mass digitisations of books in the national libraries of the Netherlands and Austria.

See also this post by Richard Wallis.

(Thanks to Mathias for the title of this post.)
