Importing Spanish National Library to BibServer

The Spanish National Library (Biblioteca Nacional de España or BNE) has released their library catalogue as Linked Open Data on the Datahub.

Initially this entry only contained the SPARQL endpoints and no downloads of the full datasets. After some enquiries from Naomi Lillie, the entry was updated with links to more information and bulk downloads at: http://www.bne.es/es/Catalogos/DatosEnlazados/DescargaFicheros/

This library dataset is particularly interesting as it is not a ‘straightforward’ dump of bibliographic records. This is best explained by Karen Coyle in her blogpost.

For a BibServer import, the implication is that we have to distinguish the type of each record read by the importing script and take the relevant action before building the BibJSON entry. Fortunately the data dump was already in N-Triples, so we did not have to pre-process the large data file (4.9GB) in the same manner as we did with the German National Library dataset.
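As a rough illustration of what distinguishing record types can look like, here is a minimal sketch that streams N-Triples lines, groups them by subject and collects each subject's rdf:type values. It is not the script from the gist. It assumes that all triples for one subject are adjacent in the dump and that subjects are IRIs rather than blank nodes, and the names `parse_line` and `iter_records` are made up for this example.

```python
import re
from itertools import groupby

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified N-Triples pattern: <subject> <predicate> object .
# Assumption: IRI subjects only; literals are kept in their raw quoted form.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def parse_line(line):
    """Return (subject, predicate, object), or None for blank, comment or unparsed lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    subj, pred, obj = match.groups()
    # Strip the angle brackets from IRI objects; leave literals untouched.
    if obj.startswith("<") and obj.endswith(">"):
        obj = obj[1:-1]
    return subj, pred, obj


def iter_records(lines):
    """Yield one dict per subject, assuming a subject's triples are adjacent."""
    triples = (t for t in map(parse_line, lines) if t is not None)
    for subject, group in groupby(triples, key=lambda t: t[0]):
        record = {"id": subject, "types": [], "fields": {}}
        for _, pred, obj in group:
            if pred == RDF_TYPE:
                record["types"].append(obj)
            else:
                record["fields"].setdefault(pred, []).append(obj)
        yield record
```

Because `iter_records` accepts any iterable of lines, an open file handle can be passed in directly, so the 4.9GB file never has to be held in memory. The importing script can then branch on `record["types"]` before building a BibJSON entry.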

The Python script that reads the data file can be viewed at https://gist.github.com/3225004

A complicating matter from a data wrangler’s point of view is that the field names are based on IFLA Standards, which use numeric codes rather than ‘guessable’ English terms like Dublin Core fields, for example. This is more correct from an international and data quality point of view, but it does make the initial mapping more time-consuming.

So when mapping a data item like https://gist.github.com/3225004#file_sample.nt we need to dereference each field name and map it to the relevant BibJSON entry.
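Once the field names have been looked up, the mapping step can be kept as a plain lookup table. The sketch below shows the idea under some loud assumptions: the property URIs in `FIELD_MAP` are placeholders, not real IFLA element codes, the BibJSON keys chosen are only illustrative, and `strip_literal` and `to_bibjson` are hypothetical helper names rather than part of BibServer.

```python
# Placeholder property URIs: replace these with the real codes found by
# dereferencing each field name. The ones below are NOT actual IFLA elements.
FIELD_MAP = {
    "http://example.org/elements/P0001": "title",
    "http://example.org/elements/P0002": "publisher",
    "http://example.org/elements/P0003": "year",
}


def strip_literal(value):
    """Drop the surrounding quotes and any language tag or datatype from an N-Triples literal."""
    if value.startswith('"'):
        end = value.rfind('"')
        if end > 0:
            return value[1:end]
    return value


def to_bibjson(record_id, fields, field_map=FIELD_MAP):
    """Build a BibJSON-style dict and report predicates that have no mapping yet."""
    entry = {"id": record_id}
    unmapped = []
    for predicate, values in fields.items():
        key = field_map.get(predicate)
        if key is None:
            unmapped.append(predicate)
            continue
        cleaned = [strip_literal(v) for v in values]
        # Single values are stored bare, repeated values as a list.
        entry[key] = cleaned[0] if len(cleaned) == 1 else cleaned
    return entry, unmapped
```

Returning the unmapped predicates alongside the entry makes it easy to see which numeric codes still need to be looked up while the mapping is being built.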

As we identify more Linked Open Data National Bibliographies, these experiments will be continued under the http://nb.bibsoup.net/ BibServer instance.

