Following up on the earlier announcement that the British
Library has made the British National Bibliography available
under a public domain dedication, the JISC Open Bibliography
project has worked to make this data more useable.
The data has been loaded into a Virtuoso store that is queryable
through the SPARQL endpoint. The URIs that we have assigned to each
record use the ORDF software to make them dereferenceable,
supporting content negotiation as well as embedding RDFa
in the HTML representation.
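As a rough illustration of what dereferencing with content negotiation looks like from a client, the sketch below builds an HTTP request for a record URI with an explicit Accept header. The media types and the helper names (`negotiated_request`, `fetch`) are assumptions made for this example; the post does not list which serialisations the server offers, and the service may no longer be reachable.

```python
from urllib.request import Request, urlopen

# Media types are assumptions for illustration only.
RDF_XML = "application/rdf+xml"
HTML = "text/html"


def negotiated_request(uri: str, media_type: str = RDF_XML) -> Request:
    """Build a GET request asking for one representation of a record URI."""
    return Request(uri, headers={"Accept": media_type})


def fetch(uri: str, media_type: str = RDF_XML) -> bytes:
    """Dereference the URI and return the body (performs a network call)."""
    with urlopen(negotiated_request(uri, media_type), timeout=30) as response:
        return response.read()
```

Calling `fetch("http://bnb.bibliographica.org/entry/GB8102507")` would ask for an RDF representation, while passing `HTML` as the second argument would ask for the page with embedded RDFa.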
The data contains some 3 million individual records and some 173
million triples. Indexing the data was a very CPU-intensive process,
taking approximately three days. Transforming and loading the source
data took about five hours.
To get an idea of the shape of the data, let us consider a sample
resource, http://bnb.bibliographica.org/entry/GB8102507 . Apart from
linkage between the various representations, the description of the
entity itself is as follows:

    @prefix ov: <http://open.vocab.org/terms/> .
    @prefix isbd: <http://iflastandards.info/ns/isbd/elements/> .
    @prefix bibo: <http://purl.org/ontology/bibo/> .
    @prefix bio: <http://purl.org/vocab/bio/0.1/> .
    @prefix dc: <http://purl.org/dc/terms/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    <http://bnb.bibliographica.org/entry/GB8102507> a bibo:Book, bibo:Document;
        dc:source <http://bnb.bibliographica.org/dataset/BNBrdfdc03.xml#183143>;
        dc:isPartOf <http://bnb.bibliographica.org/dataset>;
        rdfs:seeAlso <http://purl.org/NET/book/isbn/0241105161#book>,
            <http://www4.wiwiss.fu-berlin.de/bookmashup/books/0241105161>;
        dc:title "A good man in Africa";
        dc:language [ rdf:value "eng"^^dc:ISO639-2 ];
        dc:extent [ rdfs:label "251p" ];
        dc:contributor [ a foaf:Agent;
            foaf:name "Boyd, William";
            skos:notation "Boyd, William, 1952-";
            bio:event [ a bio:Birth; bio:date "1952"^^xsd:gYear ];
            = <http://bibliographica.org/entity/735b02a8f051e2249e40fbd48112d033>;
        ];
        dc:subject [ rdfs:label "Fiction in English" ],
            [ rdfs:label "1945-" ],
            [ rdfs:label "Texts" ],
            [ a skos:Concept;
              skos:inScheme <http://dewey.info/scheme/e18>;
              skos:notation "823/.9/1"^^<ddc:Notation> ],
            [ a skos:Concept;
              skos:inScheme <http://dewey.info/scheme/e19>;
              skos:notation "823/.914"^^<ddc:Notation> ];
        dc:publisher [ a foaf:Agent;
            foaf:name "Hamilton";
            skos:notation "Hamilton";
            = <http://bibliographica.org/entity/c080da5b03a0786efa61e61123b359d9>;
        ];
        dc:issued "1981"^^xsd:gYear;
        isbd:hasPlaceOfPublicationProductionDistribution [ rdfs:label "London" ];
        bibo:identifier "GB8102507";
        bibo:isbn <urn:isbn:0241105161>;
        ov:blid "008042853".

Some of the salient features of this representation are:
- Assignment of URIs for each entry in the British National
Bibliography under http://bnb.bibliographica.org/.
- Linkage with rdfs:seeAlso to the RDF Book Mashup and RDF
Book Vocabulary.
- Author and publisher are preserved as blank nodes as in the source
data but are augmented with owl:sameAs links (written with the N3
shorthand = in the listing above) into the
Bibliographica namespace to support further annotation,
correction, deduplication, etc.
- Any series, if present, is promoted to a first-class entity in the
Bibliographica namespace for further processing.
- Extraction of birth and death dates from the canonical
string representation for authors.
- For authors, publishers and series, their name as present in the
source data is preserved using skos:notation, whilst their
names less any metadata about birth and death are represented with
foaf:name.
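The name handling described in the last two items can be sketched roughly as follows. This is a hypothetical illustration, not the project's transformation code: the regular expression, the `Heading` type and the `split_heading` function are assumptions, and the pattern only covers trailing life dates of the form seen in the sample ("Boyd, William, 1952-").

```python
import re
from typing import NamedTuple, Optional

# Trailing ", 1952-" or ", 1890-1976" style life dates. This pattern is an
# assumption and will not cover every form found in catalogue headings.
_DATES = re.compile(r",\s*(?P<birth>\d{4})?\s*-\s*(?P<death>\d{4})?\.?\s*$")


class Heading(NamedTuple):
    notation: str          # original string, kept as skos:notation
    name: str              # string without life dates, used as foaf:name
    birth: Optional[str]   # four-digit year, or None if absent
    death: Optional[str]   # four-digit year, or None if absent


def split_heading(notation: str) -> Heading:
    """Separate a source heading into a plain name and optional life dates."""
    match = _DATES.search(notation)
    if match is None or not (match.group("birth") or match.group("death")):
        # No recognisable dates: the name is the heading itself.
        return Heading(notation, notation.strip(), None, None)
    name = notation[: match.start()].strip()
    return Heading(notation, name, match.group("birth"), match.group("death"))
```

For the sample record, `split_heading("Boyd, William, 1952-")` keeps the full string as the notation, yields "Boyd, William" as the name and "1952" as the birth year, and leaves the death year empty. A heading without dates, such as "Hamilton", passes through unchanged.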
The entire dataset is queryable through the SPARQL endpoint and
makes use of some of the extended features of Virtuoso, such as
full-text indexing. Full-text search is done with the bif:contains
built-in function, which is what powers the search functionality on the
website. The default (example) query returns some details about all
books that have "Edinburgh" in their titles:

    PREFIX dc: <http://purl.org/dc/terms/>
    PREFIX bibo: <http://purl.org/ontology/bibo/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?book ?title ?name ?description
    WHERE {
        ?book a bibo:Book .
        ?book dc:title ?title .
        ?title bif:contains "Edinburgh" .
        OPTIONAL { ?book dc:description ?description } .
        OPTIONAL { ?book dc:contributor ?author . ?author foaf:name ?name }
    }
    GROUP BY ?book
    LIMIT 50

Only some predicates are indexed for full-text
searching, namely:
- rdf:value
- rdfs:label
- rdfs:comment
- skos:prefLabel
- skos:altLabel
- dc:title
- dc:description
- foaf:name
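To make the restriction concrete, here is a minimal sketch of a helper that builds a bif:contains query for one of the indexed predicates and refuses any other. The function names, the single-word limitation and the endpoint address used in the usage note are assumptions for illustration; the post does not give the endpoint URL, and bif:contains accepts richer search expressions than this sketch allows.

```python
from urllib.parse import urlencode

# Namespaces for the prefixes used by the indexed predicates.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# The predicates listed above as full-text indexed.
INDEXED = {
    "rdf:value", "rdfs:label", "rdfs:comment", "skos:prefLabel",
    "skos:altLabel", "dc:title", "dc:description", "foaf:name",
}


def fulltext_query(predicate: str, word: str, limit: int = 50) -> str:
    """Build a SPARQL query matching one word in an indexed predicate."""
    if predicate not in INDEXED:
        raise ValueError(f"{predicate} is not full-text indexed")
    if not word.isalnum():
        # Keeps the sketch simple and avoids injecting query syntax.
        raise ValueError("only a single alphanumeric word is supported here")
    prefix = predicate.split(":")[0]
    return (
        f"PREFIX {prefix}: <{PREFIXES[prefix]}>\n"
        "SELECT DISTINCT ?s ?text WHERE {\n"
        f"  ?s {predicate} ?text .\n"
        f'  ?text bif:contains "{word}" .\n'
        f"}} LIMIT {int(limit)}"
    )


def query_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request URL for a SPARQL endpoint."""
    return endpoint + "?" + urlencode({"query": query})
```

For example, `query_url("http://example.org/sparql", fulltext_query("dc:title", "Edinburgh"))` produces a URL that could be fetched with any HTTP client, where `http://example.org/sparql` is a placeholder for the real endpoint address. Asking for a predicate outside the list, such as `dc:publisher`, raises an error instead of sending a query that would not use the full-text index.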
Further Work
An ultimate goal of our work in the Open Bibliography group at the
OKF is to enable the collection of rich metadata about the
relationships between works and authors, to document and map the
scholarly discourse. This dataset is an important building block to
help ground the references in such a project. More immediately, however,
we will:
- Make available a voiD description of this dataset that describes its
properties in more detail.
- Make available a dump of our dataset derived from the BNB
so that the data can be easily mirrored and copied for local
processing.
- Correct the errors listed in the Errata section below.
These will not necessarily be tackled in that order.
Errata
- ISBNs were represented in the source dataset as string literals of the
form URN:ISBN:0123456789 and were erroneously transformed to URIs, in
violation of the rdfs:range of bibo:isbn.
- The linkage between a resource and its representations, which should
use foaf:isPrimaryTopicOf, contains a typo in the predicate. This may
make the data difficult to use with RDF browsing clients that do not
infer the inverse of foaf:primaryTopic.