Bibliographic models in RDF

Put it in RDF to solve all your problems!

As with most things in life, the reality is often a little more complex. If you are old enough, you may well remember when this very same cry was uttered with ‘XML’ in place of ‘RDF’ or, if you are older still, ‘SGML’.

We haven’t quite reached the tipping point with bibliographic data in RDF at which a de facto model and structure has clearly emerged. There are plenty of contenders though, each based on a different model for how this data should be encapsulated in RDF. The main difference between them is how markedly hierarchical or flat the model structure is.

A model that has emerged from the library world is FRBR – Functional Requirements for Bibliographic Records. From Wikipedia:

FRBR is a conceptual entity-relationship model developed by the International Federation of Library Associations and Institutions (IFLA) that relates user tasks of retrieval and access in online library catalogues and bibliographic databases from a user’s perspective. It represents a more holistic approach to retrieval and access as the relationships between the entities provide links to navigate through the hierarchy of relationships.

There are plenty of articles and documents online that explain it further, so I will not take up your time with a summary, just my opinion. FRBR is very much built around the notion of books: what a book is, taking into account things like editions and so on. Where FRBR really does fall down a rabbit hole is in its consideration of serials and journal articles. Their treatment feels very much like an afterthought, and the philosophical ideas of Work and Expression get much murkier, especially when linking these records to conference papers and blog posts by the same authors.

There is enough of a model, however, to render an understandable bibliographic ‘record’ for an article in RDF, and this post will give an example of this, using David Shotton and Silvio Peroni’s FaBIO ontology to encapsulate the information in a FRBR-like manner.

The data used comes from an IUCr paper “Nicotinamide-2,2,2-trifluoroethanol (2/1)” Acta Cryst. (2009). E65, o727-o728, which has RDF embedded in the HTML page itself. The original RDF looks something like this:

@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix prism: <http://prismstandard.org/namespaces/1.2/basic/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

<doi:10.1107/S1600536809007594>
     prism:eissn "1600-5368";
     prism:endingpage "728";
     prism:issn "1600-5368";
     prism:number "4";
     prism:publicationdate "2009-04-01";
     prism:publicationname "Acta Crystallographica Section E: Structure Reports Online";
     prism:rightsagent "med@iucr.org";
     prism:section "organic compounds";
     prism:startingpage "727";
     prism:volume "65";
     dc:creator "Bardin, J.",
         "Florence, A.J.",
         "Johnston, B.F.",
         "Kennedy, A.R.",
         "Wong, L.V.";
     dc:date "2009-04-01";
     dc:description "The nicotinamide (NA) molecules of the title compound, 2C6H6N2O.C2H3F3O, form centrosymmetric R22(8) hydrogen-bonded dimers via N-H...O contacts. The asymmetric unit contains two molecules of NA and one trifluoroethanol molecule disordered over two sites of equal occupancy. The packing consists of alternating layers of nicotinamide dimers and disordered 2,2,2-trifluoroethanol molecules stacking in the c-axis direction. Intramolecular C-H...O and intermolecular N-H...N, O-H...N, C-H...N, C-H...O and C-H...F interactions are present.";
     dc:identifier _9:S1600536809007594;
     dc:language "en";
     dc:link <http://scripts.iucr.org/cgi-bin/paper?fl2234>;
     dc:publisher "International Union of Crystallography";
     dc:rights <http://creativecommons.org/licenses/by/2.0/uk>;
     dc:source <urn:issn:1600-5368>;
     dc:subject "";
     dc:title "Nicotinamide-2,2,2-trifluoroethanol (2/1)";
     dc:type "text";
     dcterms:abstract "The nicotinamide (NA) molecules of the title compound, 2C6H6N2O.C2H3F3O, form centrosymmetric R22(8) hydrogen-bonded dimers via N-H...O contacts. The asymmetric unit contains two molecules of NA and one trifluoroethanol molecule disordered over two sites of equal occupancy. The packing consists of alternating layers of nicotinamide dimers and disordered 2,2,2-trifluoroethanol molecules stacking in the c-axis direction. Intramolecular C-H...O and intermolecular N-H...N, O-H...N, C-H...N, C-H...O and C-H...F interactions are present.".

The same bibliographic information, rendered using FaBIO (alongside other ontologies), looks like this:

@prefix fabio: <http://purl.org/spar/fabio/> .
@prefix c4o: <http://purl.org/spar/c4o/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .
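@prefix prism: <http://prismstandard.org/namespaces/basic/2.0/> .
# Assumed base for the local names used below (:article, :issue, ...); Turtle needs the empty prefix declared.
@prefix : <#> .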

:article
    a fabio:JournalArticle
    ; dc:title "Nicotinamide-2,2,2-trifluoroethanol (2/1)"
    ; dcterms:creator [ a foaf:Person ; foaf:name "Johnston, B.F." ]
    ; dcterms:creator [ a foaf:Person ; foaf:name "Florence, A.J." ]
    ; dcterms:creator [ a foaf:Person ; foaf:name "Bardin, J." ]
    ; dcterms:creator [ a foaf:Person ; foaf:name "Kennedy, A.R." ]
    ; dcterms:creator [ a foaf:Person ; foaf:name "Wong, L.V." ]
    ; dc:rights <http://creativecommons.org/licenses/by/2.0/uk>
    ; dc:language "en"
    ; fabio:hasPublicationYear "2009"
    ; fabio:publicationDate "2009-04-01"
    ; frbr:embodiment :printedArticle , :webArticle
    ; frbr:partOf :issue
    ; fabio:doi "10.1107/S1600536809007594"
    ; frbr:part :abstract
    ; prism:rightsagent "med@iucr.org" .

:abstract
    a fabio:Abstract
    ; c4o:hasContent "The nicotinamide (NA) molecules of the title compound, 2C6H6N2O.C2H3F3O, form centrosymmetric R22(8) hydrogen-bonded dimers via N-H...O contacts. The asymmetric unit contains two molecules of NA and one trifluoroethanol molecule disordered over two sites of equal occupancy. The packing consists of alternating layers of nicotinamide dimers and disordered 2,2,2-trifluoroethanol molecules stacking in the c-axis direction. Intramolecular C-H...O and intermolecular N-H...N, O-H...N, C-H...N, C-H...O and C-H...F interactions are present." .

:printedArticle
    a fabio:PrintObject
    ; prism:pageRange "727-728" .

:webArticle
    a fabio:WebPage
    ; fabio:hasURL "http://scripts.iucr.org/cgi-bin/paper?fl2234" .

:volume
    a fabio:JournalVolume
    ; prism:volume "65"
    ; frbr:partOf :journal .

:issue
    a fabio:JournalIssue
    ; prism:issueIdentifier "4"
    ; frbr:partOf :volume .

:journal
    a fabio:Journal
    ; dc:title "Acta Crystallographica Section E: Structure Reports Online"
    ; fabio:hasShortTitle "Acta Cryst. E"
    ; dcterms:publisher [ a foaf:Organization ; foaf:name "International Union of Crystallography" ]
    ; fabio:issn "1600-5368" .
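
To make the hierarchical shape of this model a little more concrete, here is a minimal sketch in Python. It is not a real RDF toolkit: plain tuples stand in for a triple store, the triples are copied by hand from the FaBIO example above, and the function names (containers, parts_of) are made up for illustration.

```python
# Triples copied by hand from the FaBIO example, as (subject, predicate, object).
# Plain tuples are an assumed stand-in for a real RDF store.
TRIPLES = [
    (":article", "rdf:type", "fabio:JournalArticle"),
    (":issue", "rdf:type", "fabio:JournalIssue"),
    (":volume", "rdf:type", "fabio:JournalVolume"),
    (":journal", "rdf:type", "fabio:Journal"),
    (":article", "frbr:partOf", ":issue"),
    (":issue", "frbr:partOf", ":volume"),
    (":volume", "frbr:partOf", ":journal"),
]


def containers(node, triples):
    """Follow frbr:partOf links upward and return the chain of containers."""
    parent = {s: o for s, p, o in triples if p == "frbr:partOf"}
    chain = []
    seen = {node}  # guards against accidental cycles in the data
    while node in parent and parent[node] not in seen:
        node = parent[node]
        seen.add(node)
        chain.append(node)
    return chain


def parts_of(container, triples):
    """Return everything that is directly frbr:partOf the given container."""
    return sorted(s for s, p, o in triples if p == "frbr:partOf" and o == container)


print(containers(":article", TRIPLES))  # [':issue', ':volume', ':journal']
print(parts_of(":issue", TRIPLES))      # [':article']
```

Because the issue, volume and journal are nodes in their own right, finding what belongs to an issue is a matter of following links rather than comparing strings.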

The most obvious model and ontology to have emerged for describing bibliographic metadata in RDF is the Bibliographic Ontology (BIBO), developed by Frédérick Giasson and Bruce D’Arcus. It has been in existence for long enough to gain acceptance by a number of other projects, such as EPrints, Talis Aspire and Chronicling America (the Library of Congress website that provides a view on millions of pages of digitized newspaper content from around the United States).

The same data again, rendered this time using BIBO’s model and ontology, rather than a FRBR-like one:

@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .
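@prefix prism: <http://prismstandard.org/namespaces/basic/2.0/> .
# Assumed base for the local names used below (:author1, ...); Turtle needs the empty prefix declared.
@prefix : <#> .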

<info:doi/10.1107/S1600536809007594>
    a bibo:Article
    ; dc:title "Nicotinamide-2,2,2-trifluoroethanol (2/1)"
    ; dcterms:isPartOf <urn:issn:16005368>
    ; bibo:volume "65"
    ; bibo:issue "4"
    ; bibo:pageStart "727"
    ; bibo:pageEnd "728"
    ; dc:creator :author1
    ; dc:creator :author2
    ; dc:creator :author3
    ; dc:creator :author4
    ; dc:creator :author5
    ; bibo:authorList (:author1 :author2 :author3 :author4 :author5)
    ; dc:rights <http://creativecommons.org/licenses/by/2.0/uk>
    ; dc:language "en"
    ; dc:date "2009-04-01"
    ; bibo:doi "10.1107/S1600536809007594"
    ; bibo:abstract "The nicotinamide (NA) molecules of the title compound, 2C6H6N2O.C2H3F3O, form centrosymmetric R22(8) hydrogen-bonded dimers via N-H...O contacts. The asymmetric unit contains two molecules of NA and one trifluoroethanol molecule disordered over two sites of equal occupancy. The packing consists of alternating layers of nicotinamide dimers and disordered 2,2,2-trifluoroethanol molecules stacking in the c-axis direction. Intramolecular C-H...O and intermolecular N-H...N, O-H...N, C-H...N, C-H...O and C-H...F interactions are present."
    ; prism:rightsagent "med@iucr.org" .

<urn:issn:16005368>
    a bibo:Journal
    ; dc:title "Acta Crystallographica Section E: Structure Reports Online"@en
    ; bibo:shortTitle "Acta Cryst. E"@en
    ; bibo:issn "1600-5368" .

:author1
    a foaf:Person
    ; foaf:name "Johnston, B.F." .

:author2
    a foaf:Person
    ; foaf:name "Florence, A.J." .

:author3
    a foaf:Person
    ; foaf:name "Bardin, J." .

:author4
    a foaf:Person
    ; foaf:name "Kennedy, A.R." .

:author5
    a foaf:Person
    ; foaf:name "Wong, L.V." .
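
For comparison, here is a rough Python sketch of what a lookup against this flatter shape might involve. Plain dictionaries stand in for the BIBO description above, the function name articles_in_issue is invented for illustration, and the second record (with its DOI and its "E65" volume string) is entirely hypothetical, included only to show what happens when two sources record the same volume differently.

```python
# Flat, BIBO-style records: volume and issue are literal values on the article.
# Dictionaries are an assumed stand-in for RDF descriptions.
ARTICLES = [
    {
        "doi": "10.1107/S1600536809007594",
        "journal": "urn:issn:16005368",
        "volume": "65",
        "issue": "4",
    },
    {
        # Hypothetical record: same journal and issue, volume written differently.
        "doi": "10.9999/hypothetical",
        "journal": "urn:issn:16005368",
        "volume": "E65",
        "issue": "4",
    },
]


def articles_in_issue(records, journal, volume, issue):
    """Return the DOIs whose journal, volume and issue literals match exactly."""
    wanted = (journal, volume, issue)
    return [
        r["doi"]
        for r in records
        if (r["journal"], r["volume"], r["issue"]) == wanted
    ]


print(articles_in_issue(ARTICLES, "urn:issn:16005368", "65", "4"))
# ['10.1107/S1600536809007594']
```

The lookup is simple, but it only finds records whose volume and issue strings were entered in exactly the same way, which is the price of not giving the issue an identifier of its own.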

Comments on which is the most usable, the most understandable, and which is likely to be the better model for sharing this data with other people are most welcome. This is an area in which the community will have to choose a model: in practice, wrapping the information in any of the models is straightforward, but if you put it into a model that no one uses, the model becomes more of a data coffin than a useful concept.

This entry was posted in JISC OpenBib.

7 Responses to Bibliographic models in RDF

  1. Bruce D'Arcus says:

    It really depends on the data use cases. There are trade-offs in using the more complex modeling of FRBR. For the sorts of use cases I’m concerned about, that complexity was simply too high; in particular, for articles and other “part” items. For example, you could have modeled article, volume, issue, journal all with work-expression-manifestation descriptions, which starts to get insanely messy.

    The other thing I would say is that we designed BIBO to be usable in more complex modeling, much as you’ve used PRISM in your alternate example. So another way to have presented the trade-offs is to look at the basic modeling using BIBO, and to ask what additional complexity would need to be added to give a FRBR-like view, with what payoffs.

  2. Keith Alexander says:

    Why the XML declarations above the Turtle snippets?

  3. Hi Ben,

    First a quick question. In the Bibo example you haven’t included the link to the ‘web version’ – is there any reason for this?

    I think to some extent the example here isn’t complex enough to test which representation is better – and further, I’m not sure you really have enough data to really be sure the FaBio representation is quite what it claims.

    The ‘webArticle’ link actually links to a metadata record, with links to further representations – e.g. PDF and HTML. The PDF seems (to me) to be a faithful rendition of the printed version of the article – complete with page numbers (which in FaBio are assumed to only apply to the printArticle), while the HTML is clearly a different type of representation of the same content (and no page numbers of course). If you were to represent this, it would start to look a bit more interesting maybe (although hard to feel that excited about it for me). Perhaps more to the point would be if there was a pre-print version and post-print version of the article – but in this case that doesn’t seem to be true, and in any case the FaBio modelling may well break down, as presumably the pre-print isn’t part of the journal? As Dan points out – it can get ridiculously complex.

    What does seem to me to be more useful in the FaBio version is the identification of both journal issue and volume as separate entities (presumably eventually with their own URIs to identify them). With Bibo to answer the query ‘give me all the items that belong to this journal issue’ looks like it would have to rely on consistent practice in recording the volume and issue numbers, whereas in FaBio you could just draw out all the articles that were part of the issue (if I’m reading this correctly). While you’d think that consistency in Volume/Issue numbers might be pretty good, with this example I note that the volume number printed on the PDF is ‘E65’ not ‘65’ as in the metadata – so already we see confusion arising… Once you get into things that are named rather than numbered (e.g. assigning month titles, or abbreviations thereof, to issues) it will clearly get worse.

    On that basis I tend to prefer the FaBio for the additional structure in the volume/issue area, but feel that the addition of the frbr ‘embodiment’ adds (for this item at least) unnecessary complexity. Not sure how easy it will be to reconcile this?

    • Bruce D'Arcus says:

      “What does seem to me to be more useful in the FaBio version is the identification of both journal issue and volume as separate entities (presumably eventually with their own URIs to identify them). With Bibo to answer the query ‘give me all the items that belong to this journal issue’ looks like it would have to rely on consistent practice in recording the volume and issue numbers, whereas in FaBio you could just draw out all the articles that were part of the issue (if I’m reading this correctly).”

      I think, again, you all are drawing the wrong conclusions. Bibo does have classes to describe issues. So if you have the data, or the algorithms to reliably generate that information, you can certainly encode it. Again: it really depends on the use case. My priority in the examples I’ve written is to get citation data encoded in reasonably structured RDF. So in that context, I don’t think it’s practical to ask, say, a Zotero user to help catalog journal issues. But we leave room for that in Bibo if you want.

  4. Rufus Pollock says:

    I think it is very likely we don’t need the full FRBR conception. The key thing is to describe the ‘actual’ objects (which I think correspond to FRBR Manifestations) and to be able to link them to a ‘Work/Ideal’ object which acts as a way of pulling together the many different instances of approximately the same thing (and we can argue later as to whether the 1st edition versus 2nd edition of a book qualifies as a new Work/Ideal; we all agree that a 2nd printing (with a new ISBN) does not …).

    Just maybe we’ll want an FRBR ‘Item’ to represent the fact we have multiple physical copies of something but this is far down the list — if it will be needed at all.

    Lastly I’d request more examples if possible:) — e.g. could we have a standard fiction title (Harry Potter would be good because we have lots of printings, different editions, translations etc).

  5. Bruce,

    Sorry – not meaning to draw incorrect conclusions about what Bibo can do – just based my comments on the representations shown here. So to generalise my comments – I think the representation of volumes/issues as distinct entities identified by their own URIs is probably useful if it can be achieved.

    I’m less sure in this particular example that a FRBRised representation is adding anything, and further I’m concerned about the ability to construct an accurate FRBRised representation from the information available in this example.

    In general I’d tend toward Bibo over FaBio to be honest just on the basis that Bibo seems (to me) to have more momentum and support behind it (in fact, this is the first time I’ve come across FaBio, whereas Bibo has been on my radar for some time).
