Open Bibliography and Open Bibliographic Data » model (http://openbiblio.net)
Open Bibliographic Data Working Group of the Open Knowledge Foundation

Comparative Serialisation of RDF in JSON
http://openbiblio.net/2011/05/04/comparative-serialisation-of-rdf-in-json/
Wed, 04 May 2011 17:59:13 +0000

This is a comparison of RDF-JSON and JSON-LD for serialising bibliographic RDF data. Given that we are also working
with BibServer we have taken a BibJSON document as our source data for
comparison. The objective was to both understand these two JSON
serialisations of RDF and also to look at the BibJSON profile to see how it
fits into such a framework.

Due to limitations of the display of large plain-text code snippets on the site, we have placed the actual content in this text file which you should refer to as we go along.

We used a BibJSON document, which comes from the examples on the
BibJSON homepage.

When converting this into the two RDF serialisations we invented a namespace:

http://www.bibkn.org/bibjson/terms/

This namespace provisionally holds all predicates/keys that are used by BibJSON
and are not immediately clearly available in another ontology. These terms should
not under any circumstances be considered definitive or final, only indicative.

Now consider the RDF-JSON serialisation

Some key things to note about this serialisation:

  • There is no explicit shortening of URIs for predicates into CURIEs,
    all URIs are instead presented in full.
  • The object of each predicate is a JSON object with up to 4 keys (value,
    type, datatype, lang). This means that it is not easy for the human
    eye to pick out the value of a particular predicate.
  • Of the two RDF serialisations, this is by far the more verbose.
  • It is relatively difficult for a human to read and write.
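
To make the shape described above concrete, here is a minimal Python sketch. The subject URI and the title are made up for illustration and are not taken from the comparison file; only the nesting (subject, then predicate, then a list of objects carrying value, type and optionally datatype or lang) follows the layout described in the notes above.

```python
import json

# Hypothetical record: the subject URI and the title are illustrative only.
rdf_json = {
    "http://example.org/record/1": {
        "http://purl.org/dc/elements/1.1/title": [
            {"value": "An example title", "type": "literal", "lang": "en"}
        ],
        "http://purl.org/dc/elements/1.1/source": [
            {"value": "urn:issn:1600-5368", "type": "uri"}
        ],
    }
}


def values_of(doc, subject, predicate):
    """Return the bare values for one subject/predicate pair."""
    return [obj["value"] for obj in doc.get(subject, {}).get(predicate, [])]


titles = values_of(
    rdf_json,
    "http://example.org/record/1",
    "http://purl.org/dc/elements/1.1/title",
)
print(titles)
print(json.dumps(rdf_json, indent=2))
```

Even in this tiny record the title sits three levels deep and the full predicate URI has to be spelled out to reach it, which is the verbosity the notes above point at.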

Compare this with the equivalent JSON-LD serialisation:

Some things to note about this serialisation:

  • It has a clear treatment of namespaces
  • It may be slightly inaccurate, as there are some parts of its specification
    which are ambiguous – feedback welcome
  • The object values cannot be taken directly as the value of the predicate,
    as they may carry datatype and/or language information, or may
    be surrounded by angle brackets.
  • It is relatively easy for a human to read and write

Both serialisations are capable of representing the same data, although JSON-LD
is far more terse and therefore easier to read and write. It is not, however,
possible to reliably treat JSON-LD as a pure list of key-value pairs in non-RDF
aware environments, as it includes RDF type and language semantics in the literal
values of objects. RDF-JSON does not suffer from this same issue within the object
literals, but in return its notation is more complex.
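
As a rough illustration of why such values cannot be read as plain strings, the Python sketch below unpacks a value string into explicit parts. The three conventions it assumes (angle brackets for URIs, a `^^` suffix for datatypes, an `@lang` suffix for language tags) are our reading of the description above, not a reproduction of the JSON-LD specification, and the function name `decode` is hypothetical.

```python
import re

# Assumed convention: a trailing "@xx" or "@xx-YY" marks a language tag.
_LANG = re.compile(
    r"^(?P<value>.*)@(?P<lang>[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*)$", re.S
)


def decode(raw):
    """Split a value string into explicit value/type/datatype/lang parts."""
    if raw.startswith("<") and raw.endswith(">"):
        return {"type": "uri", "value": raw[1:-1]}
    if "^^" in raw:
        value, datatype = raw.rsplit("^^", 1)
        return {"type": "literal", "value": value, "datatype": datatype.strip("<>")}
    match = _LANG.match(raw)
    if match:
        return {
            "type": "literal",
            "value": match.group("value"),
            "lang": match.group("lang"),
        }
    return {"type": "literal", "value": raw}


print(decode("<http://creativecommons.org/licenses/by/2.0/uk>"))
print(decode("Acta Cryst. E@en"))
print(decode("med@iucr.org"))
```

Under these assumptions an e-mail address such as "med@iucr.org" survives only because "iucr.org" does not look like a language tag; a plain string like "someone@example" would be misread as language-tagged, which is the kind of ambiguity a non-RDF-aware consumer would trip over.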

A serious shortcoming of RDF-JSON is its lack of explicit handling of CURIEs
and namespaces. It could benefit from adopting the conventions laid out in
JSON-LD, which may bring the choice of which serialisation to use down to
preference rather than any significant technical difference.
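
A minimal sketch of what such namespace handling could look like when layered on top of RDF-JSON, in Python. The prefix table and the helper names are assumptions for illustration; in particular the prefix "bibjson" is our own label for the provisional namespace introduced earlier, and nothing here is part of either specification.

```python
# Assumed prefix table; "bibjson" is a made-up label for the provisional namespace.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "bibjson": "http://www.bibkn.org/bibjson/terms/",
}


def compact(uri, prefixes=PREFIXES):
    """Shorten a full URI to a CURIE, preferring the longest matching namespace."""
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return uri
    return f"{best[0]}:{uri[len(best[1]):]}"


def expand(curie, prefixes=PREFIXES):
    """Expand a CURIE back to a full URI; unknown prefixes are left untouched."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return curie


def compact_predicates(doc, prefixes=PREFIXES):
    """Return a copy of an RDF-JSON style document with CURIE predicate keys."""
    return {
        subject: {compact(pred, prefixes): objs for pred, objs in preds.items()}
        for subject, preds in doc.items()
    }


print(compact("http://purl.org/dc/elements/1.1/title"))
print(expand("bibjson:example"))
```

A consumer would also need the prefix table shipped alongside the data, which is the part RDF-JSON currently has no place for.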

Each of the formats also comfortably represents BibJSON, and with the extensive lists of predicates provided in that specification it would be straightforward enough to do a full and proper treatment of BibJSON through one of these routes.

Disambiguation, deduplication and 'ideals'
http://openbiblio.net/2010/09/22/disambiguation-deduplication-and-ideals/
Wed, 22 Sep 2010 12:38:45 +0000

(NB Republished from a mailing list conversation at http://lists.okfn.org/pipermail/open-bibliography/2010-August/000397.html; follow this link to see the comments and replies)

In my work on meshing bibliographic datasets together, I’ve been using a
conceptual tool that I would like to hear views on.

I am creating nodes for the ideals of things on records, whether that is
for people, journals or even the bibliographic document itself. The ideal
represents the best and most complete data for that thing. That is something
we’ll never really achieve, but that’s not the point. This ideal serves as a
node, a hook, on which we can join up records which describe the same thing
(person, FRBR manifestation, etc.) but which hold differing data about it.

It’s easy to consider it for ‘deduplication’ of, say, article references.
Consider two records, one from the RIS feed from PubMed and one from a
citation in a PLoS article. These are found to be references to the same
article but, as you can expect, they differ, not just in terms of data but
also in terms of the source or author of that reference.

The way I am tackling this is by creating a node for the ideal bibliographic
reference each aspires to. When dupes are believed to be found, these
ideal nodes are joined into a bundle using sameas (in a different store), and
this bundle has some provenance triples recording the how, when and why of
this merging (using Open Provenance Model verbs/classes).

Eg:

:bibrec --> record node from PubMed

:citerec --> PLoS record

_i suffix --> ideal node

  • Running the analyser on the records suggests the two are dupes, with a certain
    confidence score from a certain weighted matching (call this ‘heur.v0.13’)

Create ideal nodes Just In Time:

:bibrec hasIdeal :bibrec_i
:citerec hasIdeal :citerec_i

Make the bundle:

:b1 a Bundle
   sameas :bibrec_i
   sameas :citerec_i
   opmv:wasGeneratedBy :p1
   created: 2010-08-......

:p1 a opmv:Process
  opmv:controlledBy :Ben
  opmv:used :bibrec
  opmv:used :citerec

:confidence a ConfidenceReport
  opmv:wasGeneratedBy :p1
  hasReport <url of doc>  # for time being
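
The bundle-making step above can be sketched in Python as a function that emits the same statements as plain tuples. This is only an illustration under assumptions: the function names are hypothetical, the predicate names are copied from the sketch above rather than checked against any vocabulary, and the creation date is passed through exactly as elided in the original.

```python
def ideal_of(record):
    """Name of the ideal node for a record, using the _i suffix from the text."""
    return f"{record}_i"


def make_bundle(bundle, process, records, agent, created):
    """Build the hasIdeal, bundle and provenance statements for a set of dupes."""
    # Ideal nodes are created just in time, one per record.
    triples = [(record, "hasIdeal", ideal_of(record)) for record in records]
    # The bundle joins the ideal nodes, not the records themselves.
    triples.append((bundle, "a", "Bundle"))
    triples += [(bundle, "sameas", ideal_of(record)) for record in records]
    triples.append((bundle, "opmv:wasGeneratedBy", process))
    triples.append((bundle, "created", created))
    # Provenance: who ran the merge and which records it used.
    triples.append((process, "a", "opmv:Process"))
    triples.append((process, "opmv:controlledBy", agent))
    triples += [(process, "opmv:used", record) for record in records]
    return triples


# The date string is left elided, as in the original message.
triples = make_bundle(":b1", ":p1", [":bibrec", ":citerec"], ":Ben", "2010-08-......")
for triple in triples:
    print(" ".join(triple))
```

Keeping the admin statements (process, agent, used) separate from the sameas statements is what later allows the merged data to be exported without them.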

This structure lets me create an aggregated RDF dataset with the best-guess
ideal records at any one time. Also, bundles can be merged later if required,
creating a tree structure. The top bundle instance and the ‘leaf’ records
form a congruent closure and are thus exportable as such without the admin
structure triples necessary for ongoing maintenance. The bundle notion comes
from the excellent work by the team at Southampton, including Hugh Glaser,
Ian Millard et al. (google for coreference on the semantic web).
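
One way to picture later merging and export is a union-find (disjoint-set) structure, sketched below in Python. This is a swapped-in textbook technique used for illustration, not the store-based bundle mechanism described above; the class name and the node names other than :bibrec_i and :citerec_i are hypothetical.

```python
class Bundles:
    """Union-find over ideal nodes; each root stands for a top-level bundle."""

    def __init__(self):
        self.parent = {}

    def add(self, node):
        self.parent.setdefault(node, node)

    def find(self, node):
        """Return the top of the tree that this node belongs to."""
        self.add(node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the way directly at the root.
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def merge(self, a, b):
        """Join the bundles containing a and b and return the surviving root."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a
        return root_a

    def export(self):
        """Group every known node under its top bundle, dropping the tree itself."""
        groups = {}
        for node in list(self.parent):
            groups.setdefault(self.find(node), set()).add(node)
        return groups


bundles = Bundles()
bundles.merge(":bibrec_i", ":citerec_i")
bundles.merge(":otherrec_i", ":citerec_i")  # a later merge joins the earlier bundle
bundles.add(":lonely_i")
print(bundles.export())
```

The export step returns only the groups of equivalent nodes, which mirrors the idea of shipping the closure without the maintenance structure.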

Using this technique for entities like people is actually very similar, if I
use the words ‘person’ and ‘persona’ for the ideal and the data in a record
respectively. The persona can have alternative spellings, time-dependent
details like a fleeting institutional affiliation, and so on. The
(difficult) trick is spotting when two personas refer to the same person,
but the process for merging is the same even if the creation of an
aggregated record for each is different.

Bibliographic models in RDF
http://openbiblio.net/2010/09/10/bibliographic-models-in-rdf/
Fri, 10 Sep 2010 14:56:12 +0000

Put it in RDF to solve all your problems!

As with most things in life, the reality is often a little more complex. If you are old enough, you may well remember when this very same cry was often uttered, but with ‘RDF’ above replaced by ‘XML’ or if you are older still, ‘SGML’.

We haven’t quite reached the tipping point with bibliographic data in RDF at which a de facto model and structure has clearly emerged. There are plenty of contenders though, each based on a different model for how this data should be encapsulated in RDF. The main characteristic difference is in how markedly hierarchical or flat the model structure is.

A model that has emerged from the library world is FRBR (Functional Requirements for Bibliographic Records). From Wikipedia:

FRBR is a conceptual entity-relationship model developed by the International Federation of Library Associations and Institutions (IFLA) that relates user tasks of retrieval and access in online library catalogues and bibliographic databases from a user’s perspective. It represents a more holistic approach to retrieval and access as the relationships between the entities provide links to navigate through the hierarchy of relationships.

There are plenty of articles and documents online to explain further, so I will not take up your time with a summary of it, just my opinion. FRBR is very much built around the notion of books: what a book is, taking into account things like editions and so on. Where FRBR really does fall down a rabbit hole is in the consideration of things like serials and journal articles. Their treatment feels very much like an afterthought, and the philosophical ideas of Work and Expression get much murkier, especially when considering linking these records to conference papers and blog posts by the same article authors.

There is enough of a model, however, to render an understandable bibliographic ‘record’ for an article in RDF, and this post will give an example of this, using David Shotton and Silvio Peroni’s FaBIO ontology to encapsulate the information in a FRBR-like manner.

The data used comes from an IUCr paper “Nicotinamide-2,2,2-trifluoroethanol (2/1)” Acta Cryst. (2009). E65, o727-o728, which has RDF embedded in the HTML page itself. The original RDF looks something like this:

@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix prism: <http://prismstandard.org/namespaces/1.2/basic/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

<doi:10.1107/S1600536809007594>
     prism:eissn "1600-5368";
     prism:endingpage "728";
     prism:issn "1600-5368";
     prism:number "4";
     prism:publicationdate "2009-04-01";
     prism:publicationname "Acta Crystallographica Section E: Structure Reports Online";
     prism:rightsagent "med@iucr.org";
     prism:section "organic compounds";
     prism:startingpage "727";
     prism:volume "65";
     dc:creator "Bardin, J.",
         "Florence, A.J.",
         "Johnston, B.F.",
         "Kennedy, A.R.",
         "Wong, L.V.";
     dc:date "2009-04-01";
     dc:description "The nicotinamide (NA) molecules of the title compound, 2C6H6N2O.C2H3F3O, form centrosymmetric R22(8) hydrogen-bonded dimers via N-H...O contacts. The asymmetric unit contains two molecules of NA and one trifluoroethanol molecule disordered over two sites of equal occupancy. The packing consists of alternating layers of nicotinamide dimers and disordered 2,2,2-trifluoroethanol molecules stacking in the c-axis direction. Intramolecular C-H...O and intermolecular N-H...N, O-H...N, C-H...N, C-H...O and C-H...F interactions are present.";
     dc:identifier _9:S1600536809007594;
     dc:language "en";
     dc:link <http://scripts.iucr.org/cgi-bin/paper?fl2234>;
     dc:publisher "International Union of Crystallography";
     dc:rights <http://creativecommons.org/licenses/by/2.0/uk>;
     dc:source <urn:issn:1600-5368>;
     dc:subject "";
     dc:title "Nicotinamide-2,2,2-trifluoroethanol (2/1)";
     dc:type "text";
     dcterms:abstract "The nicotinamide (NA) molecules of the title compound, 2C6H6N2O.C2H3F3O, form centrosymmetric R22(8) hydrogen-bonded dimers via N-H...O contacts. The asymmetric unit contains two molecules of NA and one trifluoroethanol molecule disordered over two sites of equal occupancy. The packing consists of alternating layers of nicotinamide dimers and disordered 2,2,2-trifluoroethanol molecules stacking in the c-axis direction. Intramolecular C-H...O and intermolecular N-H...N, O-H...N, C-H...N, C-H...O and C-H...F interactions are present.".

The same bibliographic information, rendered into a FaBIO model (amongst other ontologies):

@prefix fabio: <http://purl.org/spar/fabio/> .
@prefix c4o: <http://purl.org/spar/c4o/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .
@prefix prism: <http://prismstandard.org/namespaces/basic/2.0/> .
@prefix : <#> .  # assumed default prefix for the local nodes below; the original declares none

:article
    a fabio:JournalArticle
    ; dc:title "Nicotinamide-2,2,2-trifluoroethanol (2/1)"
    ; dcterms:creator [ a foaf:Person ; foaf:name "Johnston, B.F." ]
    ; dcterms:creator [ a foaf:Person ; foaf:name "Florence, A.J." ]
    ; dcterms:creator [ a foaf:Person ; foaf:name "Bardin, J." ]
    ; dcterms:creator [ a foaf:Person ; foaf:name "Kennedy, A.R." ]
    ; dcterms:creator [ a foaf:Person ; foaf:name "Wong, L.V." ]
    ; dc:rights <http://creativecommons.org/licenses/by/2.0/uk>
    ; dc:language "en"
    ; fabio:hasPublicationYear "2009"
    ; fabio:publicationDate "2009-04-01"
    ; frbr:embodiment :printedArticle , :webArticle
    ; frbr:partOf :issue
    ; fabio:doi "10.1107/S1600536809007594"
    ; frbr:part :abstract
    ; prism:rightsagent "med@iucr.org" .

:abstract
    a fabio:Abstract
    ; c4o:hasContent "The nicotinamide (NA) molecules of the title compound, 2C6H6N2O.C2H3F3O, form centrosymmetric R22(8) hydrogen-bonded dimers via N-H...O contacts. The asymmetric unit contains two molecules of NA and one trifluoroethanol molecule disordered over two sites of equal occupancy. The packing consists of alternating layers of nicotinamide dimers and disordered 2,2,2-trifluoroethanol molecules stacking in the c-axis direction. Intramolecular C-H...O and intermolecular N-H...N, O-H...N, C-H...N, C-H...O and C-H...F interactions are present." .

:printedArticle
    a fabio:PrintObject
    ; prism:pageRange "727-728" .

:webArticle
    a fabio:WebPage
    ; fabio:hasURL "http://scripts.iucr.org/cgi-bin/paper?fl2234" .

:volume
    a fabio:JournalVolume
    ; prism:volume "65"
    ; frbr:partOf :journal .

:issue
    a fabio:JournalIssue
    ; prism:issueIdentifier "4"
    ; frbr:partOf :volume .

:journal
    a fabio:Journal
    ; dc:title "Acta Crystallographica Section E: Structure Reports Online"
    ; fabio:hasShortTitle "Acta Cryst. E"
    ; dcterms:publisher [ a foaf:Organization ; foaf:name "International Union of Crystallography" ]
    ; fabio:issn "1600-5368" .
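
To show what the hierarchy means for a consumer, the Python sketch below walks the frbr:partOf chain from the article up to the journal. The triples are copied by hand from the listing above into plain tuples (no RDF library is assumed), and the helper names are hypothetical.

```python
# A hand-copied subset of the FaBIO listing above, as (subject, predicate, object).
TRIPLES = [
    (":article", "frbr:partOf", ":issue"),
    (":issue", "frbr:partOf", ":volume"),
    (":volume", "frbr:partOf", ":journal"),
    (":issue", "prism:issueIdentifier", "4"),
    (":volume", "prism:volume", "65"),
    (":journal", "dc:title", "Acta Crystallographica Section E: Structure Reports Online"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def containers(node, triples=TRIPLES):
    """Follow frbr:partOf upwards and return the chain of containing nodes."""
    chain, seen = [], {node}
    while True:
        parents = objects(node, "frbr:partOf", triples)
        if not parents or parents[0] in seen:  # stop at the top or on a cycle
            return chain
        node = parents[0]
        seen.add(node)
        chain.append(node)


chain = containers(":article")
print(chain)
print(objects(chain[-1], "dc:title"))
```

Reaching the journal title takes three hops here, one per level of the model, whereas the original flat record carried it directly on the article.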

The most obvious model and ontology to have emerged for describing bibliographic metadata in RDF is the Bibliographic Ontology (BIBO), developed by Frédérick Giasson and Bruce D’Arcus. It has been in existence for long enough to gain acceptance by a number of other projects, such as EPrints, Talis Aspire and Chronicling America (the Chronicling America website at the Library of Congress provides a view on millions of pages of digitized newspaper content from around the United States).

The same data again, rendered this time using BIBO’s model and ontology, rather than a FRBR-like one:

@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix frbr: <http://purl.org/vocab/frbr/core#> .
@prefix prism: <http://prismstandard.org/namespaces/basic/2.0/> .
@prefix : <#> .  # assumed default prefix for the :author nodes below; the original declares none

<info:doi/10.1107/S1600536809007594>
    a bibo:Article
    ; dc:title "Nicotinamide-2,2,2-trifluoroethanol (2/1)"
    ; dcterms:isPartOf <urn:issn:16005368>
    ; bibo:volume "65"
    ; bibo:issue "4"
    ; bibo:pageStart "727"
    ; bibo:pageEnd "728"
    ; dc:creator :author1
    ; dc:creator :author2
    ; dc:creator :author3
    ; dc:creator :author4
    ; dc:creator :author5
    ; bibo:authorList (:author1 :author2 :author3 :author4 :author5)
    ; dc:rights <http://creativecommons.org/licenses/by/2.0/uk>
    ; dc:language "en"
    ; dc:date "2009-04-01"
    ; bibo:doi "10.1107/S1600536809007594"
    ; bibo:abstract "The nicotinamide (NA) molecules of the title compound, 2C6H6N2O.C2H3F3O, form centrosymmetric R22(8) hydrogen-bonded dimers via N-H...O contacts. The asymmetric unit contains two molecules of NA and one trifluoroethanol molecule disordered over two sites of equal occupancy. The packing consists of alternating layers of nicotinamide dimers and disordered 2,2,2-trifluoroethanol molecules stacking in the c-axis direction. Intramolecular C-H...O and intermolecular N-H...N, O-H...N, C-H...N, C-H...O and C-H...F interactions are present."
    ; prism:rightsagent "med@iucr.org" .

<urn:issn:16005368>
    a bibo:Journal
    ; dc:title "Acta Crystallographica Section E: Structure Reports Online"@en
    ; bibo:shortTitle "Acta Cryst. E"@en
    ; bibo:issn "1600-5368" .

:author1
    a foaf:Person
    ; foaf:name "Johnston, B.F." .

:author2
    a foaf:Person
    ; foaf:name "Florence, A.J." .

:author3
    a foaf:Person
    ; foaf:name "Bardin, J." .

:author4
    a foaf:Person
    ; foaf:name "Kennedy, A.R." .

:author5
    a foaf:Person
    ; foaf:name "Wong, L.V." .
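
Because the BIBO rendering is nearly as flat as the original record, much of the conversion is a renaming of keys. The Python sketch below shows that for the PRISM fields; the mapping table is inferred by comparing the two listings in this post and is an assumption, not an official crosswalk, and the function name is hypothetical.

```python
# Assumed mapping, inferred from the original PRISM record and the BIBO listing.
PRISM_TO_BIBO = {
    "prism:volume": "bibo:volume",
    "prism:number": "bibo:issue",
    "prism:startingpage": "bibo:pageStart",
    "prism:endingpage": "bibo:pageEnd",
}


def to_bibo(record, mapping=PRISM_TO_BIBO):
    """Rename known keys; return the unmapped ones too, so nothing is silently lost."""
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            mapped[mapping[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


original = {
    "prism:volume": "65",
    "prism:number": "4",
    "prism:startingpage": "727",
    "prism:endingpage": "728",
    "prism:section": "organic compounds",
}
mapped, unmapped = to_bibo(original)
print(mapped)
print(unmapped)
```

Fields with no obvious BIBO counterpart, such as prism:section here, are reported rather than dropped, since deciding where they belong is a modelling choice.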

Comments on which is the most usable, the most understandable and which is likely to be the better model for sharing this data with other people are most welcome. This is an area in which the community will have to choose a model. In practice, wrapping the information in any of the models is straightforward, but if you put it into a model that no one uses, the model becomes more of a data coffin than a useful concept.
