Characterising the British Library Bibliographic dataset

Having RDF data is good. Having linkable data is better. But having some idea of what sorts of properties you can expect to find within a triplestore or block of data can be crucial: that broad-stroke information tells you when a dataset contains interesting material that makes the work of using it worthwhile.

I ran the recently re-released BL RDF data (get from here or here) (CC0) through a simple script that counted occurrences of various elements within the 17 files, as well as enumerating all the different sorts of property you can expect to find.
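
The counting itself is nothing clever – something along these lines does the job (a minimal sketch, assuming lxml is installed; the output format here is my illustration, not the exact script I ran):

# Rough sketch only: tally element occurrences across the BNBrdfdc dump files.
import glob
from collections import Counter
from lxml import etree

tallies = Counter()
for filename in sorted(glob.glob("BNBrdfdc*.xml")):
    # iterparse keeps memory use manageable on these very large files
    for _, element in etree.iterparse(filename, events=("end",)):
        tallies[element.tag] += 1   # tags come out as '{namespace}localname'
        element.clear()             # discard each parsed element as we go

for tag, count in tallies.most_common():
    print("%s\t%d" % (tag, count))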

Some interesting figures:

  • Over 100,000 records in each file, 2.9 million ‘records’ in total. Each record is a blank node.
  • Three main types of identifier – ‘(Uk)123….’, ‘GB123…’ and (as a literal) ‘URN:ISBN:123…’ – but not all records have ISBNs, as some of them predate the ISBN system.
  • Nearly 29 million blank nodes in total.
  • 11,187,804 uses of dcterms:subject, for an average of just under 4 per record (3.75…)
  • Uses properties from Dublin Core terms, OWL-Time, ISBD, and SKOS
  • dcterms:subject values are all expressed as SKOS declarations, and include the Dewey Decimal, LCSH and MeSH schemes. (Work to use id.loc.gov LCSH URIs instead of literals is underway.)
  • Includes rare and valuable information, stored in properties such as dcterms:isPartOf, isReferencedBy, isReplacedBy, replaces, requires and dcterms:relation.

Google spreadsheet of the tallies

Occurrence trends through the 17 data files (BNBrdfdc01.xml –> 17.xml)

(The image is as Google Spreadsheets exported it; click the link above to view the sheet itself, without axis distortion.)

Literals and what to expect:

I wrote another straightforward script that can mine sample sets of unique literals from the BNBrdfdc xml files.

Usage for ‘gather_test_literals.py’
Usage: python gather_test_literals.py path/to/BNBrdfdcXX.xml ns:predicate number_to_retrieve [redis_set_to_populate]

For example, to retrieve 10 literal values from the bnodes within dcterms:publisher in BNBrdfdc01.xml:

python gather_test_literals.py BNBrdfdc01.xml "dcterms:publisher" 10

And to also push those values into a local Redis set 'publisherset01' (assuming Redis is running and redis-py is installed):

python gather_test_literals.py BNBrdfdc01.xml "dcterms:publisher" 10 publisherset01

So, to find out what, at most, 10 of those intriguing ‘dcterms:isReferencedBy’ predicates contain in BNBrdfdc12.xml, you can run:

python gather_test_literals.py BNBrdfdc12.xml "dcterms:isReferencedBy" 10

(As long as gather_test_literals.py and the xml files are in the same directory of course)

Result:

Chemical abstracts,
Soulsby no. 4061
Soulsby no. 3921
Soulsby no. 4018
Chemical abstracts

As the script gathers the literals into a set, it will only return when it has either reached the desired number of unique values, or has reached the end of the file.
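
For reference, the heart of the script looks roughly like this (a simplified sketch, assuming lxml and, optionally, the redis-py client; the namespace table and function name are my illustration rather than the script's own):

# Simplified sketch: gather unique literals for a given predicate from a
# BNBrdfdc file, optionally pushing them into a Redis set.
import sys
from lxml import etree

NAMESPACES = {"dcterms": "http://purl.org/dc/terms/",
              "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

def gather_literals(xml_path, predicate, limit):
    prefix, localname = predicate.split(":")
    tag = "{%s}%s" % (NAMESPACES[prefix], localname)
    found = set()
    for _, element in etree.iterparse(xml_path, events=("end",), tag=tag):
        # literals usually sit inside a bnode as rdf:value text
        for value in element.itertext():
            value = value.strip()
            if value:
                found.add(value)
        element.clear()
        if len(found) >= limit:
            break
    return found

if __name__ == "__main__":
    path, predicate, limit = sys.argv[1], sys.argv[2], int(sys.argv[3])
    literals = gather_literals(path, predicate, limit)
    for literal in literals:
        print(literal)
    if len(sys.argv) > 4:            # optional Redis set name
        import redis
        r = redis.Redis()            # assumes a local Redis instance
        for literal in literals:
            r.sadd(sys.argv[4], literal)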

Hopefully, this will help other people explore this dataset and also pull information from it. I have also created a basic Solr configuration that has fields for all the elements found in the BNB dataset here.


JISC OpenBibliography: British Library data release

The JISC OpenBibliography project is excited to announce that the British Library is providing a set of bibliographic data under CC0 Public Domain Dedication Licence.

We have initially received a dataset consisting of approximately 3 million records, which is now available as a CKAN package. This dataset consists of the entire British National Bibliography, describing new books published in the UK since 1950; this represents about 20% of the total BL catalogue, and we are working to add further releases. In addition, we are developing sample access methods onto the data, which we will post about later this week.

Agreements such as these are crucial to our community, as developments in areas such as Linked Data are only beneficial when there is content on which to operate. We look forward to announcing further releases and developments, and to being part of a community dedicated to the future of open scholarship.

Usage guide from BL:

This usage guide is based on goodwill. It is not a legal contract. We ask that you respect it.

Use of Data: This data is being made available under a Creative Commons CC0 1.0 Universal Public Domain Dedication licence. This means that the British Library Board makes no copyright, related or neighbouring rights claims to the data and does not apply any restrictions on subsequent use and reuse of the data. The British Library accepts no liability for damages from any use of the supplied data. For more detail please see the terms of the licence.

Support: The British Library is committed to providing high quality services and accurate data. If you have any queries or identify any problems with the data please contact metadata@bl.uk.

Share knowledge: We are also very interested to hear the ways in which you have used this data so we can understand more fully the benefits of sharing it and improve our services. Please contact metadata@bl.uk if you wish to share your experiences with us and those that are using this service.

Give Credit Where Credit is Due: The British Library has a responsibility to maintain its bibliographic data on the nation’s behalf. Please credit all use of this data to the British Library and link back to www.bl.uk/bibliographic/datafree.html in order that this information can be shared and developed with today’s Internet users as well as future generations.

Link to British Library announcement


"Bundling" instances of author names together without using owl:sameas

Bundling?

It’s a verb I’ve taken from “Glaser, H., Millard, I., Jaffri, A., Lewy, T. and Dowling, B. (2008) On Coreference and The Semantic Web http://eprints.ecs.soton.ac.uk/15765/”, where the core idea is that you have a number of URIs that mean or reference the same real thing, and the technique of bundling they describe is to aggregate all those references together. The manner in which they describe it is built on a sound basis in logic, and is related to (if not the same as) a congruent closure.

The notion of bundling I am using is not as rooted in mathematical logic, because I need to convey an assertion that one URI is meant to represent the same thing that another URI represents, in a given context and for a given reason. This is a different assertion, if only subtly different, from what ‘owl:sameAs’ asserts, but the difference is key for me.

It is best to think through an example of where I am using this – curating bibliographic records and linking authors together.

It’s an obvious desire – given a book or article, to find all the other works by an author of that work. Technologically, with RDF this is a very simple proposition, BUT the data needs to be there. This is the point where we come unstuck: we don’t really have data of the quality needed to firmly establish that one author is the same as a number of others. String matching is not enough!

So, how do we clean up this data (converted to RDF) so that we can try to stitch together the authors and other entities in them?

See this previous post on augmenting British Library metadata so that the authors, publishers and so on are externally reference-able once they are given unique URIs. This really is the key step. Any other work that can be done to make any of the data about the authors and so on more semantically reference-able will be a boon to the process of connecting the dots, as I have done for authors with birth and/or death dates.

The fundamental aspect to realise is that we are dealing with datasets which have missing data, misrepresented data (typos), misinterpreted fields (ISBNs of £2.50 for example) and other non-uniform and irregular problems. Connecting authors together in datasets with these characteristics will rely on us and code that we write making educated guesses, and probabilistic assertions, based on how confident we are that things match and so on.

We cannot say for sure that something is a cast-iron match, only that we are above a certain limit of confidence that this is so. We also have to have a good reason as well.
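
As a toy illustration of what ‘a certain limit of confidence’ might mean in code (difflib is used purely for demonstration – real matching would draw on far more evidence than a single string ratio):

# Toy illustration: only assert a match above a confidence threshold,
# and keep the score around so the reason can be recorded with the assertion.
from difflib import SequenceMatcher

def name_confidence(a, b):
    """Crude similarity score between two author name strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

THRESHOLD = 0.9

score = name_confidence("Tooley, R. V. (Ronald Vere)", "Tooley, Ronald Vere")
if score >= THRESHOLD:
    print("assert a match, confidence %.2f" % score)
else:
    print("not confident enough (%.2f) - needs more evidence" % score)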

Something else to take on board is that what I would consider a good match might not be good for someone else, so there needs to be a way to state a connection and to say why, by whom and how the match was made, as well as a need to keep this assertion data separate from our source data.

I’ve adopted the following model for encoding this assertion in RDF, in a form that sits outside of the source data, as a form of overlay data and you can find the bundle ontology I’ve used at http://purl.org/net/bundle.rdf (pay no attention to where it is currently living):

Click to view in full, unsquished form:

Bundle of URIs, showing use of OPMV

The URIs shown to be ‘opmv:used’ in this diagram are not meant to be exhaustive. It is likely that a bundle may depend on a look-up or resolution service, external datasheets, authority files, csv lists, dictionary lists and so on.

Note that the ‘Reason’ class has few, if any, mandatory properties aside from its connection to a given Bundle and opmv:Process. Assessing if you trust a Bundle at this moment is very much based on the source and the agent that made the assertion. As things get more mature, more information will regularly find its place attached to a ‘Reason’ instance.

There are currently two subtypes of Reason: AlgorithmicReason and AgentReason. Straightforwardly, this is the difference between a machine-made match and a human-made match and use of these should aid the assessment of a given match.

Creating a bundle using python:

I have added a few classes to Will Waites’ excellent ‘ordf’ library, and you can find my version here. To create a virtualenv to work within, do as follows. You will need mercurial and virtualenv already installed:

At a command line – eg ‘[@localhost] $’, enter the following:

hg clone http://bitbucket.org/beno/ordf
virtualenv myenv
. ./myenv/bin/activate
(myenv) $ pip install ordf

So, creating a bundle of some URIs – “info:foo” and “info:bar”, due to a human choice of “They look the same to me :)”:

In Python:


from ordf.vocab.bundle import Bundle, Reason, AlgorithmicReason, AgentReason
from ordf.vocab.opmv import Agent
from ordf.namespace import RDF, BUNDLE, OPMV, DC  # you are likely to use these yourself
from ordf.term import Literal, URIRef             # when adding arbitrary triples

b = Bundle()
# or, if you don't want a bnode for the Bundle URI:
#   b = Bundle(identifier="http://example.org/1")
# NB this also instantiates empty bundle.Reason and opmv.Process instances
# in b.reason and b.process, which are used to create the final combined graph at the end.

b.encapsulate(URIRef("info:foo"), URIRef("info:bar"))

# We don't want the default plain Reason, we want a human reason
# (again, pass an identifier="" keyword argument to set the URI if you wish):
r = AgentReason()
r.comment("They look the same to me :)")

# Let them know who made the assertion:
a = Agent()
a.nick("benosteen")
a.homepage("http://benosteen.com")

# Add this agent as the controller of the process:
b.process.agent(a)

# This creates an in-memory graph of all the triples required to assert this bundle:
g = b.bundle_graph()

# The easiest way to get it out is to "serialize" it:
print g.serialize()

==============

Output:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:bundle="http://purl.org/net/bundle#"
   xmlns:foaf="http://xmlns.com/foaf/0.1/"
   xmlns:opmv="http://purl.org/net/opmv/ns#"
   xmlns:ordf="http://purl.org/NET/ordf/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>
  <rdf:Description rdf:nodeID="PZCNCkfJ2">
    <rdfs:label> on monster (18787)</rdfs:label>
    <ordf:hostname>monster</ordf:hostname>
    <ordf:pid rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">18787</ordf:pid>
    <opmv:wasControlledBy rdf:nodeID="PZCNCkfJ9"/>
    <ordf:version rdf:nodeID="PZCNCkfJ4"/>
    <rdf:type rdf:resource="http://purl.org/net/opmv/ns#Process"/>
    <ordf:cmdline></ordf:cmdline>
  </rdf:Description>
  <rdf:Description rdf:nodeID="PZCNCkfJ0">
    <bundle:encapsulates rdf:resource="info:bar"/>
    <bundle:encapsulates rdf:resource="info:foo"/>
    <bundle:justifiedby rdf:nodeID="PZCNCkfJ5"/>
    <opmv:wasGeneratedBy rdf:nodeID="PZCNCkfJ2"/>
    <rdf:type rdf:resource="http://purl.org/net/bundle#Bundle"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="PZCNCkfJ5">
    <rdf:type rdf:resource="http://purl.org/net/bundle#Reason"/>
    <opmv:wasGeneratedBy rdf:nodeID="PZCNCkfJ2"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="PZCNCkfJ9">
    <foaf:nick>benosteen</foaf:nick>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/>
    <foaf:homepage rdf:resource="http://benosteen.com"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="PZCNCkfJ4">
    <rdfs:label>ordf</rdfs:label>
    <rdf:value>0.26.391.901cf0a0995c</rdf:value>
  </rdf:Description>
</rdf:RDF>


Given a triplestore holding these bundles, you can query for ‘same as’ URIs by finding which Bundles a given URI appears in.
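
For instance, a rough sketch of that lookup with rdflib and SPARQL (the filename is illustrative; only the bundle vocabulary comes from the output above):

# Sketch: find URIs co-bundled with info:foo, i.e. its 'same as' candidates.
# Assumes the bundle graphs have been loaded into an rdflib Graph; the same
# query would work against whatever triplestore actually holds the bundles.
from rdflib import Graph

g = Graph()
g.parse("bundles.rdf")   # illustrative filename

query = """
PREFIX bundle: <http://purl.org/net/bundle#>
SELECT DISTINCT ?other WHERE {
    ?b a bundle:Bundle ;
       bundle:encapsulates <info:foo> ;
       bundle:encapsulates ?other .
    FILTER (?other != <info:foo>)
}
"""

for row in g.query(query):
    print(row[0])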


Augmenting the British Library's RDF data to allow for disambiguation

The British Library have released what they term the ‘British National Bibliography’ (BNB) under a permissive licence. This constitutes just under 3 million records, and is derived from their ‘most polished set of bibliographic data’, some of which dates back a good number of years.

This effort is to be applauded and the data that is represented by that set is turning out to be reasonably high quality, with some errors due to XSLT problems rather than problems with the source data.

However, the RDF being created is very heavily dominated by blank nodes – each ‘record’ is an untyped blank node with many properties that end in blank nodes, which in turn carry rdf:value statements giving the property’s value.

For example:

  <rdf:Description>
    <dcterms:title>Tooley's dictionary of mapmakers.</dcterms:title>
    <dcterms:contributor>
      <rdf:Description>
        <rdfs:label>Tooley, R. V. (Ronald Vere), 1898-1986</rdfs:label>
      </rdf:Description>
    </dcterms:contributor>

etc...

  </rdf:Description>
  <rdf:Description>
....

This has a number of drawbacks: much of the data is unlinkable (you cannot reference it outside of a given triplestore), the author name information is mixed (it includes date information), and there are RDF errors in the files.

Another issue is that the data is held in 17 very large xml files, which makes it very hard to address individual records as independent documents.

The first task is to augment this data such that:

  • The ‘records’, authors, publishers, and related item bnodes are given unique, globally referenceable URIs
  • The items themselves are given a type (e.g. bibo:Book), based on the literal values present within bnodes linked to the item by the dc:type property.
  • Any MARC -> RDF/XML errors are cleaned up (notably, there are a few occurrences of rdf:description rather than rdf:Description in there)
  • For authors with more authoritative name forms (e.g. ‘Smith, John, b. 1923’ or similar), the dates are broken out into a more semantic construction (a rough parsing sketch follows this list).
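
As a rough idea of that last step (an illustrative sketch only – the real script copes with more name patterns than this):

# Illustrative sketch: pull birth/death dates out of an authority-style
# name heading such as "Tooley, R. V. (Ronald Vere), 1898-1986".
import re

HEADING = re.compile(r"^(?P<name>.+?),\s*(?:b\.\s*(?P<birth>\d{4})"
                     r"|(?P<born>\d{4})-(?P<died>\d{4})?)\s*$")

def split_heading(label):
    match = HEADING.match(label.strip())
    if not match:
        return label.strip(), None, None     # no recognisable dates
    birth = match.group("birth") or match.group("born")
    return match.group("name"), birth, match.group("died")

print(split_heading("Tooley, R. V. (Ronald Vere), 1898-1986"))
# -> ('Tooley, R. V. (Ronald Vere)', '1898', '1986')
print(split_heading("Smith, John, b. 1923"))
# -> ('Smith, John', '1923', None)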

You can find the script that will do this augmentation at https://bitbucket.org/okfn/jiscobib/src/4ddaa37e44a2/BL_scripts/BLDump_convert_and_store.py

This script requires the lxml python module for xpath support as well as the pairtree module to store the records as individual documents. The script should be able to process all 17 files in a few hours, but make sure you have plenty of disc space.

It’s easiest to explain what is happening to the individual author/publisher nodes by use of a diagram:

Augmenting the BL RDF to allow for disambiguation to be overlaid

For example, the original fragment:

    <dcterms:contributor>
      <rdf:Description>
        <rdfs:label>Tooley, R. V. (Ronald Vere), 1898-1986</rdfs:label>
      </rdf:Description>
    </dcterms:contributor>

To:

    <dcterms:contributor>
      <foaf:Agent rdf:about="http://purl.org/okfn/bl#agent-eea1ab4ff2be4baa6f9d623bdda5e852">
        <foaf:name>Tooley, R. V. (Ronald Vere)</foaf:name>
        <bio:event>
          <bio:Birth>
            <bio:date>1898</bio:date>
          </bio:Birth>
        </bio:event>
        <bio:event>
          <bio:Death>
            <bio:date>1986</bio:date>
          </bio:Death>
        </bio:event>
      </foaf:Agent>
    </dcterms:contributor>

The URIs are generated by taking an md5 hash of a number of values, including the full line from the file it appears on, the extracted author’s name, and how many lines into the file it occurs. The idea was to generate URIs that were as unique as possible, but reproducible from the same set of data if the script was re-run.
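
A sketch of that hashing scheme (the exact ingredients the real script feeds into the hash may differ; the namespace is the one used in the example above):

# Illustrative sketch: derive a reproducible agent URI from the source line,
# the extracted name and the line number. Not the script's exact recipe.
from hashlib import md5

def agent_uri(source_line, author_name, line_number):
    digest = md5()
    digest.update(source_line.encode("utf-8"))
    digest.update(author_name.encode("utf-8"))
    digest.update(str(line_number).encode("utf-8"))
    return "http://purl.org/okfn/bl#agent-%s" % digest.hexdigest()

print(agent_uri("<rdfs:label>Tooley, R. V. (Ronald Vere), 1898-1986</rdfs:label>",
                "Tooley, R. V. (Ronald Vere)", 42))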

By giving the books, musical scores, authors, publishers and related works externally addressable URIs, it allows third-party datasets, such as sameas.org, to overlay their version of which things are the same.

You can then choose the overlay dataset which links together the authors and so on based on how much you trust the matching techniques of the service, rather than glomming both original data and asserted data together inextricably.


Using the ORDF Fresnel Tool

The Fresnel Vocabulary for RDF provides a way to write down a set of instructions for transforming RDF statements into HTML for display. For some time, the ORDF library has included two implementations of Fresnel, one in JavaScript and one in Python. Recently added is a command line tool, simply called fresnel, for rendering HTML documents given a lens and an RDF graph. [more]


Some obvious URI patterns for a service?

Whilst the technical issues and backends may vary, there are one or two URI patterns that could, I think, be adopted. It’s not REST, but I hope it is a sensible structure. (This is not to replace voID, but to accompany a voID description and other characterisation methods.)

http://host/#catalog – URI for the catalog dataset

http://host/void
302 – conneg response to a voID description at .ttl, .rdf (xml), etc

http://host/describe/{uri} –
200 – responds with a conneg’d graph with the information a store ‘knows’ about a given URI. The HTML representation would likely be viewed as a ‘record’ page, insofar as this is valid for the item. (uses Content-Location: http://host/describe/{uri}/ttl etc rather than 302, due to load and network traffic cost.)
404 – doesn’t know about this uri

http://host/types
200 – voID-like response based on the canned query ‘SELECT DISTINCT ?x WHERE {?foo rdf:type ?x}’, BUT with the addition of some lowest-common-denominator types. Can be easily cached. Filtering out the least important types is at the discretion of the service – this is not intended to be a complete set, but to publish the set of types that this service cares most about. Best shown by example (note that some predicates need to be minted or swapped for suitable ones, shown by *):

<http://host/#catalog>  a void:Dataset ;
    *containsType* <myOnto:Researcher> ;
    *containsType* <myOnto:foo> ;
    etc...
    void:uriLookupEndpoint <http://host/describe/> ;
    etc...

<myOnto:Researcher> <owl:subClassOf> <foaf:Person> .
<myOnto:foo> <owl:subClassOf> <bibo:Article> .
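
To make the describe pattern concrete, here is a very rough sketch of how http://host/describe/{uri} might behave (assuming Flask and rdflib, with deliberately naive content negotiation – a sketch, not a definitive implementation):

# Sketch: /describe/<uri> returns whatever the store 'knows' about that URI.
from flask import Flask, Response, abort, request
from rdflib import Graph, URIRef

app = Flask(__name__)
store = Graph()
store.parse("catalog.rdf")   # illustrative: load whatever the service holds

@app.route("/describe/<path:uri>")
def describe(uri):
    node = URIRef(uri)
    g = Graph()
    for triple in store.triples((node, None, None)):
        g.add(triple)
    if len(g) == 0:
        abort(404)                      # we don't know about this URI
    accept = request.headers.get("Accept", "")
    if "text/turtle" in accept:
        body, mimetype = g.serialize(format="turtle"), "text/turtle"
    else:
        body, mimetype = g.serialize(format="xml"), "application/rdf+xml"
    resp = Response(body, mimetype=mimetype)
    resp.headers["Content-Location"] = request.path   # 200 + Content-Location rather than a 302
    return resp

if __name__ == "__main__":
    app.run()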

Thoughts?


JISC OpenBibliography: Peter Murray-Rust at RLUK

Peter Murray-Rust will be attending the RLUK conference on the 10th to 12th November 2010, and he has issued a request for community involvement on his blog. Please have a read, and get in touch if you are interested in contributing.


JISC OpenBibliography: progress report

The JISC Open Bibliography project has had some success recently, and is proceeding to build on that success with further development and advocacy. This seems like a good opportunity to recap the project so far, and to consider what is coming next.

Project recap:

  1. Advocacy – for open bibliographic data; why it should be freely available, and what it means for it to be so.
  2. Agreements – secure the publication of datasets; find examples of organisations willing to share their data, and get the data out to the community.
  3. Developments – show what can be done with open bibliographic data, and identify what improvements / changes are required to do even more.

JISC open bibliography overview

view all project posts

Progress so far:

Given that one of the key risks to this project was that there would be no data to work with, it is great that we already have some data – and a commitment to provide more, from CUL. Talks with other groups are proceeding well, and there should be more sources of data available very soon.

The initial data release is limited, but most importantly it provides the impetus to get the ball rolling; and despite this Ben has made progress with analysing the data and has developed an example use case solution to IUCr.

The open bibliography working group, led by the work of Adrian Pohl and Peter Murray-Rust, has also developed open bibliographic principles, with a view to using them as a basis upon which other organisations could provide open access to data.

What next:

The project is now moving into a phase of heavier development; the currently available datasets, and those that should soon become available, will provide the community with something to build on. With this in mind, Peter is taking part in the RLUK conference in November and has issued a request for community engagement.

Although development will increase, advocacy work will continue and will use development examples and open bibliographic principles to negotiate for further commitments to provide open bibliographic data.

Summary:

The project has not yet suffered from the potential risks highlighted in the proposal; it has already achieved partial success towards its advocacy aims and is progressing well towards further successes. Development in the short term will provide the basis required to engage the wider community and to stimulate further development.


Principles for Open Bibliographic Data

For some time now the OKFN Working Group on Open Bibliographic Data has been working on Principles on Open Bibliographic Data. While first attempts were mainly directed towards libraries and other public institutions, we decided to broaden the principles’ scope by amalgamating them with Peter Murray-Rust’s draft publisher guidelines. The results can be seen below. We invite anyone to review these principles, discuss the text and suggest improvements.

Principles on Open Bibliographic Data

Producers of bibliographic data such as libraries, publishers, or social reference management communities have an important role in supporting the advance of humanity’s knowledge. For society to reap the full benefits from bibliographic endeavours, it is imperative that bibliographic data be made openly available for free use and re-use by anyone for any purpose.

Bibliographic Data

In its narrowest sense the term ‘bibliographic data’ refers to data describing bibliographic resources (articles, monographs, electronic texts etc.) to fulfill two goals:

  • Identifying the described resource, i.e. pointing to a unique resource in the set of all bibliographic resources.
  • Addressing the described resource, i.e. indicating how/where to find the described resource.

Traditionally one description served both purposes at once by delivering information about:

  • author(s) (possibly including addresses and other contact details) and editor(s),
  • title,
  • publisher,
  • publication year, month and place,
  • title and identification of enclosing work (e.g. a journal),
  • page information,
  • format of work.

In the web environment the address can be a URL and the identification a URI (URN, DOI etc.). Identifiers thus fall under this narrow concept of ‘bibliographic data’.

Furthermore, there are several other kinds of information about a bibliographic resource which in this document fall under the concept of bibliographic data. This data might be produced by libraries as well as by publishers or by online communities of book lovers and social reference management systems:

  • identifiers (ISBN, LCCN, OCLC number etc.)
  • rights associated with a work
  • sponsorship (e.g. funding)
  • tags
  • exemplar data (number of holdings, call number)
  • metametadata (administrative metadata, e.g. last modified, probably often created automatically)
  • relevant links to Wikipedia, Google Books, Amazon etc.
  • cover images (self-scanned or from Amazon)
  • tables of contents
  • links to digitizations of tables of contents, registers, bibliographies etc.

Libraries also produce authority files such as:

  • name authority files,
  • subject authority files,
  • classifications.

We assert that the information associated with an individual work is in the public domain. It follows that an individual bibliographic entry derived from the work itself is free of restrictive rights; this holds true as well for individual authority records. There might only be rights on aggregations of bibliographic and authority data.

Formally, we recommend adopting and acting on the following principles:

  1. Where bibliographic data or collections of bibliographic data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual bibliographic entries/elements, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license.
    When publishing data make an explicit and robust license statement.

  2. Many widely recognized licenses are not intended for, and are not appropriate for, metadata or collections of metadata. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CC0), GFDL, GPL, BSD, etc. are NOT appropriate for data and their use is STRONGLY discouraged.
    Use a recognized waiver or license that is appropriate for metadata.

  3. The use of licenses which limit commercial re-use or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets and prevent commercial activities that could be used to support data preservation.
    If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.

  4. Furthermore, it is STRONGLY recommended that bibliographic data, especially where publicly funded, be explicitly placed in the public domain via the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver. This is in keeping with the public funding of most library institutions and the general ethos of sharing and re-use within the library community.
    We strongly recommend explicitly placing bibliographic data in the Public Domain via PDDL or CC0.

  5. While we appreciate that certain types of bibliographic metadata do require some extra work in their creation, we strongly assert that making these open has major benefits not only to the community as a whole but also to the creator (author, publisher, library, etc.). Benefits include enhanced discoverability, widening the potential usage of a work, and “saving the time of the reader”. These types include:

    • abstracts (whether generated by author, publisher, library or machine)
    • keywords, subject headings and classification notations (whether generated by author, publisher, library or machine)
    • reviews (either human or machine-generated)

As a fifth principle we strongly urge that creators of bibliographic metadata explicitly either dedicate it to the public domain or use an open licence.


Minutes: 4th virtual meeting of the OKFN Openbiblio Group

Date: October, 5th 2010, 16:00 UK time (BST)

Channel: Meeting was held via Skype

Participants

  • Adrian Pohl
  • Ben O’Steen
  • Jim Pitman
  • Karen Coyle
  • John Mark Ockerbloom
  • Peter Murray-Rust

Minutes of former meetings

Agenda

Open Library Data Flyer

The German version is in print; the text can be found here: https://wiki1.hbz-nrw.de/display/SEM/Textentwurf+Open-Data-Flyer What is the state of affairs with an English version?

To Do: Send the German version to the list and do the layout for the English version. Update the German version in the OKFN wiki.

Openbiblio principles

We have to do further work on the principles: http://okfnpad.org/openbibliography-principles

To Do: Adrian will pull the principles draft and Peter’s guideline draft for publishers together and Peter and Adrian (and anybody else who wants to) will work on putting it into one text.
–> The first version of the “melted” text of both drafts is here: http://okfnpad.org/publisher-guidelines

Openbiblio group’s web presence / Overviews over openbiblio projects and data sources to approach

Note: We didn’t make it to this agenda point. Adrian will do some changes and ask for feedback on the mailing list. Feel free to give suggestions and feedback upfront.

Background:
The group’s web presence is suboptimal. I (Adrian) made an etherpad with suggestions for improvement: http://okfnpad.org/openbiblio-group-on-the-web

Two aspects of an adjustment are:

  • Projects overview: Much information about ongoing projects circulates over the mailing list. It would be useful to collect it in an overview of all past and running projects. The existing project information on the group’s wiki page has been moved to: http://wiki.okfn.org/relevant%20projects. Please add other projects.
  • Potential data sources: We should probably build an overview of potential data sources: data sources of great interest for open data, and whether anybody has already connected them, with what outcome. We should perhaps do this in CKAN to get better-structured information.