Date: January 3rd, 2011, 16:00 GMT
Channels: Meeting was held via Skype and Etherpad
Participants
- Adrian Pohl
- Peter Murray-Rust
- Thad Guidry
- Thomas Krichel (first half)
- Karen Coyle (first half)
- Jim Pitman (second half)
Agenda
OCLC’s FAST release
- Is this open data according to the OKD? – Yes, see VoID description of the dataset
- The licence is attached to the whole dataset and not to individual resources
- Attribution probably is required on data set level too
- So far, the data isn’t fully OKD-compliant because no full dump of the data exists
- Richard Cyganiak already created an entry on the Data hub for FAST data.
LCSH in Freebase
- Thad reports that all LCSH will be imported into Freebase.
- The facets will be separated out as well (as FAST does).
- Timeline: six months to a year
Re-Using DBLP data
- DBLP entry at the Data Hub: http://thedatahub.org/dataset/dblp
- Jim had some ideas about re-using the DBLP data in BibSoup, e.g. for collecting publications on deduplication.
- The isitopendata enquiry has now been resolved by Thomas
- Mark could run DBLP data in a BibServer instance, but he cannot maintain it alongside the other BibServer instances he already maintains
- Thad: DBLP data should be uploaded to ScraperWiki (1 GB maximum)
- Thad: maybe create a “base” at Freebase for DBLP or BibSoup.
- Freebase is fully versioned: edits can be reverted; queries can be run against older versions
- Thad: Freebase is more of a backend data store and doesn’t have a proper GUI. BibServer might play the GUI role.
- ACTION: Mark helps Thad to set up a BibServer instance and Thad pushes the DBLP data into it
Writing and maintaining parsers etc. for BibServer
- Jim can write preliminary parsers but he can’t maintain the code, write bugfixes etc.
- Parsers are written in Python
- Jim would like someone else to take over the maintenance.
- Thad proposes pushing the code to ScraperWiki
- ACTION: Jim and Thad will communicate about how to pack up the code, publish and maintain it.
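To illustrate the kind of parser discussed above, here is a minimal sketch in Python, not the code Jim wrote. It assumes input shaped like a DBLP XML record and produces a BibJSON-like dictionary. The sample record, the function names and the exact output field layout are assumptions made for illustration only.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record, shaped like a DBLP XML entry (not real data).
SAMPLE = """<dblp>
  <article key="journals/example/Doe11">
    <author>Jane Doe</author>
    <author>John Roe</author>
    <title>An Example Title.</title>
    <year>2011</year>
    <journal>Example Journal</journal>
  </article>
</dblp>"""


def record_to_bibjson(elem):
    """Convert one XML record element into a BibJSON-like dict.

    The output layout (type, id, author list of {"name": ...}) is an
    assumed, simplified shape, not a confirmed BibJSON schema.
    """
    record = {
        "type": elem.tag,
        "id": elem.get("key"),
        "author": [{"name": a.text} for a in elem.findall("author")],
    }
    # Copy simple single-valued fields only when they are present.
    for field in ("title", "year", "journal"):
        value = elem.findtext(field)
        if value is not None:
            record[field] = value
    return record


def parse_records(xml_text):
    """Parse an XML string and return one dict per top-level record."""
    root = ET.fromstring(xml_text)
    return [record_to_bibjson(child) for child in root]


if __name__ == "__main__":
    print(json.dumps(parse_records(SAMPLE), indent=2))
```

Keeping each parser as a small, self-contained function like this would make it easier to hand over to another maintainer or to publish on a shared platform.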
Jim’s recent efforts
- Problems with BibSoup: One can’t customize the GUI easily
- Because of this, Jim started his own effort to build a BibJSON web interface.
- http://bibserver.berkeley.edu/cgi-bin/bibsoup/berkeley_statistics
- fully editable
- BibSoup example machine-generated from this Google Scholar entry
- Problems: Google Scholar has no API. Many calls are necessary to get the data. It isn’t clear when Google Scholar will cut off access.
Openbiblio Sprint
- 17th-19th of January: Openbiblio sprint session in Cambridge? (suggested by Mark)
- Mark is trying to get Etienne (new programmer from Amsterdam), Rufus, Ed Chamberlain (for some of the time), Naomi and Primavera together for a sprint session.
Action Collection
- Mark, Thad: set up a BibServer instance and push DBLP data into it
- Jim, Thad: communicate about how to pack up the code for parsers etc., how to publish and maintain it.