DBLP releases its 1.8 million bibliographic records as open data

The following guest post is by Marcel R. Ackermann, who works at the Schloss Dagstuhl – Leibniz Center for Informatics on expanding the DBLP computer science bibliography.

Computer Science literature

Right from the early days of DBLP, the decision was made to make its whole data set publicly available. Yet only now, at the age of 18, has DBLP adopted an open-data license.

The DBLP computer science bibliography provides access to the metadata of over 1.8 million publications, written by over 1 million authors and published in several thousand journals or conference proceedings series. It is a helpful tool in the daily work of researchers and computer science enthusiasts from around the world. Although DBLP started with a focus on database systems and logic programming (hence the acronym), it has grown to cover all disciplines of computer science.

The success of DBLP wasn’t planned. In 1993, Michael Ley from the University of Trier, Germany, started a simple webserver to play around with this so-called “world wide web” that everybody was so excited about in those days. He chose to set up some webpages listing the tables of contents of recent conference proceedings and journal issues, some other pages listing the articles of individual authors, and provided hyperlinks back and forth between these pages. People from the computer science community found this quite useful, so he just kept adding papers. Funds were raised to hire helpers, some new technologies were implemented, and the data set grew over the years.

The approach of DBLP has always been a pragmatic one. So it wasn’t until the recent evolution of DBLP into a joint project of the University of Trier and Schloss Dagstuhl – Leibniz Center for Informatics that the idea of finding a licensing model came to our minds. In this process, we found the source material and the commentaries provided by the Open Knowledge Foundation quite helpful. We quickly concluded that either the PDDL or the ODC-by license would be the right choice for us. In the end, we chose ODC-by since, as researchers ourselves, we believe that external sources should be referenced. Although, from a pragmatic point of view, nothing has changed at all for DBLP (since permissions to use, copy, redistribute and modify had generally been granted before), we hope that this will help to clarify the legal status of the DBLP data set.

For additional information about access to and technical details of the dataset see the corresponding entry on the Data Hub.

Credits: Photo licensed CC-BY-SA by Flickr user Unhindered by Talent.

