Medline dataset

Announcing the CC0 Medline dataset

We are happy to report that we now have a full, clean public domain (CC0) version of the Medline dataset available for use by the community.

What is the Medline dataset?

The Medline dataset is a subset of bibliographic metadata covering approximately 98% of all PubMed publications. It comes as a package of approximately 653 XML files, with records listed chronologically by the date each record was created. In total there are approximately 19 million publication records.

Medline is an actively maintained dataset, and updates are appended to the current dataset in chronological order.
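
For readers who want a quick sanity check on a download, here is a minimal sketch that streams through a directory of XML files and counts the records. It assumes each record is a `MedlineCitation` element, as in the standard MEDLINE citation XML, and that the files end in `.xml`; both are assumptions about this package, and the function names are ours, so adjust them to what you find.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def count_records(xml_path, record_tag="MedlineCitation"):
    """Stream one XML file and count record elements without loading it whole."""
    count = 0
    for _event, elem in ET.iterparse(str(xml_path), events=("end",)):
        if elem.tag == record_tag:  # record_tag is an assumed element name
            count += 1
            elem.clear()  # drop the record's children to keep memory use low
    return count


def count_all(directory, pattern="*.xml"):
    """Sum the record counts of every matching file in a directory."""
    return sum(count_records(path) for path in sorted(Path(directory).glob(pattern)))
```

Calling `count_all("path/to/medline")` on the unpacked files should give a figure in the region of the 19 million records mentioned above, if the assumptions hold.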

Read our explanation of the different PubMed datasets for further information.

Where to get it

The raw dataset can be downloaded from CKAN.

What is in a record

Most records contain useful non-copyrightable bibliographic metadata such as author, title, journal and PubMed record ID. Many also have DOIs. We have stripped out any potentially copyrightable material such as abstracts.
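
As a rough illustration of pulling those fields out of a single record, here is a minimal sketch. The element names (`PMID`, `Article/ArticleTitle`, `Article/Journal/Title`, `AuthorList/Author`, and `ELocationID` with `EIdType="doi"`) follow the standard MEDLINE citation XML; whether the CC0 files keep exactly this layout, and where they store the DOI, is an assumption. The sample record is made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_record(citation):
    """Pull basic bibliographic fields out of one record element."""
    authors = []
    for author in citation.findall("./Article/AuthorList/Author"):
        last = author.findtext("LastName")
        fore = author.findtext("ForeName") or author.findtext("Initials")
        name = " ".join(part for part in (fore, last) if part)
        if name:
            authors.append(name)

    doi = None  # many records have no DOI at all
    for eloc in citation.findall("./Article/ELocationID"):
        if eloc.get("EIdType") == "doi":  # assumed DOI location
            doi = (eloc.text or "").strip() or None
            break

    return {
        "pmid": citation.findtext("PMID"),
        "title": citation.findtext("./Article/ArticleTitle"),
        "journal": citation.findtext("./Article/Journal/Title"),
        "authors": authors,
        "doi": doi,
    }


# A made-up record, for illustration only.
SAMPLE = """<MedlineCitation>
  <PMID>12345</PMID>
  <Article>
    <Journal><Title>Journal of Examples</Title></Journal>
    <ArticleTitle>An example title.</ArticleTitle>
    <ELocationID EIdType="doi">10.1000/example</ELocationID>
    <AuthorList>
      <Author><LastName>Smith</LastName><ForeName>Jane</ForeName></Author>
      <Author><LastName>Jones</LastName><Initials>A</Initials></Author>
    </AuthorList>
  </Article>
</MedlineCitation>"""

record = parse_record(ET.fromstring(SAMPLE))
```

Returning a plain dictionary keeps the sketch easy to feed into whatever you use next, such as a CSV writer or a database loader.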

Read our technical description of a record for further information.

Sample usage

We have made an online visualisation of a sample of the Medline dataset. However, the visualisation relies on WebGL, which is not yet supported by all browsers. It should work in Chrome and probably Firefox 4.

This is just one example, but it shows what great things we can build, and learn from, when we have open access to the necessary data.


Responses to Medline dataset

  1. Why did you give up on the full PubMed record set, and just go with the MEDLINE subset?
