Announcing Glottolog/Langdoc, a knowledge base of 175k references for (mostly) underdescribed languages

We are happy to announce Glottolog/Langdoc, a comprehensive knowledge base of references treating (mostly) underdescribed languages from the whole world.

Glottolog/Langdoc is built upon a collation of 20 source bibliographies covering the whole world, from Alaska to Australia. The original bibliographies were parsed and enriched with machine learning techniques. This makes it possible to formulate queries such as

and combinations thereof such as

Furthermore, an areally and genealogically balanced sample can be drawn.
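
To illustrate what a balanced sample means in practice, here is a minimal Python sketch of one common approach, stratified sampling: references are grouped by macro-area and language family, and the same number is drawn from each group. This is not necessarily how Glottolog/Langdoc draws its samples. The function name balanced_sample, the record keys "area" and "family", and the placeholder records are all assumptions made for the example.

```python
import random
from collections import defaultdict


def balanced_sample(references, per_stratum=1, seed=0):
    """Draw up to `per_stratum` references from every (area, family) pair."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group the references into strata keyed by (area, family).
    strata = defaultdict(list)
    for ref in references:
        strata[(ref["area"], ref["family"])].append(ref)

    # Take the same number from each stratum, or all of it if it is smaller.
    sample = []
    for key in sorted(strata):
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


# Placeholder records; the keys and values are illustrative only.
refs = [
    {"id": "r1", "area": "Area A", "family": "Family X"},
    {"id": "r2", "area": "Area A", "family": "Family X"},
    {"id": "r3", "area": "Area A", "family": "Family Y"},
    {"id": "r4", "area": "Area B", "family": "Family Z"},
]
picked = balanced_sample(refs)
print([r["id"] for r in picked])
```

Because each (area, family) pair contributes equally, a heavily documented family cannot dominate the sample simply by having more references.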

All references have their own URIs. All resources are available as xhtml and rdf, and can be downloaded as bib, html, txt, or via Zotero. Dumps of references are available as a very large *.bib file and as a dump in rdf+xml. Glottolog/Langdoc content is made available under CC-BY-NC. Intercultural issues upstream unfortunately prevent us from releasing the content under a more permissive license.

Glottolog/Langdoc uses DCMI, BIBO, FRBR, and ISBD ontologies to provide an interoperable resource. Glottolog is part of the Linguistic Linked Open Data cloud. We are working on a SPARQL endpoint, which will probably be made available in June this year.
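
As a rough idea of what describing a reference with these vocabularies can look like, the following Python sketch writes three N-Triples statements for a single book using DCMI Terms and BIBO. The namespace URIs are the published ones for those vocabularies, but the choice of properties, the helper names literal and describe_book, the example.org URI, and the placeholder title and year are assumptions for illustration and do not reproduce Glottolog/Langdoc's actual data model.

```python
DCTERMS = "http://purl.org/dc/terms/"
BIBO = "http://purl.org/ontology/bibo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Quote a string as a plain N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def describe_book(uri, title, year):
    """Return N-Triples lines stating the type, title, and issue year of a book."""
    return [
        f"<{uri}> <{RDF_TYPE}> <{BIBO}Book> .",
        f"<{uri}> <{DCTERMS}title> {literal(title)} .",
        f"<{uri}> <{DCTERMS}issued> {literal(year)} .",
    ]


# Hypothetical reference URI and placeholder metadata, not a real record.
triples = describe_book("http://example.org/ref/1", "A placeholder grammar", "2000")
print("\n".join(triples))
```

Reusing shared vocabularies in this way is what lets other Linked Data tools interpret the title and date of a reference without knowing anything specific about the source.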

We are happy to contribute our bibliographical resource to the Linked Data cloud and would welcome feedback at glottolog@eva.mpg.de.
