Diachronic Usage Relatedness (DURel)

Test Set and Annotation Data for Lexical Semantic Change in DTA Corpus

Type

Dataset

Author
Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann
Description

This data collection contains diachronic semantic relatedness judgments for German word usage pairs. A description of the data format, code to process the data, and further datasets are available on the WUG site.

We provide additional data under misc/:

  • testset: a semantic change test set with 22 German lexemes divided into two classes: lexemes for which the authors found

      1. innovative or
      2. reductive meaning change

      occurring in the Deutsches Textarchiv (DTA) in the 19th century. Note that for some lexemes the change is already observable slightly before 1800, and that some lexemes occur more than once in the test set (see paper). The columns 'earlier' and 'later' contain the mean of all judgments for the respective word. The columns 'delta_later' and 'compare' contain the predictions of the annotation-based measures of semantic change developed in the paper.
  • tables: the full annotation table as the annotators received it, and a results table with rows in the same order. The columns 'date1' and 'date2' contain the dates of the first and second use in each row. The column 'mean' contains the mean of all judgments for the use pair in that row, excluding 0-judgments.
  • plots: data visualization plots.
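
The column semantics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, treating 0 as a "cannot decide" judgment is an assumption, and reading 'delta_later' as the mean of the later group minus the mean of the earlier group is our understanding of the paper and should be checked against it.

```python
from statistics import mean


def mean_without_zeros(judgments):
    """Mean of one use pair's judgments, skipping 0-judgments.

    Assumption: 0 marks a "cannot decide" judgment and carries no
    relatedness information. Returns None if no rated judgment remains.
    """
    rated = [j for j in judgments if j != 0]
    return mean(rated) if rated else None


def delta_later(earlier_pair_means, later_pair_means):
    """Difference between the later-group mean and the earlier-group mean.

    Assumption: this mirrors the 'delta_later' column; the exact
    definition is given in the paper, not in this description.
    """
    return mean(later_pair_means) - mean(earlier_pair_means)


# Hypothetical judgments from four annotators for one use pair.
row_mean = mean_without_zeros([4, 3, 0, 2])

# Hypothetical per-pair means for one lexeme in the two groups.
change = delta_later([3.0, 4.0], [2.0, 1.0])
```

Returning None for a row with only 0-judgments keeps such rows distinguishable from rows with a genuinely low mean; how the published tables treat that case is not stated here.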

Please find more information on the provided data in the paper referenced below.

Reference

Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann. 2018. Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). New Orleans, Louisiana USA.

 

Download

The resources are freely available for education, research and other non-commercial purposes. More information can be requested via email to the authors.

Related Resources
  • WOCC: corpora from which the uses for annotation were sampled.
  • SURel: synchronic data set annotated in parallel.
  • WUGs: Word Usage Graphs.
  • DURel Tool: semantic annotation tool for sentence pairs of a word.
  • Metaphoric Change: similarly annotated diachronic data set for metaphoric change.
Dominik Schlechtweg

Former employee

Apl. Prof. Dr.

Sabine Schulte im Walde

Akademische Rätin (Associate Professor)
