Diachronic Usage Relatedness (DURel) - Test Set and Annotation Data

Type: ExperimentData
Title: Diachronic Usage Relatedness (DURel) - Test Set and Annotation Data
Author: Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann

Description

This data collection supplementing the paper referenced below contains:

  • a semantic change test set of 22 German lexemes divided into two classes: lexemes for which the authors found (i) innovative or (ii) reductive meaning change occurring in the Deutsches Textarchiv (DTA) in the 19th century. (Note that for some lexemes the change is already observable slightly before 1800, and that some lexemes occur more than once in the test set; see the paper.) It comes as a tab-separated CSV file in which each line has the form

lemma POS type description earlier later delta_later compare frequency_1750-1800/1850-1900 source

The columns 'earlier' and 'later' contain the mean of all judgments for the respective word. The columns 'delta_later' and 'compare' contain the predictions of the annotation-based measures of semantic change developed in the paper;

  • the full annotation table as the annotators received it, and a results table with rows in the same order. The results table comes as a tab-separated CSV file in which each line has the form

lemma date1 date2 group annotator1 annotator2 annotator3 annotator4 annotator5 mean comments1 comments2 comments3 comments4 comments5

The columns 'date1' and 'date2' contain the dates of the first and second use in the row, respectively. The column 'mean' contains the mean of all judgments for the use pair in this row, excluding 0-judgments;

  • the annotation guidelines in English and German;
  • data visualization plots.
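
As an illustration of the results table layout described above, the following minimal Python sketch reads tab-separated lines with the listed columns and recomputes the per-row mean while leaving out 0-judgments. It is not part of the data collection. The function names, the assumption that the file carries no header row, the assumption that judgments are stored as plain numbers, and the sample row (including its 'group' value) are all hypothetical.

```python
import csv
import io

# Column names copied from the description above. Whether the actual file
# has a header row is not stated, so the names are supplied explicitly here.
RESULT_COLUMNS = [
    "lemma", "date1", "date2", "group",
    "annotator1", "annotator2", "annotator3", "annotator4", "annotator5",
    "mean",
    "comments1", "comments2", "comments3", "comments4", "comments5",
]
ANNOTATOR_COLUMNS = [f"annotator{i}" for i in range(1, 6)]


def read_rows(handle, columns):
    """Yield one dict per non-empty tab-separated line, keyed by column name."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue
        yield dict(zip(columns, fields))


def mean_without_zeros(row):
    """Mean of the annotator judgments in a row, skipping blanks and 0-judgments.

    Returns None if no non-zero judgment is present.
    """
    judgments = []
    for name in ANNOTATOR_COLUMNS:
        value = row.get(name, "").strip()
        if value:
            number = float(value)  # assumes numeric judgment cells
            if number != 0:
                judgments.append(number)
    return sum(judgments) / len(judgments) if judgments else None


# Made-up example row for illustration only (not taken from the data set).
sample = "Beispiel\t1780\t1870\tearlier\t4\t3\t0\t4\t2\t3.25\t\t\t\t\t\n"
for row in read_rows(io.StringIO(sample), RESULT_COLUMNS):
    print(row["lemma"], mean_without_zeros(row))  # prints: Beispiel 3.25
```

To use it on the real file, the `io.StringIO(sample)` argument would be replaced by an open file handle, for example `open(path, encoding="utf-8", newline="")`, where `path` points to the downloaded results table.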

Find more information in the paper referenced below.


Reference

Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann. 2018. Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). New Orleans, Louisiana USA 2018.


Download

The resources are freely available for education, research and other non-commercial purposes. For download, click here. More information can be requested via email to the authors.