Synchronic Usage Relatedness (SURel) - Test Set and Annotation Data

Type

ExperimentData

Author

Anna Hätty, Dominik Schlechtweg, Sabine Schulte im Walde

Description

This data collection includes:

  • a meaning shift test set of 22 German lexemes exhibiting different degrees of meaning shift from general language to the domain of cooking. It comes as a tab-separated CSV file in which each line has the form

lemma POS translations mean-relatedness-score frequency-GEN frequency-SPEC.

'mean-relatedness-score' is the annotation-based measure of semantic shift described in the paper. 'frequency-GEN' and 'frequency-SPEC' give the frequencies of the target word in the general-language corpus (GEN) and the domain-specific cooking corpus (SPEC). 'translations' provides English translations across senses, illustrating possible meaning shifts. Note that further senses might exist;

  • the full annotation tables from the annotators, provided as tab-separated CSV files in which each line has the form

sentence-1 rating comment sentence-2.

  • the annotation guidelines in English and German;
  • data visualization plots.
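
The two line formats described above can be read with a few lines of Python. The following is only a minimal sketch under several assumptions that the description does not confirm: the files are UTF-8 encoded, they have no header row unless `has_header=True` is passed, scores use a decimal point, and fields are not quoted. The names `ShiftEntry`, `AnnotationRow`, `read_test_set` and `read_annotations` are hypothetical helpers introduced here for illustration and are not part of the resource.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ShiftEntry:
    """One line of the test set (hypothetical container)."""
    lemma: str
    pos: str
    translations: str
    mean_relatedness: float
    freq_gen: int
    freq_spec: int


@dataclass
class AnnotationRow:
    """One line of an annotation table (hypothetical container)."""
    sentence_1: str
    rating: str  # kept as text; the rating scale is defined in the guidelines
    comment: str
    sentence_2: str


def _rows(lines: Iterable[str], width: int, has_header: bool):
    # Tab-separated, no quoting assumed; blank lines are skipped.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)
    for row in reader:
        if not row:
            continue
        if len(row) != width:
            raise ValueError(f"expected {width} fields, got {len(row)}: {row!r}")
        yield row


def read_test_set(lines: Iterable[str], has_header: bool = False) -> List[ShiftEntry]:
    # Column order: lemma, POS, translations, mean score, frequency GEN, frequency SPEC.
    return [
        ShiftEntry(lemma, pos, trans, float(score), int(gen), int(spec))
        for lemma, pos, trans, score, gen, spec in _rows(lines, 6, has_header)
    ]


def read_annotations(lines: Iterable[str], has_header: bool = False) -> List[AnnotationRow]:
    # Column order: sentence 1, rating, comment, sentence 2.
    return [AnnotationRow(*row) for row in _rows(lines, 4, has_header)]


if __name__ == "__main__":
    # Invented placeholder line, NOT taken from the actual data.
    sample = ["lemmaA\tNN\tsense one, sense two\t2.5\t1000\t50"]
    for entry in read_test_set(sample):
        print(entry.lemma, entry.mean_relatedness, entry.freq_gen, entry.freq_spec)
```

Both functions accept any iterable of lines, so a file opened with `open(path, encoding="utf-8", newline="")` can be passed directly, where `path` stands for wherever the downloaded file is stored. Raising an error on lines with an unexpected number of fields, rather than skipping them, is a deliberate choice so that a mismatch with the assumed format shows up early.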

Find more information in the papers referenced below.

Related Resources:

  • WOCC: corpora from which the uses for annotation were sampled.
  • DURel: a diachronic data set annotated in a parallel fashion.

Reference

Anna Hätty, Dominik Schlechtweg, Sabine Schulte im Walde. 2019. SURel: A Gold Standard for Incorporating Meaning Shifts into Term Extraction. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM). Minneapolis, Minnesota, USA.

Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann. 2018. Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). New Orleans, Louisiana, USA.

Download

The resources are freely available for education, research and other non-commercial purposes. More information can be requested via email to the authors.

Sabine Schulte im Walde
Apl. Prof. Dr.

Akademische Rätin (Associate Professor)
