Synchronic Usage Relatedness (SURel) - Test Set and Annotation Data

Test Set and Annotation Data for Semantic Divergences between SdeWaC and the COOK Corpus


Type

ExperimentData

Author

Anna Hätty, Dominik Schlechtweg, Sabine Schulte im Walde

Description

This data collection includes:

  • a meaning shift test set with 22 German lexemes that exhibit different degrees of meaning shift from general language to the domain of cooking. It comes as a tab-separated CSV file in which each line has the form

lemma POS translations mean-relatedness-score frequency-GEN frequency-SPEC

The mean relatedness score is the annotation-based measure of semantic shift described in the SURel paper (Hätty et al., 2019). frequency-GEN and frequency-SPEC give the frequencies of the target word in the general-language corpus (GEN) and in the domain-specific cooking corpus (SPEC). The translations column lists English translations across senses to illustrate possible meaning shifts; it is not exhaustive, and further senses may exist;

  • the full annotation tables of the individual annotators, provided as tab-separated CSV files in which each line has the form

sentence-1 rating comment sentence-2;

  • the annotation guidelines in English and German;
  • data visualization plots.
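
The two tab-separated file formats described above can be read with a few lines of Python. This is a minimal sketch, not a loader distributed with the data: it assumes the documented column order, UTF-8 encoding, unquoted fields, and that a test-set row whose score column is not numeric is a header. All names (SurelEntry, AnnotationRow, parse_testset, parse_annotations, load_testset) are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SurelEntry:
    """One test-set row in the documented column order (hypothetical class)."""
    lemma: str
    pos: str
    translations: str
    mean_relatedness: float
    freq_gen: int
    freq_spec: int


@dataclass
class AnnotationRow:
    """One annotation-table row: a pair of uses with rating and comment (hypothetical class)."""
    sentence_1: str
    rating: str  # kept as text: the rating scale is not assumed here
    comment: str
    sentence_2: str


def _is_number(text: str) -> bool:
    """Return True if the field parses as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def _tsv_rows(lines: Iterable[str]) -> Iterable[List[str]]:
    """Split tab-separated lines; QUOTE_NONE keeps quotation marks inside sentences literal."""
    return csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)


def parse_testset(lines: Iterable[str]) -> List[SurelEntry]:
    """Parse test-set lines, skipping blank rows and any row whose score
    column is not numeric (assumed to be a header)."""
    entries = []
    for row in _tsv_rows(lines):
        if len(row) < 6 or not _is_number(row[3]):
            continue
        lemma, pos, translations, score, freq_gen, freq_spec = row[:6]
        entries.append(SurelEntry(lemma, pos, translations, float(score),
                                  int(float(freq_gen)), int(float(freq_spec))))
    return entries


def parse_annotations(lines: Iterable[str], skip_header: bool = False) -> List[AnnotationRow]:
    """Parse one annotator's table; set skip_header=True if the file starts with a header row."""
    rows = []
    for index, row in enumerate(_tsv_rows(lines)):
        if (skip_header and index == 0) or len(row) < 4:
            continue
        sentence_1, rating, comment, sentence_2 = row[:4]
        rows.append(AnnotationRow(sentence_1, rating, comment, sentence_2))
    return rows


def load_testset(path: str) -> List[SurelEntry]:
    """Read the test set from disk (UTF-8 encoding is an assumption)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_testset(handle)
```

With path pointing at the downloaded test-set file, sorted(load_testset(path), key=lambda e: e.mean_relatedness) lists the lexemes by mean relatedness score. Ratings are kept as strings so that the sketch makes no assumption about the rating scale, which the annotation guidelines define.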

Find more information in the papers referenced below.

References

Anna Hätty, Dominik Schlechtweg, Sabine Schulte im Walde. 2019. SURel: A Gold Standard for Incorporating Meaning Shifts into Term Extraction. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM). Minneapolis, Minnesota, USA.

Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann. 2018. Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). New Orleans, Louisiana, USA.

Download

The resources are freely available for education, research and other non-commercial purposes. More information can be requested via email to the authors.

Related Resources

  • WOCC: corpora from which the uses for annotation were sampled.
  • DURel: analogously annotated diachronic data set.
  • Metaphoric Change: similarly annotated diachronic data set for metaphoric change.

Contact

Sabine Schulte im Walde, Apl. Prof. Dr., Akademische Rätin (Associate Professor)
