Synchronic Usage Relatedness (SURel) - Test Set and Annotation Data
Anna Hätty, Dominik Schlechtweg, Sabine Schulte im Walde
This data collection includes:
- a meaning shift test set of 22 German lexemes exhibiting different degrees of meaning shift from general language to the domain of cooking. It comes as a csv file with tab-separated columns, where each line has the form
lemma POS translations mean-relatedness-score frequency-GEN frequency-SPEC.
The 'mean-relatedness-score' denotes the annotation-based measure of meaning shift described in the SURel paper referenced below. 'frequency-GEN' and 'frequency-SPEC' give the frequencies of the target words in the general-language corpus (GEN) and in the domain-specific cooking corpus (SPEC), respectively. 'translations' provides English translations across senses, illustrating possible meaning shifts; note that further senses might exist;
- the full annotation tables of all annotators, each a csv file with tab-separated columns, where each line has the form
sentence-1 rating comment sentence-2;
- the annotation guidelines in English and German;
- plots visualizing the data.
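The two tab-separated formats described above can be read with a few lines of Python. This is a minimal sketch, not code distributed with the data: it assumes UTF-8 text, columns in exactly the order listed, integer frequencies, and that a header row (if any) can be recognized by a non-numeric score column. The names `read_test_set`, `read_annotations` and `mean_rating` are hypothetical. Treating a rating of `0` as "cannot decide" and excluding it from the average is an assumption borrowed from the DURel scale; the simple average shown here is for illustration only and is not necessarily how the mean relatedness score in the test set was computed.

```python
import csv
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List


@dataclass
class TestSetEntry:
    """One test-set line: lemma POS translations mean-relatedness-score frequency-GEN frequency-SPEC."""
    lemma: str
    pos: str
    translations: str
    mean_relatedness: float
    freq_gen: int
    freq_spec: int


@dataclass
class Judgment:
    """One annotation line: sentence-1 rating comment sentence-2."""
    sentence_1: str
    rating: str
    comment: str
    sentence_2: str


def _tsv_rows(lines: Iterable[str]):
    # QUOTE_NONE: quotation marks inside example sentences are kept as plain text.
    return csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)


def read_test_set(lines: Iterable[str]) -> List[TestSetEntry]:
    """Parse the test set; lines with fewer than 6 columns or a non-numeric score are skipped."""
    entries = []
    for row in _tsv_rows(lines):
        if len(row) < 6:
            continue  # blank or malformed line
        try:
            score = float(row[3])
        except ValueError:
            continue  # assumed to be a header line
        # Assumption: frequencies are stored as integers.
        entries.append(TestSetEntry(row[0], row[1], row[2], score, int(row[4]), int(row[5])))
    return entries


def read_annotations(lines: Iterable[str]) -> List[Judgment]:
    """Parse one annotator's table; lines with fewer than 4 columns are skipped."""
    return [Judgment(*row[:4]) for row in _tsv_rows(lines) if len(row) >= 4]


def mean_rating(judgments: Iterable[Judgment], skip: Iterable[str] = ("0",)) -> float:
    """Average of numeric ratings, ignoring ratings listed in `skip` (assumed 'cannot decide')."""
    skipped = set(skip)
    values = []
    for judgment in judgments:
        rating = judgment.rating.strip()
        if rating in skipped:
            continue
        try:
            values.append(float(rating))
        except ValueError:
            continue  # non-numeric cell, e.g. a header or an empty rating
    return mean(values) if values else float("nan")
```

Any iterable of lines works as input, so an open file can be passed directly, e.g. `read_test_set(open(path, encoding="utf-8"))`, where `path` stands for wherever the test-set file is stored.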
Find more information in the papers referenced below.
Anna Hätty, Dominik Schlechtweg, Sabine Schulte im Walde. 2019. SURel: A Gold Standard for Incorporating Meaning Shifts into Term Extraction. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM). Minneapolis, Minnesota, USA, 2019.
Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann. 2018. Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). New Orleans, Louisiana, USA, 2018.
The resources are freely available for education, research and other non-commercial purposes. More information can be requested via email to the authors.