Automatically detected non-recorded word senses in English and Swedish (NRS EN/SV)

This data collection contains English and Swedish word sense annotations from which non-recorded senses can be derived.

Type

Data

Author

Jonathan Lautenschlager, Emma Sköldberg, Simon Hengchen, Dominik Schlechtweg

Description

This data collection contains English and Swedish use-sense instances annotated with binary labels. For each instance, annotators judged whether the given sense (gloss) describes the meaning of the target word in the given use well. We provide the following files:

  • data/: uses, senses, instances and judgments for randomly sampled uses (phase 1) and for uses predicted to be missing from the respective dictionary (phase 2). Instances for phase 2 are not included, but they can be reconstructed by pairing each use with each sense of that use's lemma. We further provide assigned and unassigned usages aggregated over the three annotators, as described in the paper referenced below. The tutorial used for training annotators is available in the annotation_standardization repository.
  • guidelines/: the guidelines used for annotator training.
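
The reconstruction of phase 2 instances mentioned above (pairing each use with each sense of its lemma) could be sketched in Python roughly as follows. This is a minimal illustration, not the authors' code: the field names `use_id`, `sense_id` and `lemma`, the function name `reconstruct_instances` and the toy records are all assumptions, and the actual file layout in data/ may differ.

```python
from collections import defaultdict


def reconstruct_instances(uses, senses):
    """Pair every use with every sense listed for the same lemma.

    `uses` and `senses` are lists of dicts; the keys used here
    ("use_id", "sense_id", "lemma") are hypothetical and may not
    match the column names in the released files.
    """
    # Group sense identifiers by lemma for quick lookup.
    senses_by_lemma = defaultdict(list)
    for sense in senses:
        senses_by_lemma[sense["lemma"]].append(sense["sense_id"])

    # One instance per (use, sense) pair sharing a lemma.
    instances = []
    for use in uses:
        for sense_id in senses_by_lemma.get(use["lemma"], []):
            instances.append(
                {
                    "use_id": use["use_id"],
                    "sense_id": sense_id,
                    "lemma": use["lemma"],
                }
            )
    return instances


# Toy records for illustration only (not taken from the data set).
uses = [
    {"use_id": "u1", "lemma": "bank"},
    {"use_id": "u2", "lemma": "bank"},
    {"use_id": "u3", "lemma": "mouse"},
]
senses = [
    {"sense_id": "bank-1", "lemma": "bank"},
    {"sense_id": "bank-2", "lemma": "bank"},
    {"sense_id": "mouse-1", "lemma": "mouse"},
]

instances = reconstruct_instances(uses, senses)
print(len(instances))  # 2 uses x 2 senses + 1 use x 1 sense = 5
```

A use whose lemma has no senses in the input yields no instances, so it may be worth checking for such uses before relying on the result.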

More information on the data, including its limitations, can be found in the paper referenced below.

Version: 1.0.0, 27.02.2024.

Reference

Jonathan Lautenschlager, Emma Sköldberg, Simon Hengchen, Dominik Schlechtweg. 2024. Detection of non-recorded word senses in English and Swedish.

Download

The resource is available for download.

