Fine-grained Compound Termhood Annotation Dataset

Type: ExperimentData
Title: Fine-grained Compound Termhood Annotation Dataset
Authors: Anna Hätty, Sabine Schulte im Walde


We treat term difficulty as part of a tiered model of a term's strength of association with a domain. Difficulty should align naturally with the idea that term specificity to the domain increases gradually: the more difficult or specialised a term is, the more it differs from general language and the more strongly it is associated with the domain. Terms that are both general and easy to understand can be hard to distinguish from general-language words. Thus, the more expert knowledge a reader needs to understand a term, the more strongly that term should be associated with the domain. This dataset distinguishes four such tiers. Five human judges used them to annotate 396 German compounds from the cooking domain.


Anna Hätty, Sabine Schulte im Walde (2018)
Fine-grained Termhood Prediction for German Compound Terms using Neural Networks
In: Proceedings of the COLING Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG). Santa Fe, NM, USA.


This work is licensed under a Creative Commons Attribution 4.0 International License. Please contact Anna Hätty or Sabine Schulte im Walde to obtain the data.