Fine-grained Compound Termhood Annotation Dataset
Anna Hätty, Sabine Schulte im Walde
We consider term difficulty as part of a tier model for a term's strength of association with a domain. It should naturally align with the idea of a gradual increase in a term's specificity to the domain: the more difficult or specialised a term is, the more distinct it is from general language and the more strongly it is associated with the domain. If terms are both general and understandable, it is sometimes hard to distinguish them from general-language words. Thus, the more expert knowledge is needed to understand a term, the more strongly it should be associated with the domain. In this dataset, we distinguish four tiers, according to which five human judges annotated 396 German compounds from the cooking domain.
Anna Hätty, Sabine Schulte im Walde (2018)
Fine-grained Termhood Prediction for German Compound Terms using Neural Networks
In: Proceedings of the COLING Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG). Santa Fe, NM, USA.
Please contact the SemRel group to obtain the data.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA).