Fine-grained Compound Termhood Annotation Dataset

Fine-grained Termhood Prediction for German Compound Terms using Neural Networks

Fine-grained Compound Termhood Annotation Dataset

Type
ExperimentData
Author
Anna Hätty, Sabine Schulte im Walde

We consider term difficulty as part of a tier model for a term's strength of association to a domain. It should naturally align to the idea of a gradual increase of term specificity to the domain: The more difficult or specialised a term is, the more distinctive it is from general language and the more it is associated to a domain. If terms are both general and understandable, it is sometimes hard to distinguish them from general-language words. Thus, the more expert knowledge is needed to understand a term, the stronger it should be associated to a domain. In this dataset, we distinguish four tiers, according to which five human judges annotated 396 German compounds from the cooking domain

Reference

Anna Hätty, Sabine Schulte im Walde (2018)
Fine-grained Termhood Prediction for German Compound Terms using Neural Networks
In: Proceedings of the COLING Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG). Santa Fe, NM, USA.

Download

Please contact the SemRel group to obtain the data.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA).
Creative Commons License

SemRel

Logo of Research Group SemRel
 

Research Group SemRel

This image shows Sabine Schulte im Walde

Sabine Schulte im Walde

Prof. Dr.

Akademische Rätin (Associate Professor)

To the top of the page