Fine-grained Compound Termhood Annotation Dataset

This dataset accompanies the paper 'Fine-Grained Termhood Prediction for German Compound Terms Using Neural Networks', cited below.

Type
ExperimentData
Authors
Anna Hätty, Sabine Schulte im Walde

We consider term difficulty as part of a tier model of a term's strength of association with a domain. It aligns naturally with the idea of a gradual increase in a term's specificity to the domain: the more difficult or specialised a term is, the more distinct it is from general language and the more strongly it is associated with the domain. Terms that are both general and easy to understand are sometimes hard to distinguish from general-language words. Thus, the more expert knowledge is needed to understand a term, the more strongly it should be associated with the domain. In this dataset, we distinguish four tiers, according to which five human judges annotated 396 German compounds from the cooking domain.

Reference

Anna Hätty, Sabine Schulte im Walde (2018)
Fine-grained Termhood Prediction for German Compound Terms using Neural Networks
In: Proceedings of the COLING Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG). Santa Fe, NM, USA.

Download

Please contact the SemRel group to obtain the data.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA).
SemRel

SemRel Research Group

Sabine Schulte im Walde

Prof. Dr.

Academic Councillor (Akademische Rätin)
