Fine-grained Compound Termhood Annotation Dataset

This is the dataset accompanying the paper 'Fine-Grained Termhood Prediction for German Compound Terms Using Neural Networks', cited below.

Type
ExperimentData
Authors
Anna Hätty, Sabine Schulte im Walde

We consider term difficulty as part of a tier model of a term's strength of association with a domain. It should align naturally with the idea of a gradual increase in a term's specificity to the domain: the more difficult or specialised a term is, the more it differs from general language and the more strongly it is associated with the domain. Terms that are both general and easy to understand are sometimes hard to distinguish from general-language words. Thus, the more expert knowledge is needed to understand a term, the more strongly it should be associated with a domain. In this dataset, we distinguish four tiers, according to which five human judges annotated 396 German compounds from the cooking domain.

Reference

Anna Hätty, Sabine Schulte im Walde (2018)
Fine-grained Termhood Prediction for German Compound Terms using Neural Networks
In: Proceedings of the COLING Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG). Santa Fe, NM, USA.

Download

This work is licensed under a Creative Commons Attribution 4.0 International License. Please contact Anna Hätty or Sabine Schulte im Walde to obtain the data.

Sabine Schulte im Walde
Apl. Prof. Dr.

Academic Councillor (Akademische Rätin)
