Domain-Specific Dataset of Difficulty Ratings for German Noun Compounds in the Domains DIY, Cooking and Automotive
Julia Bettinger, Anna Hätty, Michael Dorna, Sabine Schulte im Walde
The dataset contains difficulty ratings for 1,030 German closed noun compounds extracted from domain-specific texts in three domains: do-it-yourself (DIY), cooking and automotive. It includes two-part compounds for cooking and DIY, and two- to four-part compounds for automotive. The compounds were identified in text using the Simple Compound Splitter (Weller-Di Marco, 2017); a subset was then filtered and balanced by frequency and productivity criteria to serve as the basis for manual annotation and fine-grained interpretation. The final dataset was annotated with difficulty ratings by 20 annotators.
The dataset is available under the Creative Commons ShareAlike public license.