Europarl Nominal Compoundhood Ratings
- Type
-
Corpus
- Author
-
Patrick Ziering
- Description
-
The Europarl Nominal Compoundhood Ratings (ENCR) is a selection of 394 sentences from the English portion of the Europarl corpus (Europarl v7, OPUS (Tiedemann, 2012)), annotated with 824 candidate compounds.
Each compound token is associated with a rating (1, 2 or 3) for the degree of compoundhood and for the validity of six linguistic criteria, described below.
The compoundhood rating:
[3] very compoundlike (i.e., a prototypical compound)
[2] rather compoundlike (i.e., probably a compound)
[1] mildly compoundlike (i.e., could be considered a compound)
The six linguistic criteria:
- Spelling:
Does the spelling of the expression under consideration (i.e., closed or open compounding) point to compoundhood?
- Inseparability:
No element should intervene between a compound’s constituents. While 'black bird' can be understood as a compound, 'black ugly bird' is a phrase. Can you think of a way to insert an element between the constituents of the underlying expression?
- Inability to modify the modifier:
Is there a modifying adjective/adverb, or can you think of such an element in the surrounding context, that modifies any modifier in the expression under consideration?
- Inability to replace the head by the pronoun 'one':
Can you replace the head of the expression under consideration by the pronoun 'one'?
- Inflection of the modifier:
Is any modifier inflected (with respect to regular word inflection) in the expression under consideration?
- Prosody:
While in a phrase such as 'black bird' the head (i.e., 'bird') is stressed (or both parts have equal stress), in a compound such as 'blackbird' the primary stress is commonly on the modifier (i.e., 'black'). How would you stress the expression under consideration?
- File format:
Each line corresponds to one candidate compound. Each line contains 12 tab-separated fields:
internal ID <tab> candidate compound <tab> rating for compoundhood <tab> ratings for the six linguistic criteria (six fields) <tab> the underlying Europarl sentence <tab> the underlying Europarl sentence with the highlighted candidate compound <tab> the ID of the annotator (1 or 2)
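The field layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the ENCR distribution: the names `EncrRecord` and `parse_encr_line` are hypothetical, the six criterion ratings are assumed to appear in the order in which the criteria are listed above, and all ratings are assumed to be stored as plain digits. The sample line is invented for illustration, including the way the highlighted compound is marked.

```python
from typing import NamedTuple, Tuple


class EncrRecord(NamedTuple):
    """One candidate compound (hypothetical container, not an official schema)."""
    internal_id: str
    compound: str
    compoundhood: int            # 1, 2 or 3
    criteria: Tuple[int, ...]    # six ratings; order assumed to follow the list above
    sentence: str
    highlighted_sentence: str
    annotator: str               # "1" or "2"


def parse_encr_line(line: str) -> EncrRecord:
    """Split one tab-separated line into its 12 fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-separated fields, got {len(fields)}")
    return EncrRecord(
        internal_id=fields[0],
        compound=fields[1],
        compoundhood=int(fields[2]),                  # assumes a plain digit
        criteria=tuple(int(r) for r in fields[3:9]),  # assumes six plain digits
        sentence=fields[9],
        highlighted_sentence=fields[10],
        annotator=fields[11],
    )


# Invented sample line; the real highlighting markup may differ.
sample = "\t".join(
    ["17", "black bird", "3", "3", "2", "3", "1", "2", "3",
     "A black bird sang.", "A [black bird] sang.", "1"]
)
record = parse_encr_line(sample)
print(record.compound, record.compoundhood, record.criteria)
```

Raising an error on any line that does not have exactly 12 fields makes a mismatch between these assumptions and the actual file visible early, rather than silently shifting columns.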
Download: ENCR.tar.gz
Keywords: noun compound, compound noun, multi-word expression, database, list, resource, dataset, rating, ratings, criterion, criteria, linguistic criterion, linguistic criteria
- Reference
-
Jörg Tiedemann
Parallel data, tools and interfaces in OPUS.
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), 2012.