The Europarl Nominal Compoundhood Ratings (ENCR) is a selection of 394 sentences from the English portion of the Europarl corpus (Europarl v7, OPUS (Tiedemann, 2012)), annotated with 824 candidate compounds.

Each candidate compound token is rated (1, 2 or 3) for its degree of compoundhood and for the validity of each of the six linguistic criteria described below.

The compoundhood rating:

[3] very compoundlike     (i.e., a prototypical compound)
[2] rather compoundlike   (i.e., probably a compound)
[1] mildly compoundlike   (i.e., could be considered a compound)

The six linguistic criteria:

  • Spelling:
    Does the spelling of the expression under consideration (i.e., closed or open compounding) point to compoundhood?

  • Inseparability:
    No element should intervene between a compound’s constituents. While 'black bird' can be understood as a compound, 'black ugly bird' is a phrase. Can you think of a way to insert an element between the constituents of the expression under consideration?

  • Inability to modify the modifier:
    Is there a modifying adjective/adverb in the surrounding context (or can you think of one) that modifies any modifier in the expression under consideration?

  • Inability to replace the head by the pronoun 'one':
    Can you replace the head of the expression under consideration by the pronoun 'one'?

  • Inflection of the modifier:
    Is any modifier inflected (with respect to regular word inflection) in the expression under consideration?

  • Prosody:
    In a phrase such as 'black bird', the head (i.e., 'bird') is stressed (or both parts have equal stress), whereas in a compound such as 'blackbird' the primary stress is commonly on the modifier (i.e., 'black'). How would you stress the expression under consideration?

File format:
Each line corresponds to one candidate compound and contains 12 tab-separated fields:
     internal ID <tab> candidate compound <tab> rating for compoundhood <tab> ratings for the six linguistic criteria (six fields) <tab> the underlying Europarl sentence <tab> the underlying Europarl sentence with the candidate compound highlighted <tab> the ID of the annotator (1 or 2)
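
The field layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the distributed resource. It assumes that the six criterion ratings occupy six consecutive fields (positions 4 to 9), which follows from the total of 12 fields, and it keeps all ratings as text because the exact value encoding is not specified here. The names ENCRRecord and read_encr are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class ENCRRecord(NamedTuple):
    """One candidate compound (hypothetical container, not an official API)."""
    internal_id: str
    compound: str
    compoundhood: str            # compoundhood rating, kept as text
    criteria: Tuple[str, ...]    # six criterion ratings, in file order
    sentence: str
    highlighted_sentence: str
    annotator: str               # annotator ID, kept as text


def read_encr(lines: Iterable[str]) -> Iterator[ENCRRecord]:
    """Yield one record per non-empty line of 12 tab-separated fields."""
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(
                f"line {line_no}: expected 12 fields, got {len(fields)}"
            )
        yield ENCRRecord(
            internal_id=fields[0],
            compound=fields[1],
            compoundhood=fields[2],
            criteria=tuple(fields[3:9]),  # assumed: six consecutive fields
            sentence=fields[9],
            highlighted_sentence=fields[10],
            annotator=fields[11],
        )
```

Because read_encr accepts any iterable of lines, an open file handle can be passed directly once the archive is unpacked. The name of the data file inside the archive is not stated here and has to be checked after extraction.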

Download: ENCR.tar.gz

