Europarl Nominal Compound Database

Type Corpus
Title Europarl Nominal Compound Database
Author Patrick Ziering


The Europarl Nominal Compound Database (ENCD) was automatically extracted from the OPUS release of Europarl v7 (Tiedemann, 2012 [2]).

This database contains English nominal compounds and their equivalents in up to nine languages:

  • Danish
  • Dutch
  • English (pivot)
  • French
  • German
  • Greek
  • Italian
  • Portuguese
  • Romanian
  • Spanish
  • Swedish

We provide several versions of the database, ranging from CCR0 (optimal recall) to CCR4 (optimal precision).

Keywords: noun compound, compound noun, multilingual, cross-lingual, multi-word expression, database, list, resource, dataset


[1] Patrick Ziering and Lonneke van der Plas
What good are 'Nominalkomposita' for 'noun compounds': Multilingual Extraction and Structure Analysis of Nominal Compositions using Linguistic Restrictors.
Proceedings of the 25th International Conference on Computational Linguistics (COLING), 2014.

[2] Jörg Tiedemann
Parallel data, tools and interfaces in OPUS.
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), 2012.