Europarl Nominal Compound Database

Type Corpus
Title Europarl Nominal Compound Database
Author Patrick Ziering


The Europarl Nominal Compound Database (ENCD) was automatically extracted from Europarl v7 of OPUS.

This database contains English nominal compounds and their equivalents in up to nine languages:

  • Danish
  • Dutch
  • English (pivot)
  • French
  • German
  • Greek
  • Italian
  • Portuguese
  • Romanian
  • Spanish
  • Swedish

We provide several versions of the database, ranging from optimal recall to optimal precision and covering different language subsets.

We are currently working on a web publication. Please contact Patrick Ziering for more details.

Keywords: noun compound, compound noun, multilingual, cross-lingual, multi-word expression, database, list, resource, dataset


Patrick Ziering and Lonneke van der Plas
What good are 'Nominalkomposita' for 'noun compounds': Multilingual Extraction and Structure Analysis of Nominal Compositions using Linguistic Restrictors.
Proceedings of the 25th International Conference on Computational Linguistics (COLING), 2014.

Jörg Tiedemann
Parallel data, tools and interfaces in OPUS.
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), 2012.