Europarl Nominal Compound Database


Type: Corpus
Author: Patrick Ziering
Description

The Europarl Nominal Compound Database (ENCD) was automatically extracted from Europarl v7 of OPUS (Tiedemann, 2012 [2]).

This database contains English nominal compounds and their equivalents in up to nine languages:

  • Danish
  • Dutch
  • English (pivot)
  • French
  • German
  • Greek
  • Italian
  • Portuguese
  • Romanian
  • Spanish
  • Swedish

We provide several versions of the database, ranging from optimal recall (CCR0) to optimal precision (CCR4).

Keywords: noun compound, compound noun, multilingual, cross-lingual, multi-word expression, database, list, resource, dataset

References

[1] Patrick Ziering and Lonneke van der Plas.
What good are 'Nominalkomposita' for 'noun compounds': Multilingual Extraction and Structure Analysis of Nominal Compositions using Linguistic Restrictors.
Proceedings of the 25th International Conference on Computational Linguistics (COLING), 2014.

[2] Jörg Tiedemann.
Parallel data, tools and interfaces in OPUS.
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), 2012.

Download