Europarl Nominal Compound Database

Type Corpus
Title Europarl Nominal Compound Database
Author Patrick Ziering

Description

The Europarl Nominal Compound Database (ENCD) was automatically extracted from Europarl v7 of OPUS (Tiedemann, 2012 [2]).

This database contains English nominal compounds and their equivalents in up to nine languages:

  • Danish
  • Dutch
  • English (pivot)
  • French
  • German
  • Greek
  • Italian
  • Portuguese
  • Romanian
  • Spanish
  • Swedish

We provide several versions of the database, ranging from optimal recall (CCR0) to optimal precision (CCR4).

Keywords: noun compound, compound noun, multilingual, cross-lingual, multi-word expression, database, list, resource, dataset


References

[1] Patrick Ziering and Lonneke van der Plas
What good are 'Nominalkomposita' for 'noun compounds':
Multilingual Extraction and Structure Analysis of Nominal Compositions using Linguistic Restrictors.
Proceedings of the 25th International Conference on Computational Linguistics (COLING), 2014.

[2] Jörg Tiedemann
Parallel data, tools and interfaces in OPUS.
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), 2012.


Download