CoInCo: Concepts in Context

An English corpus that adds all-words lexical substitution annotation to a sample of the newswire and fiction genres of the freely available MASC corpus

Type: Corpus
Author: Sebastian Pado
Description

CoInCo (Concepts in Context) is a relatively large English all-words lexical substitution corpus built on the newswire and fiction genres of the freely available MASC corpus. It covers some 35K tokens of running text, in which all 15.5K content words were labeled with at least 6 synonyms using crowdsourcing. Annotators could see the whole sentence as well as two sentences of discourse context.

Reference

Gerhard Kremer, Katrin Erk, Sebastian Pado, Stefan Thater: What Substitutes Tell Us – Analysis of an “All-Words” Lexical Substitution Corpus. To appear in Proceedings of EACL 2014. Gothenburg, Sweden.

Download

Data set (.xml.gz)

README (.txt)

Development Set IDs

Test Set IDs
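
The data set is distributed as a gzip-compressed XML file. The XML schema is not described on this page (see the README), so the following is only a minimal sketch for a first look at the file: it streams through the archive and counts how often each element tag occurs, without assuming any particular tag names. The function name count_element_tags and the file path passed on the command line are illustrative assumptions, not part of the distribution.

```python
import gzip
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def count_element_tags(path):
    """Count how often each element tag occurs in a gzip-compressed XML file.

    No tag names are assumed; the function reports whatever it finds.
    """
    counts = Counter()
    with gzip.open(path, "rb") as handle:
        # iterparse streams the document, so the whole corpus is never
        # held in memory at once.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # free the finished element's children
    return counts


if __name__ == "__main__":
    # Usage (the path is a placeholder for the downloaded file):
    #   python inspect_corpus.py path/to/dataset.xml.gz
    for tag, n in count_element_tags(sys.argv[1]).most_common():
        print(f"{tag}\t{n}")
```

The tag inventory printed this way can then be checked against the README before writing a parser for the actual annotation structure.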
