CoInCo: Concepts in Context

Type Corpus
Title CoInCo: Concepts in Context
Author Sebastian Pado


CoInCo (Concepts in Context) is an English corpus that adds all-words lexical substitution annotation to a sample of the newswire and fiction genres of the freely available MASC corpus. It covers about 35,000 tokens of running text. For each of the 15,500 content words, we elicited at least 6 context-appropriate substitutes through crowdsourcing. Informants had access to the full sentence and two sentences of discourse context.

The CoInCo download includes a sample of the MASC corpus, which is available under the CC-BY-3.0-US license. The lexical substitution annotations that we added for CoInCo are published under the same CC-BY-3.0-US license. Find the full text of the license here:


Gerhard Kremer, Katrin Erk, Sebastian Pado, Stefan Thater: What Substitutes Tell Us – Analysis of an “All-Words” Lexical Substitution Corpus. Proceedings of EACL 2014. Gothenburg, Sweden.


Data (.xml.gz)

README (.txt)

Development Set IDs

Test Set IDs
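
Since the data is distributed as gzip-compressed XML, a first look at its structure might be taken with a few lines of Python. The sketch below is only an illustration: it assumes nothing about the CoInCo schema (the README is the authoritative description of the format) and simply counts how often each element tag occurs. The function name count_element_tags and any file name passed to it are hypothetical and not part of the distribution.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def count_element_tags(path):
    """Count how often each element tag occurs in a gzip-compressed XML file.

    Hypothetical helper for a first inspection; it makes no assumption
    about the tag names or attributes used in the file.
    """
    counts = Counter()
    with gzip.open(path, "rb") as handle:
        # iterparse streams the document instead of building the full tree.
        for _event, element in ET.iterparse(handle, events=("end",)):
            counts[element.tag] += 1
            # Drop children and attributes that have already been counted.
            element.clear()
    return counts
```

Calling count_element_tags on the downloaded data file and printing the result with most_common() would give a quick overview of the element types before writing a schema-specific reader.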