CoInCo: Concepts in Context
- Sebastian Pado
CoInCo (Concepts in Context) is an English corpus that adds all-words lexical substitution annotation to a sample of the newswire and fiction genres of the freely available MASC corpus (http://www.anc.org/data/masc/). It covers about 35,000 tokens of running text. For all 15,500 content words, we elicited at least six context-appropriate substitutes through crowdsourcing. Informants had access to the full sentence and two sentences of discourse context.
The CoInCo download includes a sample of the MASC corpus, which is available under the CC-BY-3.0-US license. The lexical substitution annotations that we added for CoInCo are published under the same CC-BY-3.0-US license. Find the full text of the license here: https://creativecommons.org/licenses/by/3.0/us/
Gerhard Kremer, Katrin Erk, Sebastian Pado, Stefan Thater: What Substitutes Tell Us – Analysis of an “All-Words” Lexical Substitution Corpus. Proceedings of EACL 2014. Gothenburg, Sweden. http://www.aclweb.org/anthology/E14-1057.pdf