CoInCo: Concepts in Context

Type Corpus
Title CoInCo: Concepts in Context
Author Sebastian Pado

Description

CoInCo (Concepts in Context) is an English corpus that adds all-words lexical substitution annotation to a sample of the newswire and fiction genres of the freely available MASC corpus (http://www.anc.org/data/masc/). It covers about 35,000 tokens of running text. For each of the 15,500 content words, we elicited at least six context-appropriate substitutes through crowdsourcing. Informants had access to the full sentence and two sentences of discourse context.

The CoInCo download includes a sample of the MASC corpus, which is available under the CC-BY-3.0-US license. The lexical substitution annotations that we added for CoInCo are published under the same CC-BY-3.0-US license. Find the full text of the license here: https://creativecommons.org/licenses/by/3.0/us/


Reference

Gerhard Kremer, Katrin Erk, Sebastian Pado, Stefan Thater: What Substitutes Tell Us – Analysis of an “All-Words” Lexical Substitution Corpus. Proceedings of EACL 2014. Gothenburg, Sweden. http://www.aclweb.org/anthology/E14-1057.pdf


Download

Data (.xml.gz)

README (.txt)

Development Set IDs

Test Set IDs
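
The data file is distributed as gzip-compressed XML. As a first step after downloading, it can help to check which element names the file contains before writing a real parser. The following is a minimal sketch that assumes nothing about the CoInCo schema (consult the README for the actual format); the function name `count_tags` and the file name in the usage note are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(path):
    """Count how often each XML element name occurs in a gzip-compressed file.

    Makes no assumption about the schema: it only tallies element names.
    """
    counts = Counter()
    with gzip.open(path, "rb") as handle:
        # iterparse streams the document instead of loading it in one piece.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            counts[elem.tag] += 1
            # Drop the element's text and children once it has been counted.
            elem.clear()
    return counts
```

Calling `count_tags("coinco.xml.gz")` (replace the hypothetical name with the path of the downloaded file) returns a `Counter` whose `most_common()` method lists the element names by frequency, which gives a quick overview of the file's structure.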