SciCorp

A corpus of full-text English scientific papers from genetics and computational linguistics


Type
Corpus
Author
Ina Rösiger
Description

SciCorp is a corpus of full-text English scientific papers from two disciplines, genetics and computational linguistics. It is annotated with coreference and bridging information as well as information status labels.

The corpus has been reliably annotated by independent human coders with moderate inter-annotator agreement (average kappa = 0.71). In total, we annotated 14 full papers containing 61,045 tokens and marked about 8,700 definite noun phrases.

The corpus is available for download in two formats: an offset-based format and, for the coreference annotations, the widely used tabular CoNLL-2012 format.
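As a rough illustration of how coreference chains can be read from a CoNLL-2012-style file, the sketch below collects mention spans from the last column, where `(12` opens a mention of chain 12, `12)` closes it, `(12)` marks a single-token mention, `|` separates several markers on one token, and `-` means no annotation. The sample data is invented and reduced to a few columns for readability; real CoNLL-2012 files carry more columns, and the exact layout of the SciCorp files is not described here, so the function name `read_coref_chains` and the toy document are assumptions rather than part of the corpus distribution.

```python
from collections import defaultdict


def read_coref_chains(lines):
    """Collect coreference mentions per chain ID from CoNLL-2012-style lines.

    Returns {chain_id: [(start_token, end_token), ...]}, with token
    positions counted from 0 across the whole input.
    """
    chains = defaultdict(list)
    open_mentions = defaultdict(list)  # chain_id -> stack of start positions
    token_index = 0
    for line in lines:
        line = line.strip()
        # Skip sentence separators and "#begin document" / "#end document".
        if not line or line.startswith("#"):
            continue
        coref = line.split()[-1]  # the coreference column is the last one
        if coref != "-":
            for part in coref.split("|"):
                chain_id = int(part.strip("()"))
                if part.startswith("("):
                    open_mentions[chain_id].append(token_index)
                if part.endswith(")"):
                    start = open_mentions[chain_id].pop()
                    chains[chain_id].append((start, token_index))
        token_index += 1
    return dict(chains)


# Invented toy data (hypothetical, not taken from SciCorp).
SAMPLE = """\
#begin document (toy_doc); part 000
toy_doc 0 0 The (0
toy_doc 0 1 corpus 0)
toy_doc 0 2 contains -
toy_doc 0 3 papers (1)
toy_doc 0 4 . -

toy_doc 0 0 It (0)
toy_doc 0 1 is -
toy_doc 0 2 annotated -
toy_doc 0 3 . -
#end document
"""

if __name__ == "__main__":
    for chain_id, mentions in sorted(read_coref_chains(SAMPLE.splitlines()).items()):
        print(chain_id, mentions)
```

Here chain 0 links "The corpus" (tokens 0 to 1) with "It" (token 5), while chain 1 contains only "papers". Because only the last column is inspected, the same function should work on files with the full set of CoNLL-2012 columns, provided the coreference column comes last.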

Reference

Ina Rösiger (2016)
SciCorp: A Corpus of English Scientific Articles Annotated for Information-Structural Analysis
Proceedings of LREC 2016, Portorož, Slovenia.

Download
 

