SciCorp

Type: Corpus
Title: SciCorp
Author: Ina Rösiger

Description

SciCorp is a corpus of full-text English scientific papers of two disciplines, genetics and computational linguistics. The corpus comprises coreference and bridging information as well as information status labels.

The corpus has been reliably annotated by independent human coders with moderate inter-annotator agreement (average kappa = 0.71). In total, we have annotated 14 full papers containing 61,045 tokens and marked about 8,700 definite noun phrases.

The corpus is available for download in two formats: an offset-based format and, for the coreference annotations, the widely used tabular CoNLL-2012 format.
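
As an illustration of how the tabular coreference annotations might be read, the following is a minimal Python sketch. It assumes the usual CoNLL-2012 conventions: one token per line, blank lines between sentences, `#begin document` and `#end document` marker lines, and the coreference information in the last column, written as `(ID`, `ID)`, `(ID)` or `-`, with several entries joined by `|`. The function name `read_coref_chains` and the sample lines are hypothetical; the sample uses fewer columns than real CoNLL-2012 files, and the actual SciCorp files may differ in detail.

```python
from collections import defaultdict
import re


def read_coref_chains(lines):
    """Collect mention spans per coreference chain from CoNLL-2012-style lines.

    Returns {chain_id: [(start_token, end_token), ...]}, where token indices
    are counted from 0 over all token lines in the input.
    """
    chains = defaultdict(list)
    open_starts = defaultdict(list)  # chain_id -> stack of open start indices
    token_index = 0
    for line in lines:
        line = line.strip()
        # Skip sentence breaks and the #begin/#end document marker lines.
        if not line or line.startswith("#"):
            continue
        coref = line.split()[-1]  # assumption: coreference is the last column
        if coref != "-":
            for part in coref.split("|"):
                chain_id = int(re.sub(r"[()]", "", part))
                if part.startswith("("):
                    open_starts[chain_id].append(token_index)
                if part.endswith(")"):
                    start = open_starts[chain_id].pop()
                    chains[chain_id].append((start, token_index))
        token_index += 1
    return dict(chains)


# Made-up sample with a reduced column set (not taken from SciCorp).
sample = [
    "#begin document (demo); part 000",
    "demo 0 0 The (0",
    "demo 0 1 corpus 0)",
    "demo 0 2 is -",
    "demo 0 3 annotated -",
    "demo 0 4 . -",
    "",
    "demo 0 0 It (0)",
    "demo 0 1 contains -",
    "demo 0 2 papers (1)",
    "#end document",
]

print(read_coref_chains(sample))
# {0: [(0, 1), (5, 5)], 1: [(7, 7)]}
```

Here chain 0 links the two-token mention "The corpus" with the single-token mention "It". The offset-based format would need a separate reader and is not covered by this sketch.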


Reference

Ina Rösiger (2016)
SciCorp: A Corpus of English Scientific Articles Annotated for Information-Structural Analysis
Proceedings of LREC. Portorož, Slovenia 2016.


Download

The SciCorp corpus from Rösiger (2016) can be downloaded here.
The annotation guidelines can be downloaded here.