BASHI

Type: Corpus
Title: BASHI
Author: Ina Rösiger

Description

BASHI is a corpus of 50 Wall Street Journal (WSJ) articles that adds bridging anaphors and their antecedents to the gold annotations created as part of the OntoNotes project. Bridging anaphors are context-dependent expressions that refer not to the same entity as their antecedent, but to a related entity.

The corpus contains 57,709 tokens and 410 bridging pairs. It will be made available for download in two formats: an offset-based format, and the widely used tabular CoNLL-2012 format, where the bridging annotations form a column that can be merged with the other annotation layers in OntoNotes.


Reference

Ina Rösiger (2018)
BASHI: A corpus of Wall Street Journal articles annotated with bridging links
Proceedings of LREC 2018. Miyazaki, Japan.


Download

The corpus will be made available for download soon.
The annotation guidelines can be downloaded here.