- Ina Rösiger
BASHI is a corpus of 50 Wall Street Journal (WSJ) articles that adds bridging anaphors and their antecedents to the gold annotations created as part of the OntoNotes project. Bridging anaphors are context-dependent expressions that do not refer to the same entity as their antecedent, but to a related entity.
The corpus contains 57,709 tokens and 410 bridging pairs. It is available for download in two formats: an offset-based format and the widely used, tabular CoNLL-2012 format, whose bridging columns can be merged with the other annotation layers in OntoNotes.
Ina Rösiger (2018)
BASHI: A corpus of Wall Street Journal articles annotated with bridging links
Proceedings of LREC. Miyazaki, Japan 2018.
The corpus can be downloaded here:
- CoNLL format that needs to be merged with the other OntoNotes annotations (LDC)
- last three columns: bridging (general), indefinite, comparative
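The merge step mentioned above could be sketched roughly as follows. This is only an illustration under assumptions that the page does not confirm: that the bridging file and the OntoNotes CoNLL-2012 file are aligned line by line, that columns are whitespace-separated, and that lines starting with `#` and empty lines mark document and sentence boundaries. The function name `merge_bridging_columns` and the toy rows are hypothetical, and the cell values in the toy rows are placeholders rather than real annotations.

```python
def merge_bridging_columns(ontonotes_lines, bridging_lines, n_bridging_cols=3):
    """Append the last n_bridging_cols columns of each bridging row
    to the OntoNotes row on the same line (assumes line-aligned input)."""
    if len(ontonotes_lines) != len(bridging_lines):
        raise ValueError("inputs are not line-aligned")

    merged = []
    for line_no, (onto, bridge) in enumerate(
        zip(ontonotes_lines, bridging_lines), start=1
    ):
        onto = onto.rstrip("\n")
        bridge = bridge.rstrip("\n")

        # Document markers and sentence breaks carry no token columns,
        # so they are passed through unchanged.
        if not onto.strip() or onto.startswith("#"):
            merged.append(onto)
            continue

        bridge_cols = bridge.split()
        if len(bridge_cols) < n_bridging_cols:
            raise ValueError(f"line {line_no}: too few bridging columns")

        # Keep all OntoNotes columns and add the three bridging columns:
        # bridging (general), indefinite, comparative.
        merged.append("\t".join(onto.split() + bridge_cols[-n_bridging_cols:]))
    return merged


if __name__ == "__main__":
    # Toy rows with placeholder values, not real corpus content.
    onto = ["#begin document (doc); part 000", "doc 0 0 The DT -", "", "#end document"]
    bridge = ["#begin document (doc); part 000", "doc 0 0 The X Y Z", "", "#end document"]
    for row in merge_bridging_columns(onto, bridge):
        print(row)
```

If the two files turn out not to be line-aligned, a join on the document, part, and token index columns would be the safer choice; the sketch raises an error in that case rather than guessing.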
The annotation guidelines can be downloaded here.