BASHI

Type
Corpus
Author
Ina Rösiger
Description

BASHI is a corpus of 50 Wall Street Journal (WSJ) articles that adds bridging anaphors and their antecedents to the existing gold annotations created as part of the OntoNotes project. Bridging anaphors are context-dependent expressions that do not refer to the same entity as their antecedent, but to a related entity.

The corpus contains 57,709 tokens and 410 bridging pairs. It is available for download in two formats: an offset-based format and the widely used, tabular CoNLL-2012 format, in which the bridging column can be merged with the other annotation layers in OntoNotes.

Reference

Ina Rösiger (2018)
BASHI: A corpus of Wall Street Journal articles annotated with bridging links
Proceedings of LREC. Miyazaki, Japan 2018.

Download

The corpus can be downloaded here:

  • CoNLL format, which needs to be merged with the other OntoNotes annotations (LDC)
  • The last three columns contain: bridging (general), indefinite, comparative
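
The merge step mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used by the corpus author: it assumes that the BASHI file and the OntoNotes CoNLL file are aligned line by line, that token rows are whitespace-separated, and that comment lines start with "#". The function name merge_bridging_columns and the helper is_token_row are hypothetical and not part of any released tooling.

```python
from itertools import zip_longest

# Assumption taken from the list above: the last three columns of the
# BASHI file hold bridging (general), indefinite, comparative.
N_BRIDGING_COLS = 3


def is_token_row(line):
    """A token row is any non-empty line that is not a '#' comment."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def merge_bridging_columns(ontonotes_lines, bridging_lines):
    """Append the last three bridging columns to each OntoNotes token row.

    Both inputs are iterables of lines and are assumed to be aligned
    line by line (hypothetical; check this against the real files).
    """
    merged = []
    pairs = zip_longest(ontonotes_lines, bridging_lines)
    for number, (onto, bridge) in enumerate(pairs, start=1):
        if onto is None or bridge is None:
            raise ValueError(f"files differ in length at line {number}")
        onto = onto.rstrip("\n")
        bridge = bridge.rstrip("\n")
        if is_token_row(onto) != is_token_row(bridge):
            raise ValueError(f"row types differ at line {number}")
        if not is_token_row(onto):
            # Comment lines and sentence-separating blank lines pass through.
            merged.append(onto)
            continue
        bridge_fields = bridge.split()
        if len(bridge_fields) < N_BRIDGING_COLS:
            raise ValueError(f"too few columns at line {number}")
        merged.append("\t".join(onto.split() + bridge_fields[-N_BRIDGING_COLS:]))
    return merged
```

Passing two open file handles (one per file) to the function returns the merged lines, which can then be written out with "\n".join(...). Raising an error on any misalignment is a deliberate choice here: silently shifted columns would attach bridging labels to the wrong tokens.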

The annotation guidelines can be downloaded here.

 


 
