HotCoref DE

The German coreference system described in the LREC 2016 paper and in the ACL paper

Type
Tool
Author
Ina Rösiger
Description

Update March 2017:

The update contains

  • an improved version of the coreference resolver: Download link
  • a conversion tool for Linux that converts plain text into the required CoNLL-12 input format, using the same tools that were used during training: Link to GitHub

The German coreference system described in the LREC 2016 paper [1] and in the ACL paper [2] can be downloaded below.

The download includes

  • a manual on how to run the resolver
  • default feature lists as well as an overview of the features that one can experiment with
  • example documents in CoNLL-12 format, including POS tags, parse bits, lemmata, morphological information and (optionally) named entities
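To make the input format more concrete, here is a minimal Python sketch of reading such a file. It assumes the column layout of the CoNLL-2012 shared task (word in column 4, POS tag in column 5, parse bit in column 6, coreference in the last column); the function name `read_conll12` is hypothetical and not part of the tool, and the example documents in the download remain the authoritative reference for the exact columns.

```python
def read_conll12(text):
    """Split CoNLL-12-style text into sentences of token dictionaries.

    Assumption: CoNLL-2012 shared task column order. Lines starting
    with '#' (document begin/end markers) are skipped, and blank
    lines separate sentences.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append({
            "word": cols[3],
            "pos": cols[4],
            "parse_bit": cols[5],
            "coref": cols[-1],  # coreference is the last column
        })
    if current:
        sentences.append(current)
    return sentences
```

Each returned sentence is a list of tokens, so the parse bits of a sentence can be collected with `[t["parse_bit"] for t in sentence]`.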

Note: among other things, the resolver relies on extracting NPs from the parse bits. Some parsers for German produce flat PPs, i.e. they do not annotate the NP inside a PP, so these NPs need to be inserted before running the tool.
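The NP insertion mentioned in the note could be sketched roughly as follows. This is only an illustration under several assumptions: parse bits follow the CoNLL-2012 bracket convention, prepositions carry the STTS tag APPR, and a PP counts as flat when no other bracket opens between the preposition and the end of the PP. The function `insert_np_in_flat_pps` is hypothetical and not part of the tool; postpositions, fused preposition-article forms and coordinated phrases would need extra handling.

```python
def insert_np_in_flat_pps(pos_tags, parse_bits, prepositions=("APPR",)):
    """Wrap the tokens after a preposition in an NP when the PP is flat.

    Works on one sentence; returns a new list of parse bits.
    """
    bits = list(parse_bits)
    n = len(bits)
    for i in range(n):
        # The PP must open on a preposition as the innermost bracket.
        if not bits[i].endswith("(PP*") or pos_tags[i] not in prepositions:
            continue
        # Skip over bare "*" tokens inside the PP.
        j = i + 1
        while j < n and bits[j] == "*":
            j += 1
        # The PP is flat only if the next bit closes it directly.
        if j >= n or not bits[j].startswith("*)"):
            continue
        # Open the NP after the preposition and close it just
        # before the bracket that closes the PP.
        bits[i + 1] = "(NP" + bits[i + 1]
        bits[j] = bits[j].replace("*", "*)", 1)
    return bits
```

For "Er wohnt in dem Haus" with the bits `(S*`, `*`, `(PP*`, `*`, `*))`, the sketch leaves the first three bits unchanged and rewrites the last two so that "dem Haus" is enclosed in an NP inside the PP.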

Here's a manual on how to run the resolver

Pre-trained models

New model:

  • trained on the complete TüBa-D/Z version 10 data, regular processing, using the improved version of the coreference resolver: available here

Older models (trained with the LREC version):

  • trained on the complete TüBa-D/Z version 10 data, gold processing: available here
  • trained on the complete TüBa-D/Z version 10 data, regular processing: available here
  • trained on the complete TüBa-D/Z version 9 data, regular processing: available here

An older version of the tool (as published in the LREC 2016 paper [1]) can be downloaded here.
The version published at ACL 2015 [2] can be found here.


CoNLL scores as published in [1]:

  • 65.76 (no singletons) on the TüBa-D/Z test set version 10, using gold annotations
  • 48.54 (including singletons) on the TüBa-D/Z test set version 10, using regular annotations

The older results reported in [2], obtained with predicted annotations only (real preprocessing) and without gold mention boundary (GB) information, are as follows:

  • 51.61 (no singletons) on the TüBa-D/Z test set version 9
  • 60.35 (including singletons) and 48.61 (no singletons) CoNLL score on the TüBa-D/Z test set version 8 (= SemEval dataset)
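For readers unfamiliar with the metric: the CoNLL score is conventionally defined as the unweighted average of the F1 scores of MUC, B³ and CEAF-e. The short Python sketch below shows only this arithmetic; the function name `conll_score` is hypothetical, and the input values in the usage line are made-up placeholders, not results of this system.

```python
def conll_score(muc_f1, b_cubed_f1, ceaf_e_f1):
    """Return the unweighted mean of the three coreference F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Placeholder inputs for illustration only, not reported results.
example = conll_score(70.0, 60.0, 50.0)
```

The per-metric F1 values themselves are typically produced by the official CoNLL scorer and are not computed here.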

This version of the system is licensed under the GNU General Public License. For questions contact Anders Björkelund (firstname@ims.uni-stuttgart.de).

References

[1]  Ina Rösiger and Jonas Kuhn.
IMS HotCoref DE: A data-driven co-reference resolver for German.
Proceedings of LREC 2016, Portorož, Slovenia.

[2]  Ina Rösiger and Arndt Riester.
Using prosodic annotations to improve coreference resolution of spoken text.
Proceedings of ACL-IJCNLP 2015, Beijing, China.

Download
 

Contact IMS

Pfaffenwaldring 5 b, 70569 Stuttgart

 
