HotCoref DE

Type Tool
Title HotCoref DE
Author Ina Rösiger

Description

The German coreference system described in the ACL paper [1] can be downloaded below.

To try the system on your own data:

  • You can download the pre-trained model (trained on the complete TüBa-D/Z version 9 data) here.
  • Your texts need to be in CoNLL-12 format, including POS tags, parse bits, lemmata, morphological information and, optionally, named entities.

    -- Here's an example document.
    -- I have a script that combines IMS-internal tools to do the pre-processing and create the format: write me an email if you need help and I will try to run it on your texts.

    Note: among other things, the resolver relies on extracting NPs from the parse bits. Some parsers for German do not annotate NPs inside PPs (i.e. the PPs are flat), so you need to insert these NPs before running the tool.

  • If you have a trained model (or data to train on) and texts in CoNLL-12 format, here's a manual on how to run the resolver.
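
The note on flat PPs can be illustrated with a minimal sketch. This is not the IMS pre-processing script; it only assumes that the parse-bit column of one sentence is available as a list of strings, that prepositional phrases are labelled `PP` and noun phrases `NP` (both labels are assumptions and can be changed via the hypothetical parameters `pp_label` and `np_label`), and that a PP counts as flat when every token after the preposition carries a bare `*`.

```python
def insert_np_in_flat_pps(parse_bits, pp_label="PP", np_label="NP"):
    """Wrap the tokens after the preposition of a flat PP in an NP.

    parse_bits: parse-bit strings of one sentence, e.g. "(PP*", "*", "*))".
    The labels "PP" and "NP" are assumptions; adapt them to your parser.
    """
    bits = list(parse_bits)
    i = 0
    while i < len(bits):
        # A PP that opens on this token as the innermost constituent.
        if bits[i].endswith("(" + pp_label + "*"):
            j = i + 1
            # Skip tokens that neither open nor close a constituent.
            while j < len(bits) and bits[j] == "*":
                j += 1
            # The PP is flat if the next bracket we meet closes it directly.
            if j < len(bits) and bits[j].startswith("*)"):
                bits[i + 1] = "(" + np_label + bits[i + 1]  # open NP after the preposition
                bits[j] = bits[j].replace("*", "*)", 1)     # close NP before the PP closes
                i = j
        i += 1
    return bits


# "Er wohnt in dem Haus" with a flat PP "in dem Haus" (illustrative parse).
flat = ["(TOP(S(NP*)", "(VP*", "(PP*", "*", "*))))"]
print(insert_np_in_flat_pps(flat))
```

PPs that already contain nested constituents are left untouched, because the scan stops at the first token that opens a new bracket.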

The German system is still considered work in progress and has not been published yet (apart from being featured in the ACL paper). The current performance, as reported in the paper, using only predicted annotations from real preprocessing and no gold mention boundary (GB) information, is as follows:

  • 51.61 (without singletons) on the TüBa-D/Z test set version 9
  • 60.35 (including singletons) and 48.61 (without singletons) on the TüBa-D/Z test set version 8 (= SemEval dataset)

    (all figures are CoNLL scores)
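
For readers unfamiliar with the metric: the CoNLL score is commonly defined as the unweighted average of the MUC, B³ and CEAF-e F1 scores. The following minimal sketch shows the arithmetic; the function name and the input values are illustrative assumptions and are not results from the paper.

```python
def conll_score(muc_f1, b_cubed_f1, ceaf_e_f1):
    """Unweighted mean of the three coreference F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Illustrative values only, not taken from the paper.
print(conll_score(60.0, 50.0, 40.0))
```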

This version of the system is licensed under the GNU General Public License. For questions, contact Anders Björkelund (firstname@ims.uni-stuttgart.de).


Reference

[1]  Ina Rösiger and Arndt Riester
Using prosodic annotations to improve coreference resolution of spoken text.
Proceedings of ACL-IJCNLP 2015, Beijing, China.


Download

  • Download the German prototype here