HotCoref DE
- Type
-
Tool
- Author
-
Ina Rösiger
- Description
-
Update March 2017:
The update contains
- an improved version of the coreference resolver (download link)
- a conversion tool for Linux that converts plain text into the required CoNLL-12 input format, using the same tools that were used during training (link to GitHub)
The German coreference system described in the LREC 2016 paper [1] and in the ACL paper [2] can be downloaded below.
The download includes
- a manual on how to run the resolver
- default feature lists as well as an overview of the features that one can experiment with
- example documents in CoNLL-12 format, including POS tags, parse bits, lemmata, morphological information and named entities (optional)
Note: the resolver relies, among other things, on extracting NPs from the parse bits. Some parsers for German produce flat PPs, i.e. they do not annotate the NP inside a PP, so you need to insert these NPs before running the tool.
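The note above can be illustrated with a minimal sketch that wraps the tokens following the preposition of a flat PP in an NP. It is not part of the download. It assumes CoNLL-2012-style parse bits (one string per token, with `*` standing for the token itself), the constituent labels `PP` and `NP`, and the STTS preposition tags `APPR` and `APPRART`; the function name `insert_nps_in_flat_pps` is hypothetical, and the labels and tags of your parser may differ.

```python
import re


def insert_nps_in_flat_pps(parse_bits, pos_tags, prepositions=("APPR", "APPRART")):
    """Return a copy of parse_bits in which every flat PP gets an inner NP.

    A PP counts as flat here if it has no child constituent at all, spans
    more than one token, and starts with a preposition tag (assumed STTS).
    """
    bits = list(parse_bits)
    stack = []     # open constituents as [label, start_index, has_child]
    flat_pps = []  # (start, end) token spans of flat PPs

    for i, bit in enumerate(parse_bits):
        # Every "(LABEL" opens a constituent at token i.
        for label in re.findall(r"\(([^()*\s]+)", bit):
            if stack:
                stack[-1][2] = True  # the enclosing constituent has a child
            stack.append([label, i, False])
        # Every ")" closes the innermost open constituent at token i.
        for _ in range(bit.count(")")):
            label, start, has_child = stack.pop()
            if (label == "PP" and not has_child and i > start
                    and pos_tags[start] in prepositions):
                flat_pps.append((start, i))

    for start, end in flat_pps:
        # Open the NP on the token after the preposition ...
        bits[start + 1] = "(NP" + bits[start + 1]
        # ... and close it just before the PP itself closes.
        bits[end] = bits[end].replace("*", "*)", 1)
    return bits


# "Er wohnt in dem Haus" with a flat PP over "in dem Haus"
flat = ["(S(NP*)", "*", "(PP*", "*", "*))"]
tags = ["PPER", "VVFIN", "APPR", "ART", "NN"]
fixed = insert_nps_in_flat_pps(flat, tags)
```

PPs that already contain any child constituent are left untouched, so the sketch only covers the simplest flat case; coordinated or otherwise complex PPs would need a treebank-specific rule.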
Here's a manual on how to run the resolver
Pre-trained models
New model:
- new model trained on the complete TüBa-D/Z version 10 data using regular processing with the improved version of the coreference resolver, available here
Older models (trained with the LREC version):
- trained on the complete TüBa-D/Z version 10 data, gold processing, available here
- trained on the complete TüBa-D/Z version 10 data, regular processing, available here
- trained on the complete TüBa-D/Z version 9 data, regular processing, available here
An older version of the tool (as published in the LREC 2016 paper [1]) can be downloaded here.
The version published at ACL 2015 [2] can be found here.
CoNLL scores as published in [1]:
- 65.76 (no singletons) on the TüBa-D/Z test set version 10, using gold annotations
- 48.54 (including singletons) on the TüBa-D/Z test set version 10, using regular annotations
The older performance (as reported in the paper [2], using real preprocessing/predicted annotations only and no gold mention boundary (GB) information) is as follows:
- 51.61 (no singletons) on the TüBa-D/Z test set version 9
- 60.35 (including singletons) and 48.61 (without) on the TüBa-D/Z test set version 8 (=SemEval dataset) (in CoNLL score)
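For readers unfamiliar with the metric: the CoNLL score is conventionally the unweighted mean of the MUC, B³ and CEAF-e F1 scores. The sketch below only illustrates that averaging. The function name `conll_score` is hypothetical and the input values are made up for illustration; they are not results from [1] or [2].

```python
def conll_score(muc_f1, b_cubed_f1, ceaf_e_f1):
    """Average the three coreference F1 scores into a single CoNLL score."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Illustrative values only, not taken from the papers.
example = conll_score(60.0, 50.0, 40.0)
```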
This version of the system is licensed under the GNU General Public License. For questions, contact Anders Björkelund (firstname@ims.uni-stuttgart.de).
- Reference
-
[1] Ina Rösiger and Jonas Kuhn
IMS HotCoref DE: A data-driven co-reference resolver for German.
Proceedings of LREC 2016, Portorož, Slovenia.
[2] Ina Rösiger and Arndt Riester
Using prosodic annotations to improve coreference resolution of spoken text.
Proceedings of ACL-IJCNLP 2015, Beijing, China.
- Download
-
- Download the German co-reference system and the manual on how to run the resolver
- Download the older German co-reference system as published in the LREC 2016 paper [1] and the manual on how to run the resolver
- Download the older German prototype as published in ACL 2015 [2] and the manual on how to run the resolver