- Wolfgang Seeker
Tiger2Dep is a Python program that converts the TiGer corpus (version 2.1) to dependency format.
The conversion proceeds in two steps. In the first step, a list of error corrections is applied to the source corpus. The second step then converts the corrected corpus to dependency format.
The script can output standard dependency trees as used e.g. in the CoNLL Shared Task 2009, or it can output dependency trees with empty nodes for missing verbs. The empty nodes are inserted automatically wherever a VP in the source corpus did not have a head.
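To make the "VP without a head" condition concrete, here is a minimal sketch in Python. It is not taken from Tiger2Dep: the Node class, the headless_vps function, and the toy tree are all hypothetical and exist only for illustration. The one thing it relies on from TiGer is that the head of a phrase is marked with the edge label HD.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy constituent node (hypothetical, not the script's data structure).

    label: phrase category or POS tag, e.g. "VP" or "VVPP"
    edge:  function label on the edge to the parent; TiGer marks heads with "HD"
    word:  surface form, set for terminals only
    """
    label: str
    edge: str = "--"
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def headless_vps(node: Node) -> List[Node]:
    """Collect every VP in the subtree that has no child with edge label HD."""
    found = []
    if node.label == "VP" and not any(c.edge == "HD" for c in node.children):
        found.append(node)
    for child in node.children:
        found.extend(headless_vps(child))
    return found


# Invented toy coordination "... Buch gelesen und Zeitung": the second
# conjunct has an object but no verb of its own.
vp1 = Node("VP", "CJ", children=[
    Node("NN", "OA", "Buch"),
    Node("VVPP", "HD", "gelesen"),
])
vp2 = Node("VP", "CJ", children=[
    Node("NN", "OA", "Zeitung"),
])
cvp = Node("CVP", "OC", children=[vp1, Node("KON", "CD", "und"), vp2])

candidates = headless_vps(cvp)
```

In this sketch, candidates holds only vp2, the conjunct whose verb was elided. A converter following the description above would place an empty node at such a position to stand in for the missing verb. How Tiger2Dep does this internally is not shown here.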
The error correction step can also be applied on its own in order to produce a new version of TiGer (version 2.2). This version can be downloaded from the TiGer website.
Update May 27, 2014
Version 1.2 of the script now also converts the German data sets of the SMULTRON corpus and some manually annotated EuroParl sentences by Sebastian Padó to dependencies. I tried to make the conversion as similar as possible to the one for TiGer, so all five data sets can be used as out-of-domain test sets for tools trained on TiGer.
Update May 2020
Version 1.3 of the script converts the six radio interviews from the GRAIN-S corpus.
An earlier implementation of version 1.1 (then written in Prolog) is described in
Wolfgang Seeker and Jonas Kuhn, Making Ellipses Explicit in Dependency Conversion for a German Treebank, Proceedings of the 8th International Conference on Language Resources and Evaluation, p. 3132–3139, 2012, Istanbul, Turkey.
Version 1.2 is described in
Wolfgang Seeker and Jonas Kuhn, An Out-of-Domain Test Suite for Dependency Parsing of German, Proceedings of the 9th International Conference on Language Resources and Evaluation, p. 4066–4073, 2014, Reykjavik, Iceland.
Version 1.3 is described in
Agnieszka Falenska, Zoltán Czesznak, Kerstin Jung, Moritz Völkel, Wolfgang Seeker, and Jonas Kuhn, GRAIN-S: Manually Annotated Syntax for German Interviews, Proceedings of the 12th Language Resources and Evaluation Conference.
Download version 1.3: tiger2dep.v1.3.tar.gz
You can also download an already converted version of the GRAIN-S files from the corpus website.
Download version 1.2: tiger2dep.v1.2.tar.gz
You can also download an already converted version of the SMULTRON files. The conversion was done with the default settings (coordinations as chains, easy punctuation attachment). Credit for the original annotations goes to their respective annotators. [smultron-dependencies.tar.gz]
Download version 1.1: tiger2dep-v1.1.tar.gz
You can download a (slightly outdated) fully converted version of the TiGer corpus as well as an error-corrected phrase-structure version from the TiGer website.
I would like to thank Giuseppe Attardi for pointing out some issues in an earlier version of this script.