- Wolfgang Seeker
Tiger2Dep is a Python program that converts the TiGer corpus (version 2.1)
to dependency format.
The conversion proceeds in two steps. In the first step, a list of error corrections
is applied to the source corpus. The second step then converts the corrected
corpus to dependency format.
The script can output standard dependency trees as used, e.g., in the CoNLL
Shared Task 2009, or it can output dependency trees with empty nodes for
missing verbs. The empty nodes are inserted automatically wherever a VP
in the source corpus does not have a head.
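The idea of finding headless VPs can be illustrated with a minimal sketch. It is not the script's actual implementation: the Node class, the function names, and the "EMPTY" placeholder are hypothetical and exist only for this example. The one thing taken from TiGer itself is that head children are attached with the edge label HD.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy phrase-structure node (hypothetical, for illustration only)."""
    category: str                 # phrase label such as "VP", or a POS tag for terminals
    edge: str = "--"              # label of the edge to the parent, e.g. "HD" for heads
    word: Optional[str] = None    # surface form, set for terminals only
    children: List["Node"] = field(default_factory=list)


def headless_vps(root: Node) -> List[Node]:
    """Return every VP that has no child attached with the edge label HD."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.category == "VP" and not any(c.edge == "HD" for c in node.children):
            found.append(node)
        stack.extend(node.children)
    return found


def insert_empty_heads(root: Node) -> int:
    """Add a placeholder head terminal to each headless VP; return the number added."""
    targets = headless_vps(root)
    for vp in targets:
        # "EMPTY" and "<empty>" are made-up markers, not the script's output format.
        vp.children.append(Node(category="EMPTY", edge="HD", word="<empty>"))
    return len(targets)


# A VP that contains only an object and therefore lacks a head.
vp = Node("VP", edge="OC", children=[Node("NN", edge="OA", word="Bier")])
tree = Node("S", children=[Node("NE", edge="SB", word="Peter"), vp])
added = insert_empty_heads(tree)
```

After the call, the VP in the toy tree carries one additional child with the edge label HD, which a later conversion step could treat like any other head.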
The error correction step can also be applied on its own in order to produce
a new version of TiGer (version 2.2). This version can be downloaded from
the TiGer website.
Update May 27, 2014
Version 1.2 of the script now also converts the German data sets of the SMULTRON corpus and some manually annotated EuroParl sentences by Sebastian Padó to dependencies. I tried to make the conversion as similar as possible to the one for TiGer, so all five data sets can be used as out-of-domain test sets for tools trained on TiGer.
An earlier implementation of version 1.1 (written in Prolog at the time) is described in
Making Ellipses Explicit in Dependency Conversion for a German Treebank,
Proceedings of the 8th International Conference on Language Resources and Evaluation,
pp. 3132–3139, 2012, Istanbul, Turkey.
Version 1.2 is described in
Wolfgang Seeker and Jonas Kuhn
An Out-of-Domain Test Suite for Dependency Parsing of German,
Proceedings of the 9th International Conference on Language Resources and Evaluation,
pp. 4066–4073, 2014, Reykjavik, Iceland.
Download version 1.2: tiger2dep.v1.2.tar.gz
You can also download an already converted version of the SMULTRON files. The conversion was done with the default settings (coordinations as chains, easy punctuation attachment). Credit for the original annotations goes to their respective annotators.
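For working with converted files, a small reader along the following lines may be useful. It is only a sketch and assumes that the files are tab-separated, follow the CoNLL 2009 column layout, and separate sentences by blank lines; please check the actual files before relying on it. The function name read_conll09 is made up for this example.

```python
from typing import Dict, Iterable, Iterator, List

# The first twelve columns of the CoNLL 2009 layout (predicate columns are ignored here).
COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS",
    "FEAT", "PFEAT", "HEAD", "PHEAD", "DEPREL", "PDEPREL",
]


def read_conll09(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        # zip stops at the shorter sequence, so extra columns are dropped.
        sentence.append(dict(zip(COLUMNS, fields)))
    if sentence:
        yield sentence


# Tiny hand-made sample, not taken from any of the corpora.
sample = (
    "1\tDas\tder\t_\tART\t_\t_\t_\t2\t_\tNK\t_\n"
    "2\tHaus\tHaus\t_\tNN\t_\t_\t_\t0\t_\t--\t_\n"
    "\n"
)
sentences = list(read_conll09(sample.splitlines()))
```

To read from disk, pass an open file object instead of the sample lines, e.g. read_conll09(open(path, encoding="utf-8")), where path points to one of the downloaded files.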
Download version 1.1: tiger2dep-v1.1.tar.gz
You can download a (slightly outdated) fully converted version of the TiGer corpus
as well as an error-corrected phrase-structure version from the TiGer website.
I would like to thank Giuseppe Attardi for pointing out some issues in an earlier version of this script.