|
The TIGER Treebank (Version 2.1) consists of approximately 900,000 tokens (50,000 sentences) of German newspaper text taken from the Frankfurter Rundschau. The corpus was semi-automatically POS-tagged and annotated with syntactic structure. It also contains morphological and lemma information for terminal nodes. For details, see the annotation page.
The TIGER Treebank is delivered in two treebank formats:
Both versions of the corpus can be processed by the treebank query tool TIGERSearch, which has also been developed within the TIGER project.
Version 1 of the TIGER Treebank is still available as well. It consists of approximately 700,000 tokens (40,000 sentences). Unlike version 2.1 (and version 2), it lacks the morphological and lemma information.
In addition to the TIGER Treebank proper, several resources derived from it are available: the TiGer Dependency Bank, a dependency-based gold standard for (hand-crafted) German parsers that covers TIGER Corpus sentences 8,001 through 10,000; the TIGER 700 RMRS Bank; the TIGER data sets for the CoNLL-X shared task; and dependency triple representations for (almost) the entire treebank. Like the TiGer DB structures, the dependency triples are intended for evaluation purposes.
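For readers who want to work with the CoNLL-X data sets, the following is a minimal Python sketch of a reader for the general CoNLL-X layout (ten tab-separated columns per token, blank line between sentences). The function names `read_conllx` and `dependency_triples` are hypothetical, the (head, relation, dependent) triple layout is an assumption made for illustration and not necessarily the format of the distributed triple files, and the sample sentence in the usage note is invented rather than taken from the corpus.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Column names of the CoNLL-X shared task format (ten tab-separated fields).
CONLLX_COLUMNS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]

Token = Dict[str, str]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLX_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        sentence.append(dict(zip(CONLLX_COLUMNS, fields)))
    # Emit the last sentence if the input does not end with a blank line.
    if sentence:
        yield sentence


def dependency_triples(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (head form, relation, dependent form) triples.

    Assumed layout for illustration only; head id "0" is shown as "ROOT".
    """
    forms = {tok["id"]: tok["form"] for tok in sentence}
    forms["0"] = "ROOT"
    return [(forms[tok["head"]], tok["deprel"], tok["form"]) for tok in sentence]
```

A file would typically be read with `open(path, encoding="utf-8")` and passed to `read_conllx`; the path and encoding depend on the distribution you obtain under the applicable license. For an invented three-token sentence such as "Das ist gut" with "ist" as the root, `dependency_triples` returns one triple per token, with the root token attached to the placeholder "ROOT".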
If you are interested in using the corpus, please refer to the license page in order to see which of our license agreements applies to you.
|