2.5 Third-party corpus samplers

Several institutions have kindly agreed that excerpts from their text corpora may be distributed with TIGERSearch. The current version of TIGERSearch includes the following corpus samplers, grouped by language (in alphabetical order):

Chinese

Chinese Treebank sampler

105 corpus graphs, University of Pennsylvania, distributed by LDC

English

Penn Treebank: Brown Corpus and Switchboard Corpus samplers

200 sentences each, University of Pennsylvania, distributed by LDC

Penn-Helsinki Parsed Corpus of Middle English (PPCME2 Corpus) sampler

200 sentences, University of Pennsylvania / PPCME2 Project

Susanne and Christine Corpus samplers

200 sentences each, Sussex University / Susanne and Christine projects

VerbMobil Corpus sampler

250 sentences, see German VerbMobil sampler

German

DEREKO Corpus sampler

250 sentences, SfS, University of Tübingen and IMS, University of Stuttgart / DEREKO project

IMS chunking and parsing tools

Output of the tools LoPar, TreeTagger, and YAC, each applied to the same technical text (about 250 sentences), IMS, University of Stuttgart

Negra Corpus sampler

250 sentences, Department of Computational Linguistics, Universität des Saarlandes / Negra project

TIGER Corpus sampler

200 sentences, Institut für Germanistik, University of Potsdam / Department of Computational Linguistics, Universität des Saarlandes / IMS, University of Stuttgart / TIGER project

VerbMobil Corpus sampler

250 sentences, SfS, University of Tübingen / VerbMobil Project, distributed by IPSK, Ludwig-Maximilians-Universität München

Japanese

VerbMobil Corpus sampler

250 sentences, see German VerbMobil sampler

Korean

Korean Treebank sampler

125 corpus graphs, University of Pennsylvania, distributed by LDC