1.1 Corpus administration

Treebanks must be converted into a binary representation, the so-called index, before the TIGERSearch search engine can process them. This index-based approach splits work with the TIGERSearch software suite into two parts: the first part deals with corpus indexing (TIGERRegistry, described in the present chapter), the second with corpus query processing (TIGERSearch, described in chapter IV). The TIGERSearch corpus query processor can only process corpora that have already been indexed.

The indexed corpora are organized hierarchically in the file system: each corpus is stored in a folder (i.e. in a directory of your local file system), and corpus folders can in turn be grouped in a common folder. The following example illustrates the physical content of a corpus directory and its graphical tree representation. In the tree, corpus folders are represented as folder icons and corpora as book icons.

Figure: Graphical tree representation of the corpus directory

DemoCorpora/
DemoCorpora/Chinese/
DemoCorpora/Chinese/CHINESETreebankSampler
DemoCorpora/English/
DemoCorpora/English/BROWNSampler
DemoCorpora/English/CHRISSampler
DemoCorpora/English/PPCME2Sampler
DemoCorpora/English/SUESampler
DemoCorpora/English/SWITCHBOARDSampler
DemoCorpora/German/
DemoCorpora/German/DEREKOSampler
DemoCorpora/German/IMS/
DemoCorpora/German/IMS/LoParSampler
DemoCorpora/German/IMS/TreeTaggerSampler
DemoCorpora/German/IMS/YACSampler
DemoCorpora/German/NEGRASampler
DemoCorpora/German/TIGERSampler
DemoCorpora/Korean/
DemoCorpora/Korean/KOREANTreebankSampler
DemoCorpora/Projects/
DemoCorpora/Projects/VerbMobil/
DemoCorpora/Projects/VerbMobil/VMSampler-DE
DemoCorpora/Projects/VerbMobil/VMSampler-EN
DemoCorpora/Projects/VerbMobil/VMSampler-JAP
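
The relation between the flat listing and its tree representation can be sketched in a few lines of Python. This is only an illustration and not part of the TIGERSearch software: the function names build_tree and render are hypothetical, and the sketch assumes, as in the listing above, that entries ending in "/" are corpus folders and all other entries are corpora. The labels [folder] and [corpus] stand in for the folder and book icons.

```python
# Excerpt of the listing above. Assumption of this sketch: a trailing "/"
# marks a corpus folder; every other entry is treated as a corpus.
LISTING = [
    "DemoCorpora/",
    "DemoCorpora/English/",
    "DemoCorpora/English/BROWNSampler",
    "DemoCorpora/German/",
    "DemoCorpora/German/IMS/",
    "DemoCorpora/German/IMS/LoParSampler",
    "DemoCorpora/German/TIGERSampler",
]


def build_tree(paths):
    """Nest the paths into dicts: a folder maps to a dict, a corpus to None."""
    root = {}
    for path in paths:
        is_folder = path.endswith("/")
        parts = path.strip("/").split("/")
        node = root
        # Walk (and create, if necessary) all enclosing folders.
        for name in parts[:-1]:
            node = node.setdefault(name, {})
        if is_folder:
            node.setdefault(parts[-1], {})
        else:
            node[parts[-1]] = None
    return root


def render(tree, depth=0):
    """Return the tree as indented text lines, one entry per line."""
    lines = []
    for name in sorted(tree):
        child = tree[name]
        label = "[folder]" if child is not None else "[corpus]"
        lines.append("  " * depth + label + " " + name)
        if child is not None:
            lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_tree(LISTING))))
```

Run as a script, the sketch prints one indented line per entry, with folders grouping the corpora and subfolders they contain.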