2. The TIGERSearch software tools

Searching syntactically annotated corpora is much more time-consuming than searching corpora that are annotated only at the word level (word, lemma, part of speech, morphological description, etc.). An index structure that gives fast access to the results of many partial searches can therefore greatly improve efficiency. For this reason, we decided to base the design of the TIGERSearch software suite on an index-based architecture.
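The idea of answering partial searches from an index can be illustrated with a minimal Python sketch. It is not the TIGERSearch implementation: the toy corpus, the tuple layout, and the names build_index and candidate_sentences are assumptions made purely for illustration. Each feature-value pair is looked up once in an inverted index, and the resulting sentence sets are intersected, so that only a few candidate sentences remain for the expensive structural matching.

```python
from collections import defaultdict

# Toy corpus (hypothetical): one entry per node, given as
# (sentence_id, node_id, feature_name, feature_value).
NODES = [
    ("s1", "n1", "cat", "NP"),
    ("s1", "t1", "pos", "ART"),
    ("s1", "t2", "pos", "NN"),
    ("s2", "n1", "cat", "VP"),
    ("s2", "t1", "pos", "NN"),
    ("s3", "n1", "cat", "NP"),
    ("s3", "t1", "pos", "NE"),
]


def build_index(nodes):
    """Map each (feature, value) pair to the set of sentences containing it."""
    index = defaultdict(set)
    for sent_id, _node_id, feature, value in nodes:
        index[(feature, value)].add(sent_id)
    return index


def candidate_sentences(index, *constraints):
    """Intersect the index entries of all constraints.

    Each constraint is a (feature, value) pair, i.e. one partial search.
    Only sentences that satisfy every partial search can contain a match.
    """
    sets = [index.get(constraint, set()) for constraint in constraints]
    return set.intersection(*sets) if sets else set()


INDEX = build_index(NODES)

# Sentences that contain both an NP node and a token tagged NN.
print(sorted(candidate_sentences(INDEX, ("cat", "NP"), ("pos", "NN"))))
```

The index is built once, when a corpus is preprocessed, and is then reused by every query. This is the trade-off behind an index-based design: more work at import time in exchange for faster searches.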

Accordingly, the TIGERSearch software suite is divided into two tools, one for querying corpora and one for importing and preprocessing them:

TIGERSearch

The TIGERSearch tool is the corpus query processor. You can process queries, view the results, and export your favourite matches. The TIGERSearch query language is introduced in chapter III. The TIGERSearch tool is described in chapter IV.

TIGERRegistry

The import and preprocessing of corpora are handled by a tool called TIGERRegistry. For the import of treebanks, we have developed the TIGER-XML corpus encoding format; new corpora must first be converted into this format. To support as many existing treebanks as possible, TIGERRegistry also includes import filters (i.e. converters to TIGER-XML) for many popular treebank formats. The TIGER-XML corpus encoding format is described in chapter V, and the implemented import filters are described in subsection 3.5 of chapter VI. The TIGERRegistry tool itself is described in chapter VI.
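To give a rough impression of what an import filter does, the following Python sketch converts a small in-memory tree into an XML fragment loosely modelled on TIGER-XML. It is not one of the filters shipped with TIGERRegistry. The input representation, the function name to_tiger_like_xml, the node numbering, and the placeholder edge label "--" are assumptions made for illustration, and the element and attribute names used here should be checked against the format description in chapter V.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a phrase is (category, [children]),
# a token is (word, part_of_speech).
TREE = ("S", [
    ("NP", [("the", "DT"), ("dog", "NN")]),
    ("VP", [("barks", "VBZ")]),
])


def to_tiger_like_xml(tree, sent_id="s1"):
    """Build an <s> element with separate terminal and nonterminal lists."""
    sentence = ET.Element("s", id=sent_id)
    graph = ET.SubElement(sentence, "graph")
    terminals = ET.SubElement(graph, "terminals")
    nonterminals = ET.SubElement(graph, "nonterminals")
    # Illustrative numbering: tokens from 1, phrase nodes from 500.
    counters = {"t": 0, "nt": 500}

    def visit(node):
        label, rest = node
        if isinstance(rest, str):  # token: (word, pos)
            counters["t"] += 1
            node_id = f"{sent_id}_{counters['t']}"
            ET.SubElement(terminals, "t", id=node_id, word=label, pos=rest)
            return node_id
        # Phrase: convert the children first, then link to them by id.
        child_ids = [visit(child) for child in rest]
        node_id = f"{sent_id}_{counters['nt']}"
        counters["nt"] += 1
        phrase = ET.SubElement(nonterminals, "nt", id=node_id, cat=label)
        for child_id in child_ids:
            # The toy input carries no edge labels, so a placeholder is used.
            ET.SubElement(phrase, "edge", label="--", idref=child_id)
        return node_id

    graph.set("root", visit(tree))
    return sentence


print(ET.tostring(to_tiger_like_xml(TREE), encoding="unicode"))
```

A real import filter would additionally have to parse the source treebank format and write the corpus header. The sketch only shows the central step of turning a nested tree into flat lists of nodes that refer to each other by identifier.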