I. Introduction

1. About this manual

Welcome to the TIGERSearch software suite! We are glad that you have decided to use TIGERSearch for your linguistic research.

What has TIGERSearch been designed for? The answer is simple: The TIGERSearch software lets you explore syntactically annotated text corpora. If you are a grammar engineer who is developing a grammar, you might use TIGERSearch to obtain sample sentences for the syntactic phenomena you are interested in. If you are a lexicographer or terminologist, you can employ TIGERSearch to find out lexical properties of a word, such as the collocations in which the word is used. Generally speaking, the TIGERSearch software can be used to visualize treebanks and to extract information from treebanks.

This manual describes the concept and handling of the TIGERSearch software suite. The manual is delivered in three different versions: The hyperlinked versions (HTML and PDF) can be found in the doc/html/ and doc/pdf/ subdirectories of your TIGERSearch distribution. The third version has been integrated as a help system into the TIGERSearch software. You can activate the help system by selecting one of the items in the Help menu or by clicking the Help button in the main window of either the TIGERSearch or the TIGERRegistry tool.

The present version is the hyperlinked HTML version of the manual. Follow the links to find the description you are looking for. Click on a screenshot to display a larger version of the image. To print the manual, we recommend using the PDF version.

What are the central chapters of this manual? Users who just want to use the query functionality of the TIGERSearch software suite should have a look at chapter III and chapter IV, which describe the TIGERSearch corpus query language and the TIGERSearch query tool, respectively. Corpus administrators should also read chapter V and chapter VI, which explain the TIGER-XML corpus import/export format and the TIGERRegistry corpus administration tool, respectively.

Users who are interested in the concept, design, and implementation of the TIGERSearch software should also consult Wolfgang Lezius' Ph.D. thesis [Lezius2002] (in German).

Please note: The TIGERSearch manual has been developed using the JManual environment. The JManual project is a joint initiative of the projects NITE and TIGER (cf. http://www.tigersearch.de for further information).

2. The TIGERSearch software tools

Searching syntactically annotated corpora is much more time-consuming than searching corpora which are only annotated on the word level (word, lemma, part-of-speech, morphological description etc.). An index structure which enables fast access to the results of many partial searches can therefore greatly improve efficiency. For this reason, we have decided to use an index-based architecture for the design of the TIGERSearch software suite.

Accordingly, the TIGERSearch software suite is divided into two tools:

TIGERSearch

The TIGERSearch tool is the corpus query processor. You can process queries, view the results, and export your favourite matches. The TIGERSearch query language is introduced in chapter III. The TIGERSearch tool is described in chapter IV.

TIGERRegistry

The import and preprocessing of corpora is handled by a tool called TIGERRegistry. For the import of treebanks, we have developed the TIGER-XML corpus encoding format. New corpora have to be converted into this format first. To support as many existing treebanks as possible, TIGERRegistry also comprises import filters (i.e. converters to TIGER-XML) for many popular treebank formats. The TIGER-XML corpus encoding format is described in chapter V; the implemented import filters are described in subsection 3.5 of chapter VI. The TIGERRegistry tool itself is described in chapter VI.
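
The idea behind the index-based architecture described at the beginning of this section can be pictured with a small inverted index. The following Java sketch is purely illustrative: the class FeatureIndexSketch and its methods are hypothetical names that are not part of the TIGERSearch code base, the sketch indexes whole sentences rather than graph nodes, and it uses a more recent Java version than the one TIGERSearch is based on. It only shows why a partial search (e.g. "all sentences containing a token with pos=NN") becomes a single lookup once the corpus has been preprocessed, and how two partial results can be combined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustrative sketch only: hypothetical names, not the TIGERSearch implementation.
public class FeatureIndexSketch {

    // Maps "feature=value" (e.g. "pos=NN") to the sorted ids of the
    // sentences that contain at least one token with that feature value.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Preprocessing step: record one feature value observed in a sentence.
    public void add(int sentenceId, String feature, String value) {
        postings.computeIfAbsent(feature + "=" + value, k -> new TreeSet<>())
                .add(sentenceId);
    }

    // Query step: a partial search is a single lookup, not a corpus scan.
    public TreeSet<Integer> lookup(String feature, String value) {
        return postings.getOrDefault(feature + "=" + value, new TreeSet<>());
    }

    // Combines two partial searches: sentences containing both feature values
    // (not necessarily on the same token).
    public List<Integer> lookupBoth(String f1, String v1, String f2, String v2) {
        TreeSet<Integer> result = new TreeSet<>(lookup(f1, v1)); // copy, keep index intact
        result.retainAll(lookup(f2, v2));
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        FeatureIndexSketch index = new FeatureIndexSketch();
        index.add(1, "word", "Haus");
        index.add(1, "pos", "NN");
        index.add(2, "pos", "NN");
        index.add(3, "word", "Haus");
        // Only sentence 1 contains both word=Haus and pos=NN.
        System.out.println(index.lookupBoth("word", "Haus", "pos", "NN"));
    }
}
```

In this picture, the preprocessing step (calling add for every token) corresponds roughly to the work done once at corpus import time, while the lookup methods correspond to the work done for each query.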

3. New features in the 2.x versions

In TIGERSearch 2.1 the following features have been introduced:

Query processing efficiency has been hugely improved for queries using regular expressions.

Additional indexing strategies have been implemented. These strategies increase query processing efficiency by up to 50%.

The statistical viewer (introduced as statistical export in TIGERSearch 2.0) has been improved in many details.

There have been several minor improvements and bugfixes.

In TIGERSearch 2.0 the following features have been introduced:

The graphical user interface of the TIGERSearch software suite has been reimplemented. All important features can now be accessed directly from the GUI.

The following utilities have been added to the TIGERSearch query tool: bookmarks, syntax highlighting of corpus queries, and statistical export.

Corpus queries can be specified graphically, i.e. queries can be drawn in an intuitive way.

TIGERSearch has been internationalized, i.e. Unicode characters can be used in corpus queries and any Unicode character will also be rendered in corpus visualization.

Templates, a modularization concept for corpus queries, have been introduced.

The TIGERSearch manual can now be accessed from the GUI. It is also available as a PDF file for high-quality printing.

The TIGERSearch implementation is based on Java 1.4. Thus, many platform-dependent dialogs such as printing or file dialogs are much easier to handle.

4. Support

For up-to-date information about the TIGERSearch project, you should consult the TIGERSearch homepage: http://www.tigersearch.de

If you have any further questions about the TIGERSearch software suite, just send an e-mail to the following address: tigersearch@ims.uni-stuttgart.de