Indexing the TIGER Corpus

The TIGERSearch software suite is available for working with the TIGER Treebank. Users can query a corpus and browse through the graphical visualization of the corpus sentences. TIGERSearch is available at www.tigersearch.de.

Before working with the TIGER Corpus in the TIGERSearch main program, you first have to index the corpus using the TIGERRegistry program.

To index the corpus, first select the parent folder of the new corpus (German in the example). Then click the Insert Corpus button in the toolbar, or right-click and choose Insert Corpus from the popup menu:

Figure: Inserting the new corpus

Next, the corpus indexing window pops up. First, specify the corpus input format. Because we will use the TIGER-XML version of the TIGER Corpus, choose TIGER-XML Format. Selecting this option deactivates some parameters of the window:

Figure: Indexing parameters (TIGER-XML corpus input)

Now you have to specify the following parameters:

Corpus ID

The corpus ID is used by TIGERSearch to identify the new corpus. It must be unique among all indexed corpora; uniqueness is checked before the indexing process is started. The ID has to start with a letter.

TIGER-XML file

Specify the path to the TIGER-XML version of the TIGER Corpus. You can use the Choose button to browse for the file.
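
Both parameters can be sanity-checked outside the GUI before indexing is started. The following Python sketch is not part of TIGERSearch; the function names are hypothetical. It checks only the two corpus ID rules stated above (starts with a letter, not already in use) and assumes that sentences in a TIGER-XML file are stored as <s> elements. Which further characters an ID may contain is not stated here, so the sketch does not test for that.

```python
import xml.etree.ElementTree as ET


def is_valid_corpus_id(candidate, existing_ids):
    """Check the two stated rules: starts with a letter, not already used.

    Assumption: str.isalpha() is used as the "letter" test, which also
    accepts non-ASCII letters; TIGERRegistry may be stricter.
    """
    starts_with_letter = candidate[:1].isalpha()
    return starts_with_letter and candidate not in existing_ids


def count_sentences(xml_path):
    """Stream through the file and count <s> elements.

    Assumption: each sentence is an <s> element. Malformed XML raises
    xml.etree.ElementTree.ParseError, which signals that the file
    should be fixed before it is handed to the indexer.
    """
    count = 0
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "s":
            count += 1
            elem.clear()  # free the sentence subtree to keep memory low
    return count
```

A call such as is_valid_corpus_id("TIGER2", ids_already_indexed) can be run on the ID you plan to enter, and count_sentences(path) gives a quick check that the chosen file parses as XML and contains sentences. Streaming with iterparse is chosen here because treebank files can be large.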

After specifying the indexing parameters, start the indexing process by pressing the Start button. Indexing can be stopped at any time. The current progress is displayed in the indexing progress window:

Figure: Indexing progress window

When the indexing process is finished, the Corpus properties window pops up. Here you can specify meta information about the corpus, such as the corpus name.

Figure: Corpus properties window

After specifying the corpus properties, press the OK button to finish corpus indexing. The new corpus now appears in the corpus tree.