3.1 Introduction

The indexing of corpora is based on the TIGER-XML format, so corpora encoded in other formats have to be converted to TIGER-XML first. If your corpus source file is not encoded in TIGER-XML, either use one of the existing corpus format filters (i.e. converters to TIGER-XML, cf. subsection 3.5) or convert your corpus to TIGER-XML on your own.
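
If you write your own converter, the following Python sketch shows the general shape of a minimal TIGER-XML file for a POS-tagged corpus without syntactic structure. It is an illustration under stated assumptions, not one of the TIGERSearch corpus filters: the function names build_tiger_xml and write_tiger_xml are hypothetical; every sentence gets a flat graph whose root is an artificial VROOT nonterminal linked to all tokens with the edge label "--" (a convention assumed here, borrowed from NEGRA/TIGER-style corpora); and the header declares only the features used in this example. Check the output against the TIGER-XML documentation before indexing it.

```python
import xml.etree.ElementTree as ET


def build_tiger_xml(sentences, corpus_id="mycorpus"):
    """Build a minimal TIGER-XML tree from POS-tagged sentences.

    sentences: list of sentences, each a list of (word, pos) tuples.
    Each sentence becomes a flat graph: one artificial VROOT nonterminal
    (assumed convention) with a "--" edge to every terminal.
    """
    corpus = ET.Element("corpus", id=corpus_id)

    # Header: declare the features used in the body. Word forms are
    # open-class, so only pos and cat get explicit value lists.
    annotation = ET.SubElement(ET.SubElement(corpus, "head"), "annotation")
    pos_tags = sorted({pos for tokens in sentences for _, pos in tokens})
    features = [("word", "T", []), ("pos", "T", pos_tags), ("cat", "NT", ["VROOT"])]
    for name, domain, values in features:
        feature = ET.SubElement(annotation, "feature", name=name, domain=domain)
        for value in values:
            ET.SubElement(feature, "value", name=value)
    edgelabel = ET.SubElement(annotation, "edgelabel")
    ET.SubElement(edgelabel, "value", name="--")

    # Body: one <s> element containing one <graph> per sentence.
    body = ET.SubElement(corpus, "body")
    for i, tokens in enumerate(sentences, start=1):
        sid = f"s{i}"
        root_id = f"{sid}_VROOT"
        graph = ET.SubElement(ET.SubElement(body, "s", id=sid), "graph", root=root_id)
        terminals = ET.SubElement(graph, "terminals")
        nonterminals = ET.SubElement(graph, "nonterminals")
        vroot = ET.SubElement(nonterminals, "nt", id=root_id, cat="VROOT")
        for j, (word, pos) in enumerate(tokens, start=1):
            tid = f"{sid}_{j}"
            ET.SubElement(terminals, "t", id=tid, word=word, pos=pos)
            ET.SubElement(vroot, "edge", label="--", idref=tid)
    return corpus


def write_tiger_xml(corpus, path):
    """Write the tree to path as UTF-8 with an XML declaration."""
    tree = ET.ElementTree(corpus)
    ET.indent(tree)  # pretty-printing; requires Python 3.9+
    tree.write(path, encoding="UTF-8", xml_declaration=True)
```

For example, write_tiger_xml(build_tiger_xml([[("Das", "ART"), ("Haus", "NN")]], "demo"), "demo.xml") writes a one-sentence corpus file that could then be indexed as TIGER-XML Format. Corpora with real syntactic annotation would need proper nonterminals and edge labels instead of the flat VROOT structure.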

To index a corpus, first select the parent folder of the new corpus (in the example: German). Then click the Insert Corpus button in the toolbar or choose the Insert Corpus item from the popup menu (right mouse click):

Figure: Inserting a new corpus

The corpus indexing window then opens. First, specify the input format of the corpus: TIGER-XML Format or Other Format:

Figure: Corpus indexing window

The remaining parameters of the indexing window are explained in the following subsections (cf. subsection 3.2 and subsection 3.3).

Please note: During indexing, a corpus directory containing several corpus files is generated. The directory and its files are created in a platform-independent way. Therefore, if you are working on a platform with fine-grained user permissions (e.g. Unix), check the permissions of the new corpus directory right after indexing has finished to make sure that the intended group of TIGERSearch users can access the newly created corpus.
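
On Unix, this check can be scripted. The following Python sketch is an illustration only and not part of TIGERSearch. It assumes that the TIGERSearch users share a Unix group that already owns the corpus directory (changing the group, e.g. with chgrp, is not handled here) and that this group needs read access to files plus read and execute access to directories. The function names are hypothetical.

```python
import os
import stat

# Bits the (assumed) TIGERSearch user group needs: read for files,
# read + execute (list and enter) for directories.
FILE_BITS = stat.S_IRGRP
DIR_BITS = stat.S_IRGRP | stat.S_IXGRP


def _iter_entries(corpus_dir):
    """Yield (path, is_dir) for the corpus directory and everything below it."""
    yield corpus_dir, True
    for dirpath, dirnames, filenames in os.walk(corpus_dir):
        for name in dirnames:
            yield os.path.join(dirpath, name), True
        for name in filenames:
            yield os.path.join(dirpath, name), False


def find_missing_group_access(corpus_dir):
    """Return the paths whose permissions lack the needed group bits."""
    missing = []
    for path, is_dir in _iter_entries(corpus_dir):
        needed = DIR_BITS if is_dir else FILE_BITS
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if (mode & needed) != needed:
            missing.append(path)
    return missing


def grant_group_access(corpus_dir):
    """Add the needed group bits, keeping all other permission bits unchanged."""
    for path, is_dir in _iter_entries(corpus_dir):
        needed = DIR_BITS if is_dir else FILE_BITS
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode | needed)
```

Calling find_missing_group_access on the new corpus directory lists the entries the group cannot read; grant_group_access then adds only the missing group bits, so owner and "other" permissions stay as they were. The corpus directory path you pass in depends on your installation.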