3.3 Internationalization of the query editor

A common problem for applications such as TIGERSearch is the keyboard input of characters which are not included in the ISO-Latin-1 character set. If you are working with a corpus that makes use of such characters, you should consider the three alternatives described below.

Please note: Typing in Unicode characters implies that Unicode characters can be displayed (rendered) by the software. Thus, one of the Unicode fonts supported by TIGERSearch must be installed on your system. Please consult section 3, chapter II for instructions.

Unicode encoding

The first alternative is to type in the hexadecimal Unicode encoding of the character. For example, the Greek capital letter Omega is represented by \u03a9. Once you have typed in the Unicode encoding, select the Expand Unicode Encodings option in the Input Help menu of the context menu to expand the character:

Figure: Expanding Unicode encodings

The Unicode encoding will be replaced by its corresponding character (cf. screenshot below). Please remember that a Unicode font must be installed to render the character properly.

Figure: Expansion of Unicode encodings

If you are frequently working with corpora using characters outside the ISO-Latin-1 character set, you should activate the Expand automatically option in the Input Help menu of the context menu.
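
The expansion step described in this subsection can be illustrated with a short Python sketch. It is not the TIGERSearch implementation; the function name expand_unicode_encodings is hypothetical, and the sketch assumes that an encoding always consists of \u followed by exactly four hexadecimal digits.

```python
import re

# Matches a backslash, the letter u, and exactly four hexadecimal digits.
UNICODE_ENCODING = re.compile(r"\\u([0-9a-fA-F]{4})")


def expand_unicode_encodings(text):
    # Replace each encoding with the character that has this code point.
    # (Hypothetical helper, for illustration only.)
    return UNICODE_ENCODING.sub(lambda m: chr(int(m.group(1), 16)), text)


# The raw string keeps the backslash, as if the user had typed it.
print(expand_unicode_encodings(r"\u03a9"))  # prints: Ω
```

Text that contains no such encoding is returned unchanged, so the function can be applied to a whole query at once.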

Input help (operating system)

On many platforms, specialized tools have been developed to type in characters outside the ISO-Latin-1 character set. These tools are usually called input methods. Since Greek characters, for example, do not exist on a German keyboard, they are typed in as abbreviations. For example, the string Omega might serve as an abbreviation that is automatically expanded to the corresponding Greek character as soon as it has been typed in. Please consult the manual of your operating system to find out which tools are available for your platform.

Input help (TIGERSearch)

In the TIGERSearch project we have implemented specialized input methods for 16 European languages which can be used in the TIGERSearch query editor (cf. subsection 3.2, chapter II). To activate the TIGERSearch input methods, click the upper left corner of the TIGERSearch window (Windows: click the tiger icon) and select the last option in the corresponding menu (usually called Choose input method).

The following screenshot shows how the input method is activated on a German Windows platform. The display will look similar on different platforms.

Figure: Activating the TIGERSearch input methods (1)

Now you are asked to choose one of the supported European languages. In the following screenshot, the Greek language (modern) is chosen:

Figure: Activating the TIGERSearch input methods (2)

The input method mode has been activated. A small status window is placed in the lower right corner of the screen. This window shows which language has been chosen and whether the input method is activated or deactivated:

Figure: Input method status window

To select a different language, you can either repeat the input method activation procedure described above or switch between the languages using the F7 key. To activate or deactivate the current input method, please use the F8 key.

Please note: To deactivate the TIGERSearch input methods (especially to remove the input method status window), start the input method selection procedure again, but choose system input methods in the input method menu.

How is the input method used in the query editor? All characters that are not included in the ISO-Latin-1 character set are represented by special abbreviations. To allow the input of Latin characters and special characters side by side in one mode, we have chosen the encoding conventions used in the LaTeX system. For example, the German character ä is represented as \"a, which is its LaTeX encoding. So if you have chosen the German keyboard mapper and you type in the character sequence \"a, it will be automatically expanded to ä by the TIGERSearch input help system.

Please note: Of course, all German characters are included in the ISO-Latin-1 character set. However, German special characters (ä,ö,ü,ß) can only be typed in on keyboards manufactured for the German market. Otherwise, an input method for the German language is necessary in order to work with German treebanks such as the TIGER treebank.
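
The abbreviation mechanism can be sketched in Python as follows. This is an illustration only, not the TIGERSearch code: the table GERMAN_ABBREVIATIONS and the function expand_abbreviations are hypothetical, and only the entry for ä is taken from the text above. The entries for ö and ü are assumptions that follow the same LaTeX convention; the actual mapping tables are documented in europe.pdf.

```python
# Hypothetical excerpt of a mapping table for German.
GERMAN_ABBREVIATIONS = {
    '\\"a': "ä",  # stated in the text
    '\\"o': "ö",  # assumption, by analogy with LaTeX
    '\\"u': "ü",  # assumption, by analogy with LaTeX
}


def expand_abbreviations(text, table):
    # Replace longer abbreviations first, so that a short abbreviation
    # never pre-empts a longer one starting with the same characters.
    for abbreviation in sorted(table, key=len, reverse=True):
        text = text.replace(abbreviation, table[abbreviation])
    return text


# The user types M\"adchen; the Latin letters are left untouched.
print(expand_abbreviations('M\\"adchen', GERMAN_ABBREVIATIONS))  # prints: Mädchen
```

Because every abbreviation starts with a backslash, ordinary Latin characters never collide with an abbreviation, which is what makes the side by side input possible.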

For languages such as Greek which comprise many special characters, a side by side usage of Latin and Greek characters is not possible. In this case, most Greek characters are represented by Latin characters. For example, the capital letter Omega is represented by the Latin character V. So if you type in V in the query editor, this input is automatically expanded to the capital letter Omega. The following screenshot illustrates how Greek characters are typed in:

Figure: Typing in Greek characters
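
In contrast to the LaTeX-style abbreviations, the Greek input method maps single keystrokes directly. A minimal Python sketch of this idea is given below. The names GREEK_KEYS and map_keystrokes are hypothetical, and only the mapping from V to the capital letter Omega is taken from the text; the entries for a and b are assumptions added for illustration and may differ from the actual table in europe.pdf.

```python
# Hypothetical excerpt of a keystroke table for Greek.
GREEK_KEYS = {
    "V": "Ω",  # stated in the text
    "a": "α",  # assumption, for illustration only
    "b": "β",  # assumption, for illustration only
}


def map_keystrokes(typed, table):
    # Map every typed character individually; characters without an
    # entry in the table (digits, punctuation) are passed through.
    return "".join(table.get(character, character) for character in typed)


print(map_keystrokes("V", GREEK_KEYS))  # prints: Ω
```

Since the Latin letters themselves serve as keys of the table, they can no longer be entered as Latin letters while the input method is active, which is why the side by side usage is not possible here.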

The mapping tables for the 16 supported European languages can be found in the file europe.pdf which is placed in the doc/pdf/ subdirectory of your TIGERSearch installation. It can also be downloaded from the TIGERSearch homepage (cf. http://www.tigersearch.de).