3. Internationalization

Because the TIGERSearch software suite is implemented entirely in Java, it can process corpora encoded in Unicode. Corpora are imported through an XML-based format, which can represent any Unicode character (cf. chapter V). The query processor can handle Unicode characters as well (cf. subsection 3.3, chapter IV).
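
The point that an XML-based import can read any Unicode character can be illustrated with the standard Java XML API. The following is a minimal sketch, not TIGERSearch's import code: the `<terminals>`/`<t word="...">` fragment is a simplified, hypothetical stand-in for a real corpus file (chapter V describes the actual format), and `UnicodeXmlDemo` is an illustrative name. It shows both ways XML can carry non-ASCII text: a numeric character reference and a literal character in UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class UnicodeXmlDemo {

    // Returns the "word" attribute of every <t> element, in document order.
    static String[] readWords(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The parser decodes the bytes using the encoding declared in the XML prolog.
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            NodeList terminals = doc.getElementsByTagName("t");
            String[] words = new String[terminals.getLength()];
            for (int i = 0; i < words.length; i++) {
                words[i] = ((Element) terminals.item(i)).getAttribute("word");
            }
            return words;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse corpus fragment", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical fragment: the first word is a numeric character reference
        // (&#937; is GREEK CAPITAL LETTER OMEGA), the second contains a literal sharp s.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<terminals>"
                + "<t id=\"s1_1\" word=\"&#937;\"/>"
                + "<t id=\"s1_2\" word=\"Stra\u00DFe\"/>"
                + "</terminals>";
        for (String word : readWords(xml.getBytes(StandardCharsets.UTF_8))) {
            // Print code points as well, so the result is readable on any console.
            String codePoints = word.codePoints()
                    .mapToObj(cp -> String.format("U+%04X", cp))
                    .collect(Collectors.joining(" "));
            System.out.println(word + " -> " + codePoints);
        }
    }
}
```

Both words end up as ordinary Java strings; once parsed, the program no longer distinguishes between a character that was written as a reference and one that was stored directly.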

3.1 Unicode fonts

Because it is platform-independent, the TIGERSearch software suite cannot analyze the font configuration of the user's platform to detect a suitable Unicode font automatically. Therefore, only the following two popular Unicode fonts are supported:

Arial Unicode MS (Arialuni.ttf, 52,000 characters; 23 MB installed)

This font package is not freely available. However, it is included in the following commercial software packages: Microsoft Windows XP, Microsoft Office 2000, and Microsoft Publisher. You can easily check in the Windows Control Panel whether the font is already installed on your computer. If it is not, you will find it on the CD-ROM of one of these products.

Bitstream Cyberbit (Cyberbit.ttf, 30,000 characters; 12.5 MB installed)

This font package can be downloaded free of charge from the following FTP site: ftp://ftp.netscape.com/pub/communicator/extras/fonts/windows/. Please observe the license agreement that accompanies the font package.

If one of these two font packages is installed on your system, TIGERSearch detects and uses it automatically. For instructions on installing fonts, please consult the documentation of your operating system.
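
Detection of this kind can be sketched with the standard AWT font API. This is not TIGERSearch's own code: the class and method names are hypothetical, and the sketch assumes that the fonts register under the family names "Arial Unicode MS" and "Bitstream Cyberbit" and that Arial Unicode MS is preferred when both are present. The selection logic is kept separate from the `GraphicsEnvironment` lookup so that it can be tested without a display.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Optional;

class UnicodeFontChooser {

    // Family names of the two supported fonts; the preference order is an assumption.
    static final List<String> SUPPORTED = List.of("Arial Unicode MS", "Bitstream Cyberbit");

    // Returns the first supported family found among the installed families, if any.
    static Optional<String> choose(Collection<String> installedFamilies) {
        for (String family : SUPPORTED) {
            if (installedFamilies.contains(family)) {
                return Optional.of(family);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Font family names visible to the running JRE.
        String[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        Optional<String> family = choose(Arrays.asList(installed));
        if (family.isPresent()) {
            Font font = new Font(family.get(), Font.PLAIN, 14);
            // canDisplay reports whether the font has a glyph for the character.
            System.out.println("Using " + family.get()
                    + "; Greek capital Omega displayable: " + font.canDisplay('\u03A9'));
        } else {
            System.out.println("No supported Unicode font found.");
        }
    }
}
```

Matching on a fixed list of family names, rather than inspecting every installed font's character coverage, mirrors the restriction described above: the program only looks for fonts it already knows about.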

Please note: If you do not have the permissions required to install fonts on your system, but TIGERSearch is already installed on your computer, there is a workaround that makes a font available to TIGERSearch only: copy the font file (suffix .ttf) to the subdirectory jre/lib/fonts/ of the TIGERSearch installation directory.
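
For readers who prefer to script this copy step, here is a sketch of it as a small Java utility. `PrivateFontInstaller` is a hypothetical helper, not part of TIGERSearch; it assumes only the directory layout named above and needs write access to the TIGERSearch installation directory, not to the system font folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

class PrivateFontInstaller {

    // Copies a TrueType font into <installDir>/jre/lib/fonts/ and returns the copy's path.
    static Path installFont(Path fontFile, Path installDir) throws IOException {
        String name = fontFile.getFileName().toString();
        if (!name.toLowerCase(Locale.ROOT).endsWith(".ttf")) {
            throw new IllegalArgumentException("Expected a .ttf file: " + name);
        }
        Path fontsDir = installDir.resolve("jre").resolve("lib").resolve("fonts");
        Files.createDirectories(fontsDir); // does nothing if the directory already exists
        return Files.copy(fontFile, fontsDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PrivateFontInstaller <font.ttf> <TIGERSearch installation directory>");
            System.exit(1);
        }
        Path target = installFont(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied font to " + target);
    }
}
```

A call such as `java PrivateFontInstaller Cyberbit.ttf <installation directory>` would place the file where the workaround expects it; the installation path is a placeholder for your own.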

3.2 Input methods

Once a Unicode font package is installed for use by TIGERSearch, the software can display any Unicode character that the font supports. Typing Unicode characters, however, is a different matter. How, for example, can you enter the Greek capital letter Omega in the corpus query editor?

On most platforms, specialized tools called input methods have been developed for this purpose. Since Greek characters, for example, are not available on a German keyboard, they are typed in as abbreviations: the string Omega might serve as an abbreviation for the Greek character Ω, which is expanded automatically as soon as the abbreviation has been typed. Please consult the documentation of your operating system to find out which input methods are available for your platform.
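
The abbreviation mechanism can be sketched as a lookup that runs after every keystroke and replaces a trailing abbreviation with its character. The table below is hypothetical and deliberately tiny; the actual TIGERSearch mappings are described in the file europe.pdf. `AbbreviationExpander` is an illustrative name, and the longest-match rule is a design choice of this sketch.

```java
import java.util.Map;

class AbbreviationExpander {

    // Hypothetical abbreviation table (not the TIGERSearch mapping tables).
    static final Map<String, String> TABLE = Map.of(
            "Omega", "\u03A9",  // GREEK CAPITAL LETTER OMEGA
            "omega", "\u03C9",  // GREEK SMALL LETTER OMEGA
            "Alpha", "\u0391",  // GREEK CAPITAL LETTER ALPHA
            "alpha", "\u03B1"); // GREEK SMALL LETTER ALPHA

    // If the text typed so far ends with an abbreviation, replace the longest
    // such abbreviation with its Unicode character; otherwise return it unchanged.
    static String expand(String typed) {
        String best = null;
        for (String abbrev : TABLE.keySet()) {
            if (typed.endsWith(abbrev) && (best == null || abbrev.length() > best.length())) {
                best = abbrev;
            }
        }
        if (best == null) {
            return typed;
        }
        return typed.substring(0, typed.length() - best.length()) + TABLE.get(best);
    }

    public static void main(String[] args) {
        // Simulate typing character by character, expanding after each keystroke.
        StringBuilder buffer = new StringBuilder();
        for (char c : "[word=\"Omega\"]".toCharArray()) {
            buffer.append(c);
            String expanded = expand(buffer.toString());
            buffer.setLength(0);
            buffer.append(expanded);
        }
        System.out.println(buffer);
    }
}
```

Because expansion fires as soon as a suffix matches, typing "Alphabet" would produce the Greek capital Alpha followed by "bet". A practical input method would probably need an escape mechanism or an explicit trigger key, which this sketch omits.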

For the TIGERSearch software, we have developed Java-based input methods for 16 European languages (in alphabetical order): Czech, Danish, Dutch, Finnish, French, German, Greek (classical and modern), Hungarian, Italian, Latin, Norwegian, Portuguese, Romanian, Spanish, Swedish, and Turkish.

These input methods are plugged into the TIGERSearch software suite automatically during installation. Subsection 3.3 of chapter IV gives detailed instructions on how to use them to type Unicode characters in your corpus queries. A description of the language mapping tables (file europe.pdf) is located in the doc/pdf/ subdirectory of your TIGERSearch installation.

Readers who are interested in the implementation of the input methods should consult the TIGERSearch homepage for additional information.
