Lexicon lookup

Grapheme-to-phoneme conversion takes place in the Word module. The process is divided into three steps:

  1. Festival first tries to find the word in the addenda lexicon, a supplement to the main lexicon that typically contains user-specific entries. Entries can be added with the Scheme function lex.add.entry. For the German lexicons, the addenda is called sampa_addenda and is defined in the file ims_german_lexicons.scm.
  2. If the word is not found in the addenda lexicon, it is looked up in the compiled lexicon by binary search. We currently use two different full-form lexicons: the BOMP lexicon [5] from the University of Bonn, Germany, which is distributed together with the open-source version of IMS German Festival, and the Celex lexicon [1]. The German lexicons are in the directory festival/lib/german/dicts/.
  3. Since German word formation is highly productive, many words are not found in the lexicon. These words are transcribed using letter-to-sound (LTS) rules. Since these rules often give unsatisfactory results, as many words of the application domain as possible should be in the lexicon. The LTS rules can be found in the file ims_german_lts.scm.
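
The lookup order described in the three steps above can be illustrated with a small sketch. It is not Festival code: it is written in Python for brevity, and the names ADDENDA, COMPILED, lts_fallback and lookup, as well as the sample entries and transcriptions, are hypothetical and chosen only for illustration. The LTS step in particular is a placeholder that spells the word out letter by letter, since the real rules are not reproduced here.

```python
from bisect import bisect_left

# Hypothetical toy data; real lexicons are far larger and the
# transcriptions here are illustrative only.
ADDENDA = {"festival": "f E s t i v a l"}  # user-specific entries

# The compiled lexicon is kept sorted by word so that binary search works.
COMPILED = sorted([
    ("haus", "h aU s"),
    ("tag", "t a: k"),
    ("wort", "v O 6 t"),
])
COMPILED_KEYS = [word for word, _ in COMPILED]


def lts_fallback(word):
    """Placeholder for letter-to-sound rules: one symbol per letter."""
    return " ".join(word)


def lookup(word):
    """Return (transcription, source) using the three-step order."""
    # Step 1: the addenda takes priority over everything else.
    if word in ADDENDA:
        return ADDENDA[word], "addenda"
    # Step 2: binary search in the sorted compiled lexicon.
    i = bisect_left(COMPILED_KEYS, word)
    if i < len(COMPILED_KEYS) and COMPILED_KEYS[i] == word:
        return COMPILED[i][1], "lexicon"
    # Step 3: fall back to (placeholder) letter-to-sound rules.
    return lts_fallback(word), "lts"
```

Calling lookup("tag") in this sketch is answered from the compiled lexicon, while an unknown word such as "xyz" falls through to the placeholder LTS step. Because the addenda is consulted first, an entry added there overrides a compiled-lexicon entry for the same word.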

For a language like German, which is rich in derivations and compounds, it is very helpful to have a morphological component with a lemma lexicon instead of a full-form lexicon. Such a component is currently being developed at the IMS.

