
POS tagging

POS tagging assigns a part-of-speech (POS) tag to each word in the word relation. For German, the STTS tagset is used [11]. When no POS tagging takes place, the POS tags are taken from the lexicon entries. Since the POS information that lexicons usually provide is less detailed than the full STTS tagset, only the main STTS categories are used. This ``underspecified'' STTS (USTTS) tagset contains the categories ADJ, ART, ADV, KO, ITJ, N, CARD, AP, P, PTK, V.
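The relation between the two tagsets can be illustrated with a small sketch. It assumes that a full STTS tag reduces to the longest USTTS category that is a prefix of it (for example PTKVZ to PTK rather than P). This prefix rule and the function name stts_to_ustts are assumptions made for illustration; the text above does not state how the reduction is actually done.

```python
# USTTS categories as listed in the text above.
USTTS_CATEGORIES = ["ADJ", "ART", "ADV", "KO", "ITJ", "N",
                    "CARD", "AP", "P", "PTK", "V"]


def stts_to_ustts(tag):
    """Reduce a full STTS tag to a USTTS category.

    Assumption: the USTTS category is the longest category name that
    is a prefix of the STTS tag. Returns None if no category matches
    (e.g. for punctuation or foreign-material tags).
    """
    matches = [cat for cat in USTTS_CATEGORIES if tag.startswith(cat)]
    return max(matches, key=len) if matches else None


for stts_tag in ["NN", "ADJA", "KOUS", "PTKVZ", "APPRART", "VVFIN"]:
    print(stts_tag, "->", stts_to_ustts(stts_tag))
```

Taking the longest match matters because P is itself a prefix of PTK, so a first-match rule could file particles under pronouns.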

In the open-source version, no POS tagging is carried out (module No_POS). The POS tags, drawn from the USTTS tagset, are taken from the lexicon entries and can therefore be wrong for homographs.
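Why a pure lexicon lookup goes wrong for homographs can be seen in the following sketch. It is not the No_POS module itself: the toy lexicon, the case-insensitive lookup, and the rule of taking the first listed entry are all assumptions chosen to make the effect visible.

```python
# Hypothetical toy lexicon: lowercased spelling -> USTTS tags of its entries.
# "weg" is listed as a noun ("Weg", path) and as an adverb ("weg", away).
TOY_LEXICON = {
    "geh": ["V"],
    "weg": ["N", "ADV"],
}


def lexicon_pos(words):
    """Tag each word with the POS of its first lexicon entry.

    No context is consulted, so every homograph always receives the
    same tag. Unknown words are tagged None.
    """
    tagged = []
    for word in words:
        entries = TOY_LEXICON.get(word.lower())
        tagged.append((word, entries[0] if entries else None))
    return tagged


# In "Geh weg" the word "weg" is an adverb, but the lookup yields N.
print(lexicon_pos(["Geh", "weg"]))
```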

A more advanced version uses a stochastic tree tagger. The tree tagger algorithm was developed and implemented by Helmut Schmid (/fak5/ims/projekte/corplex/DecisionTreeTagger.html). The tagger is included in the Festival sources in the directory festival/src/modules/ims_german_POS/. It is used in the intonation module to determine the accents (see section 3.8) and to find the correct pronunciation of homographs (``Weg'' vs. ``weg'').
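Once a tagger has supplied a context-dependent tag, choosing between homograph readings becomes a simple lookup. The sketch below is only an illustration and does not reproduce the tree tagger or the Festival code: the table HOMOGRAPHS, the SAMPA-like transcriptions, and the function pronounce are hypothetical.

```python
# Hypothetical homograph table:
# lowercased spelling -> {USTTS tag: SAMPA-like transcription}.
HOMOGRAPHS = {
    "weg": {"N": "ve:k", "ADV": "vEk"},
}


def pronounce(word, pos):
    """Pick the reading of a homograph that matches its POS tag.

    Falls back to the first listed reading if the tag is not covered,
    and returns None for words that are not in the table.
    """
    readings = HOMOGRAPHS.get(word.lower())
    if readings is None:
        return None
    if pos in readings:
        return readings[pos]
    return next(iter(readings.values()))


print(pronounce("Weg", "N"))    # noun reading
print(pronounce("weg", "ADV"))  # adverb reading
```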


Martin Barbisch
2001-08-28