Institut für Maschinelle Sprachverarbeitung
Universität Stuttgart

Homepage of Bettina Säuberlich

 
 

PhD Thesis

My PhD thesis focuses mainly on two aspects of unit selection speech synthesis: corpus design, and data administration and unit selection during synthesis.

To design an appropriate corpus for a unit selection system, I examined a large corpus of German newspaper text (HGC), together with football texts and tourist texts about different cities. These texts were transcribed automatically with the IMS Festival system, and I carried out statistical analyses of the distribution of different unit types (words, syllables, phones and diphones), both with and without the annotated features (stress, tone, accent, positional attributes, word class, phonemic context). The annotated text material comprised about 300,000 sentences. Because language has an LNRE (large number of rare events) distribution, I decided to cover the most common syllables and to fill the remaining gaps with the diphones that were still missing. Using a greedy algorithm, I generated a new, smaller corpus of about 4000 sentences. These sentences were recorded by a professional male speaker and a professional female speaker and then labelled automatically with an aligner developed at the IMS. Some of the files were subsequently corrected by hand.
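The greedy selection step can be illustrated with a minimal Python sketch. It is not the code used for the thesis: it assumes that every sentence has already been reduced to the set of units (for example syllables or diphones) it contains, and the names `greedy_select`, `sentences` and `target_units` are hypothetical. In each round it picks the sentence that covers the largest number of still-missing units.

```python
def greedy_select(sentences, target_units, max_sentences=None):
    """Greedy coverage sketch (hypothetical helper, not the thesis code).

    sentences:    dict mapping a sentence id to the set of units it contains
    target_units: set of units the reduced corpus should cover
    Returns (selected sentence ids in pick order, units left uncovered).
    """
    uncovered = set(target_units)
    remaining = dict(sentences)
    selected = []
    while uncovered and remaining:
        if max_sentences is not None and len(selected) >= max_sentences:
            break
        # Pick the sentence that adds the most uncovered units.
        # Iterating over sorted ids makes ties deterministic (smallest id wins).
        best_id = max(sorted(remaining),
                      key=lambda sid: len(remaining[sid] & uncovered))
        gain = remaining[best_id] & uncovered
        if not gain:
            # No remaining sentence contains any missing unit.
            break
        selected.append(best_id)
        uncovered -= gain
        del remaining[best_id]
    return selected, uncovered


# Toy data: unit labels are invented placeholders, not real diphones.
sentences = {
    "s1": {"a-b", "b-c", "c-d"},
    "s2": {"a-b", "d-e"},
    "s3": {"d-e", "e-f"},
    "s4": {"b-c"},
}
target_units = {"a-b", "b-c", "c-d", "d-e", "e-f"}

selected, uncovered = greedy_select(sentences, target_units)
```

With the toy data, `selected` is `["s1", "s3"]` and `uncovered` is empty: the first pick covers three target units, the second covers the remaining two. Units that occur in no sentence are returned in `uncovered`, so they can be handled separately.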

The second part was the design and implementation of a data management module that administers the speech corpus efficiently and allows efficient access to the required units. I decided to use a decision tree approach to cluster similar units, with each level of the tree representing one feature (e.g. previous phoneme, stress, tone, accent). The order of the features is specified by the user and is linguistically motivated. The trees can easily be rebuilt if a feature order turns out to be suboptimal. The features represent only symbolic attributes of the units, to avoid depending on uncertain predictions made by the system. All unit types use the same tree model, but with a feature order that depends on the unit type. The appropriate clusters are accessed efficiently via indices that represent a feature order. The unit selection process uses the PSM algorithm, i.e. a top-down strategy that prefers longer units. The new module is integrated into the IMS Festival Speech Synthesis System and is written in C++.
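The idea of a tree whose levels follow a user-defined feature order can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the C++ module itself: units are plain dictionaries of symbolic features, the tree is a nested dictionary, and the names `build_tree`, `lookup`, `prev_phone`, `stress` and `accent` as well as all feature values are hypothetical.

```python
def build_tree(units, feature_order):
    """Cluster units in a nested dict with one level per feature.

    feature_order must contain at least one feature name.
    Leaves are lists of unit ids that share all feature values.
    """
    root = {}
    for unit in units:
        node = root
        # Descend one level per feature, creating branches as needed.
        for feature in feature_order[:-1]:
            node = node.setdefault(unit[feature], {})
        # The last feature selects the leaf cluster.
        node.setdefault(unit[feature_order[-1]], []).append(unit["id"])
    return root


def lookup(tree, feature_order, target):
    """Return the ids in the cluster matching the target's feature values."""
    node = tree
    for feature in feature_order:
        node = node.get(target[feature])
        if node is None:
            return []  # no unit with this feature combination
    return node


# Toy inventory with invented symbolic feature values.
units = [
    {"id": 0, "prev_phone": "a", "stress": "stressed", "accent": "none"},
    {"id": 1, "prev_phone": "a", "stress": "stressed", "accent": "none"},
    {"id": 2, "prev_phone": "a", "stress": "unstressed", "accent": "none"},
    {"id": 3, "prev_phone": "t", "stress": "stressed", "accent": "pitch"},
]

order = ["prev_phone", "stress", "accent"]
tree = build_tree(units, order)
target = {"prev_phone": "a", "stress": "stressed", "accent": "none"}
candidates = lookup(tree, order, target)

# Rebuilding with a different feature order only requires another call.
other_order = ["accent", "stress", "prev_phone"]
other_tree = build_tree(units, other_order)
```

In this sketch, `candidates` is `[0, 1]`, and looking up the same target in `other_tree` with `other_order` yields the same cluster. Only the path through the tree changes, which is why a feature order that turns out to be suboptimal can be swapped without touching the unit data.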

In further work I will examine which basic unit is most appropriate for synthesis: phone, demiphone or diphone. I will also compare different feature orders to determine which features are most important for perception.