 Universität Stuttgart 
 Institut für Maschinelle Sprachverarbeitung 
 synthesis@smartkom project plan (excerpt) 

The goal of the speech synthesis project within the SmartKom consortium is to develop a speech synthesis module that is capable of producing natural-sounding German speech. This general goal is achieved when the user of the SmartKom system is satisfied with the system's voice. This includes intelligible speech and a friendly voice, but also the appropriateness of the system response for a given task and within a certain dialogue state. The speech output component to be developed has to be consistent with the other modes of multimodal interaction that may be used in parallel with speech. The goal, therefore, includes the successful integration of the module within the SmartKom system. An additional goal of this project is to invent and explore innovative methods within the framework of speech synthesis in order to contribute to the research carried out in this field.

Objectives

A. Natural speech.
The naturalness of the speech output depends on two aspects: the segmental quality of the speech and its prosodic quality. Whether this goal has been met can only be assessed through formal evaluation.
B. Friendly voice.
To obtain a friendly voice that can be used within the system, we have to select an appropriate speaker. The objective is to find a speaker whose voice sounds natural when used for data-based speech synthesis and remains stable under the specific signal processing method applied.
C. Adequacy.
The synthesized speech has to be accepted by the user of the system given a specific task and interaction state. This may include the partial implementation of non-German synthesis for tasks where the data is usually non-German. The speech output has to be coordinated with the other modalities that might be used in parallel or in sequence. The speech has to be applicable to the three SmartKom application scenarios: Public, Mobile and Home/Office.
D. Innovative methods.
To underline the research character of this project, we have to allocate enough time to pursue indirect solutions that could eventually lead to innovative technologies. Regular publications at established conferences and relevant workshops, as well as articles in research journals, are essential to show that important work is carried out at the IMS in the field of speech synthesis.
E. System integration.
Although it is a prerequisite for all the other objectives, system integration is listed as an objective in its own right. It includes porting the system to the platform required by the SmartKom system (possibly even a different operating system). Another important objective is the specification and implementation of the interfaces to and from the synthesis module.

General Approach and Contractual Aspects

As the technical task is to develop a speech output module for a multimodal system, all sub-tasks of a text-to-speech (TTS) system and of a concept-to-speech (CTS) system have to be addressed. Detailed work will be carried out within the project on the following items.
A. Selecting a friendly voice
Among a number of available speakers we have to find the one whose voice is most appropriate for speech synthesis. The general approach is to perceptually evaluate test recordings of the different speakers under various conditions.
B. Create a new diphone voice
The baseline speech synthesis method is state-of-the-art diphone synthesis. We will first use freely available diphones and then record diphones with the voice selected under task A. The diphones have to be integrated into the system.
C. Construction of the speech synthesis database
The speech database will be recorded from the speaker selected under task A. Its size and content have to reflect the nature of the speech synthesis methods that will be used (see tasks B and D), as well as the application domains of the system.
D. Development and integration of natural speech synthesis methods
On top of the diphone system, a new synthesis approach using non-uniform segments will be developed. It will build on the experience gained from known approaches published in the field but will take into account the specific character of the project.
E. Development of a prosody module for multimodal speech synthesis
We will develop a method of prosody prediction that is capable of generating the appropriate accents and prosodic boundaries in a given dialogue situation. To do so, syntactic and other context information will be derived from the language generation component that is developed within the SmartKom project.
F. Definition of the Interfaces between SmartKom modules
Various interfaces will have to be defined between the speech output component and other modules of the SmartKom system. Among them are the interfaces to the language generation module, the audio module, the dynamic lexicon and the presentation manager. The interface language is XML, with the interfaces described in schemata.
G. Evaluation of speech synthesis
To ensure optimal speech synthesis quality and a system response that is appropriate in a specific dialogue context, the speech output component has to be formally evaluated. Since only a few criteria can be tested objectively, the main emphasis is put on perceptual evaluation.
H. System integration
The system has to be integrated into the SmartKom testbed and into the SmartKom demonstrator. This task may include porting the module from Linux to a different operating system such as Windows.
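
As an illustration of the diphone units mentioned under task B, the following minimal sketch shows how a phone sequence can be split into diphone names, each spanning the transition from one phone to the next. It is not taken from the project: the function name `to_diphones`, the silence symbol `_`, the `a-b` naming scheme and the SAMPA-like transcription are all assumptions made for this example.

```python
def to_diphones(phones, silence="_"):
    """Return the diphone names covering a phone sequence.

    The sequence is padded with a silence symbol at both ends so that
    the utterance-initial and utterance-final transitions are included.
    The "a-b" naming scheme is a hypothetical convention for this sketch.
    """
    padded = [silence, *phones, silence]
    # Pair every phone with its successor: one diphone per transition.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# German "Tag" in an assumed SAMPA-like transcription
print(to_diphones(["t", "a:", "k"]))
# prints ['_-t', 't-a:', 'a:-k', 'k-_']
```

A sequence of n phones yields n + 1 diphones, which is one reason why the recorded inventory under task B has to cover the phone-to-phone transitions of the language rather than the phones alone.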

For several other tasks that are necessary to build the speech output component, we will integrate already existing results, off-the-shelf solutions or research work from outside the SmartKom project. Among these are the speech synthesis architecture itself, the general text pre-processing, the full-form lexicon, the duration prediction and the F0 modeling. The SmartKom project group is integrated within the phonetics group of IMS, from which valuable input is expected. We also aim at several cooperations that will have a positive impact on the project. Cooperations are currently being negotiated with the Faculté Polytechnique de Mons (in the field of speech signal processing), the Oregon Graduate Institute (for the tasks of evaluation and non-uniform unit selection) and the Royal Institute of Technology KTH (for multimodal speech synthesis). Additional technical help will come from several technical committees to be organized internally, namely the project administration, the Festival administration group, the audio studio administration group and the system administration group.

IMS Stuttgart, Mon Aug 18 17:22:47 2003 (www-admin@ims.uni-stuttgart.de)