Test Environment for the Two Level Model Of Germanic Prominence

by Gregor Möhler & Grzegorz Dogil,
University of Stuttgart, Institute of Natural Language Processing

Abstract

In this work we present a test bed designed to verify the two-level model of Germanic prominence. We introduce the linguistic background of the model and derive the features that the test environment should possess. Finally, we describe the details of the implementation in the ESPS/xwaves environment. The implementation is based on resynthesis using the PSOLA algorithm. As a linguistic application, the Tone Sequence Model (TSM) is implemented and tested.

Keywords: prosody, modelling of f0, PSOLA, Tone-Sequence-Model (TSM).

Rule Based Generation of Fundamental Frequency Contours for German Utterances

by Gregor Möhler,
University of Stuttgart, Institute of Natural Language Processing

Abstract

Evaluations of text-to-speech systems have shown that systems with sophisticated control of prosody sound more natural than those with very good control of the sound structure but less elaborate prosody. This calls for broad research in the field of prosody, which in the past has received much less attention than the study of segmental sound structure in both general phonetics and speech technology. Among the different aspects of prosody (intonation, accent and rhythm), the study of intonation and its expression, fundamental frequency (F0), plays an outstanding role. This paper describes a method that generates F0 contours close to natural ones from an abstract linguistic model. The model's principles define a set of labels from which F0 contours are generated by means of rules. The linguistic assumptions of the prosody generation method discussed in this paper are based on the Tone Sequence Model (TSM), an established theory of prosodic phonology. The generation program itself is written in C++ and is embedded in the xwaves speech analysis environment.

Keywords: fundamental frequency contours, f0 generation, prosody, PSOLA
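
The idea of generating an F0 contour from a sequence of labels by rule can be illustrated with a minimal sketch. It is not the C++ program described in the abstract: the label inventory, the target levels, the declination slope and the function names `tone_targets` and `f0_contour` are all assumptions made for illustration, and plain linear interpolation stands in for whatever rules the actual system applies.

```python
from bisect import bisect_right

# Hypothetical mapping from tone labels to target levels, expressed as a
# fraction of the speaker's range (0 = baseline, 1 = topline).
TONE_LEVEL = {"H*": 1.0, "L*": 0.0, "H%": 0.8, "L%": 0.0, "%": 0.3}


def tone_targets(labels, base_hz=100.0, range_hz=80.0, declination_hz_per_s=10.0):
    """Turn (time_s, label) pairs into (time_s, f0_hz) targets.

    The topline falls linearly over time (assumed declination); the
    baseline stays fixed at base_hz.
    """
    targets = []
    for t, label in sorted(labels):
        level = TONE_LEVEL[label]
        usable_range = range_hz - declination_hz_per_s * t
        targets.append((t, base_hz + level * usable_range))
    return targets


def f0_contour(targets, step_s=0.01):
    """Linearly interpolate between at least two targets on a fixed frame grid."""
    if len(targets) < 2:
        raise ValueError("need at least two targets")
    times = [t for t, _ in targets]
    start, end = times[0], times[-1]
    n_frames = int(round((end - start) / step_s)) + 1
    contour = []
    for i in range(n_frames):
        t = min(start + i * step_s, end)
        # Index of the target that closes the segment containing t.
        j = max(1, min(bisect_right(times, t), len(times) - 1))
        (t0, f0), (t1, f1) = targets[j - 1], targets[j]
        w = (t - t0) / (t1 - t0) if t1 > t0 else 0.0
        contour.append((t, f0 + w * (f1 - f0)))
    return contour


labels = [(0.0, "%"), (0.5, "H*"), (1.0, "L%")]
contour = f0_contour(tone_targets(labels))
```

With these assumed settings the contour rises from the initial boundary to the accent peak and then falls to the baseline at the final boundary; a real rule set would also have to handle peak alignment and smoothing.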

PSOLA Module for CHATR

by Gregor Möhler,
University of Stuttgart, Institute of Natural Language Processing

Abstract

In this work a signal processing module is presented that deals with mismatches between the target specification and the features of a selected unit in concatenative speech synthesis. Because the database is finite, a selected unit will not necessarily meet its target specification in F0 and duration. In prosodically important parts of speech, however, such a mismatch cannot be accepted. The described module modifies the F0 and duration of the synthesized sentence using the Pitch-Synchronous Overlap-and-Add (PSOLA) algorithm. Different approaches to applying the algorithm to the selected units are discussed, and details of the implementation are described. Finally, a proposal is made for applying the algorithm with varying prosodic relevance over the utterance.
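
The basic time-domain PSOLA operation referred to above can be sketched as follows. This is a generic textbook-style illustration, not the CHATR module: the function name `psola`, its parameters, the use of one global scale factor for pitch and one for duration, and the weight normalisation are assumptions. Pitch marks are taken as given, and a Hann window spanning two local periods is placed around each analysis mark and overlap-added at the new synthesis marks.

```python
import math


def psola(signal, marks, pitch_scale=1.0, time_scale=1.0):
    """Modify F0 and duration of `signal` by pitch-synchronous overlap-add.

    signal: list of float samples.
    marks:  ascending sample indices of the pitch marks (assumed given).
    pitch_scale > 1 raises F0; time_scale > 1 lengthens the signal.
    """
    if len(marks) < 3:
        raise ValueError("need at least three pitch marks")
    out_len = int(round(len(signal) * time_scale))
    out = [0.0] * out_len
    norm = [0.0] * out_len          # summed window weights per output sample
    t_syn = marks[0] * time_scale   # position of the current synthesis mark
    end = marks[-1] * time_scale
    k = 0
    while t_syn <= end:
        # Pick the analysis mark closest to the time-warped position.
        t_ana = t_syn / time_scale
        while k + 1 < len(marks) and abs(marks[k + 1] - t_ana) <= abs(marks[k] - t_ana):
            k += 1
        # Local pitch period in samples.
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        centre = int(round(t_syn))
        # Copy a Hann-windowed, two-period segment to the synthesis mark.
        for n in range(-period, period + 1):
            src, dst = marks[k] + n, centre + n
            if 0 <= src < len(signal) and 0 <= dst < out_len:
                w = 0.5 * (1.0 + math.cos(math.pi * n / period))
                out[dst] += w * signal[src]
                norm[dst] += w
        # Synthesis marks are spaced by the rescaled period.
        t_syn += period / pitch_scale
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# Usage: an impulse train with a 100-sample period, F0 raised by an octave.
pulses = [1.0 if i % 100 == 0 and 0 < i < 2000 else 0.0 for i in range(2000)]
pitch_marks = list(range(100, 2000, 100))
higher = psola(pulses, pitch_marks, pitch_scale=2.0)
```

Applying the modification with a relevance that varies over the utterance, as the abstract proposes, would require scale factors that change from mark to mark rather than the two constants assumed here.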

Detection of Creaky Voice in Speech Signals

by Gregor Möhler,
University of Stuttgart, Institute of Natural Language Processing

Abstract

Reliable F0 extraction and pitch marking are essential for good unit selection in concatenative speech synthesis. Natural speech, however, is subject to irregularities, a phenomenon often described by the terms "creaky voice" or "laryngealization". The problem is that the fundamental frequency is hard to define in these parts of speech, and extracting F0 will often result in an F0 contour that jumps between different harmonics of the signal. This is unacceptable for concatenative speech synthesis systems. We are therefore looking for a method that can detect sections of creaky voice in the speech database.
In this work a method has been developed that can detect irregularities in speech signals. It is based on an F0 algorithm (ADMF) that presents different candidates for F0 to a Recurrent Neural Network (RNN) classifier. The classifier is trained and tested on the female voices of a German database (MÜSLI) with annotated creaky periods. This essentially very simple approach achieves a recognition rate of 42% in an open test. A program based on this RNN has been written that can now detect irregularities in a speech synthesis database.
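
The symptom described above, an extracted F0 contour jumping between harmonics, can be made concrete with a small heuristic. This is explicitly not the ADMF/RNN method of the abstract: it is a simple threshold rule swapped in for illustration, and the function name `flag_harmonic_jumps`, the tolerance and the set of intervals checked are assumptions. Unvoiced frames are assumed to be coded as 0.

```python
import math

# Intervals in semitones between a frequency and its 2nd and 3rd harmonic.
HARMONIC_INTERVALS = (12.0 * math.log2(2.0), 12.0 * math.log2(3.0))


def flag_harmonic_jumps(f0_hz, tolerance_semitones=1.5):
    """Return the frame indices where F0 jumps by about an octave or a twelfth.

    A frame is compared with the directly preceding frame only if both are
    voiced; an unvoiced frame (value <= 0) resets the comparison.
    """
    flagged = []
    previous = None
    for i, f in enumerate(f0_hz):
        if f <= 0:
            previous = None
            continue
        if previous is not None:
            interval = abs(12.0 * math.log2(f / previous))
            if any(abs(interval - h) <= tolerance_semitones for h in HARMONIC_INTERVALS):
                flagged.append(i)
        previous = f
    return flagged


track = [200, 202, 101, 100, 0, 0, 198, 200, 400, 201]
suspicious = flag_harmonic_jumps(track)
```

A rule of this kind only marks the frames where the tracker switches harmonics; it says nothing about how long the irregular phonation lasts, which is one reason a trained classifier over F0 candidates is the more promising route.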

Thu Jul 17 16:33:35 1997
gm