A Computational Model of Target Oriented Production of Prosody

(short title: Production of Prosody)

DFG Grant to G. Dogil & B. Möbius: 1 Oct 2001 - 30 Sep 2004

Project description

The two main goals of the proposed project are, first, to establish a novel paradigm of research into the production of prosody and, second, to provide experimental evidence for several assumptions made by the computational model underlying the proposed approach. Our approach is inspired by the speech production model recently proposed by Frank Guenther, Joe Perkell, and colleagues. Their model posits that speech production is constrained by auditory and perceptual requirements. The only invariant targets of the speech production process are auditory perceptual targets. The targets are characterized as multidimensional regions in the perceptual space, and speech movements are trajectories planned to traverse the target regions. Our project rests on the assumption that these statements hold for the production of prosody as well.
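The notion of a target region traversed by a trajectory can be illustrated with a minimal sketch. It assumes, purely for illustration, that a target region is an axis-aligned box in a two-dimensional perceptual space (F0 in semitones above a baseline, and normalized time within a syllable) and that a trajectory is a list of sampled points. The names `TargetRegion` and `traverses`, the dimensions, and all numeric values are hypothetical and are not taken from the project or from the Guenther and Perkell model.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class TargetRegion:
    """Hypothetical target: an axis-aligned box in a perceptual space."""
    lower: Point
    upper: Point

    def contains(self, point: Point) -> bool:
        # A point is inside if every coordinate lies within its bounds.
        return all(lo <= x <= hi
                   for lo, x, hi in zip(self.lower, point, self.upper))


def traverses(trajectory: Sequence[Point], region: TargetRegion) -> bool:
    """True if at least one sampled trajectory point lies inside the region."""
    return any(region.contains(p) for p in trajectory)


# Assumed dimensions: (F0 in semitones above baseline, normalized time).
high_target = TargetRegion(lower=(4.0, 0.3), upper=(7.0, 0.7))

# Two made-up F0 contours sampled at a few time points.
rise = [(0.0, 0.0), (2.5, 0.25), (5.0, 0.5), (3.0, 0.75)]
flat = [(1.0, 0.0), (1.2, 0.5), (1.1, 1.0)]

print(traverses(rise, high_target))  # the rise passes through the box
print(traverses(flat, high_target))  # the flat contour never reaches it
```

In this sketch any contour that enters the box counts as a valid realization of the target, which is the sense in which only the region, not a particular contour, is invariant.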

In the framework of the proposed research project a computational prosody model will be implemented that has its motivation both in the theory of speech production and in linguistic theory. The computational model is intended to serve two main purposes. First, it will allow us to empirically test a number of assumptions made by the production model, for instance the effect of speaking rate and other factors related to speech timing on the acoustic realization of intonational gestures. Second, the linguistically based classification of intonational events, e.g. those related to (a) discourse structure (register, pitch range), (b) information structure (topic, focus), and (c) accentual patterns (pitch accents, tones, tunes), can be experimentally tested by trainable intonation event classifiers. A neural network architecture can learn mappings between reference frames (the perceptual target regions) and speech/intonational gestures. In accordance with the postural relaxation hypothesis, we expect such a learned neural mapping, based on adjustable adaptive weights, to tend consistently towards comfortable realization configurations, either under the influence of temporal constraints or as part of a tune (a coherent sequence of intonational events), as long as the perceptual target region is traversed.
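The expected tendency towards comfortable configurations can be pictured with a small geometric sketch. This is not the neural mapping described above. It assumes, hypothetically, that the target region is an axis-aligned box and that a single "neutral" point stands in for the most comfortable configuration. The realization is then taken to be the point of the box closest to that neutral point, which for a box is obtained by clipping each coordinate to its bounds. The function name `relaxed_realization` and all values are illustrative assumptions.

```python
from typing import Tuple

Point = Tuple[float, ...]


def relaxed_realization(neutral: Point, lower: Point, upper: Point) -> Point:
    """Point of the box [lower, upper] closest to the neutral configuration.

    For an axis-aligned box, the closest point in Euclidean distance is
    found by clipping each coordinate to its bounds.
    """
    return tuple(min(max(n, lo), hi)
                 for n, lo, hi in zip(neutral, lower, upper))


# Assumed dimensions: (F0 in semitones above baseline, normalized time).
lower, upper = (4.0, 0.3), (7.0, 0.7)

# A hypothetical comfortable configuration outside the target region
# is pulled only as far as the nearest edge of the region.
print(relaxed_realization((0.0, 0.5), lower, upper))  # (4.0, 0.5)

# A configuration already inside the region is left unchanged.
print(relaxed_realization((5.0, 0.5), lower, upper))  # (5.0, 0.5)
```

Under these assumptions the realization sits at the edge of the target region nearest the neutral point: the target is still reached, but with the least deviation from the comfortable configuration.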