Frequently asked questions about PaIntE intonation modeling

Gregor Möhler

 

User questions

What is PaIntE?

PaIntE stands for Parametric INTonation Events. It is a method to describe stretches of F0 (usually around a peak or an intonation boundary) with a set of six parameters, the PaIntE parameters. The parametrization, i.e. the extraction of the parameters from F0, is done by fitting a model function to the F0 contour.
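The model function itself is not spelled out in this FAQ. As a minimal sketch, assuming the two-sigmoid form of Möhler & Conkie (1998) as we understand it (check the paper for the authoritative definition), the contour rises towards a peak of height about d at position b and then falls. The rise has amplitude c1 and steepness a1, the fall has amplitude c2 and steepness a2. The constant gamma stands for the overlap constant (PAINTE_OVERLAP, default 3.6). The function names are hypothetical.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def painte(x: float, a1: float, a2: float, b: float,
           c1: float, c2: float, d: float, gamma: float = 3.6) -> float:
    """Evaluate the assumed PaIntE model function at time x.

    Well before the peak position b the contour lies at d - c1 and rises
    towards d with steepness a1; well after b it falls towards d - c2 with
    steepness a2. gamma controls how much the two slopes overlap.
    """
    rise = c1 * _sigmoid(a1 * (b - x) - gamma)  # = c1 / (1 + exp(-a1*(b-x) + gamma))
    fall = c2 * _sigmoid(a2 * (x - b) - gamma)  # = c2 / (1 + exp(-a2*(x-b) + gamma))
    return d - rise - fall


# Peak placed in the middle of the accented syllable (syllable-normalized time).
peak = painte(0.5, 10.0, 10.0, 0.5, 30.0, 40.0, 200.0)
print(f"F0 at the peak position: {peak:.1f}")
```

Fitting the six parameters to a stretch of F0 could then be done with any nonlinear least-squares routine. Which fitting criterion painte_analysis actually uses is not stated here.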

In speech synthesis we want to predict F0 from linguistic features of the input utterance. With PaIntE we can therefore predict the six parameters from these features, which is one of the methods described below. Another method reduces the possible F0 curves to a limited number of distinct configurations using vector quantization.

How do I run a simple PaIntE analysis?

Use:

painte_analysis test.f0 -syl test.syl -tobi test.tones

where:

test.f0 is the F0 file (usually in ESPS format) that you want to parametrize,

test.syl is the (ESPS style) label file containing the syllable boundaries and

test.tones is an optional input file containing tone labels in ToBI annotation (with intonation labels of type X* denoting accents and X% denoting phrase boundaries).

If a ToBI file is present, the algorithm produces a parametrization only for these labels. The output goes to STDOUT, or to a file if you use the -o option. painte_analysis and painte_synthesis (see next question) are programs written with the EST speech_tools library. If they are not in your path, you can usually find them in the directory speech_tools/bin within a Festival source tree (e.g. in /usr/local/Festival_1.4/speech_tools/bin).

Can I synthesize an F0 file from a PaIntE parametrization?

Yes, you can. Just use:

painte_synthesis test.painte -syl test.syl -o test.new_f0

where:

test.painte is an ESPS style label file with the PaIntE parameters (as produced by painte_analysis)

test.syl is the (ESPS style) label file containing the syllable boundaries and

test.new_f0 will be the F0 file with the synthesized F0 contour
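To process a whole corpus, the two programs can be chained per utterance. The sketch below uses only the options shown above (-syl, -tobi, -o). The file naming scheme (STEM.f0, STEM.syl, STEM.tones, STEM.painte, STEM.new_f0), the position of -o in the painte_analysis call and the helper names are assumptions for illustration.

```python
import subprocess
import sys
from pathlib import Path


def analysis_cmd(stem: str, with_tones: bool) -> list[str]:
    """painte_analysis call for STEM.f0 / STEM.syl, writing STEM.painte."""
    cmd = ["painte_analysis", f"{stem}.f0", "-syl", f"{stem}.syl"]
    if with_tones:
        # Optional ToBI labels: only these events are parametrized.
        cmd += ["-tobi", f"{stem}.tones"]
    return cmd + ["-o", f"{stem}.painte"]


def synthesis_cmd(stem: str) -> list[str]:
    """painte_synthesis call that turns STEM.painte back into STEM.new_f0."""
    return ["painte_synthesis", f"{stem}.painte", "-syl", f"{stem}.syl",
            "-o", f"{stem}.new_f0"]


def round_trip(stem: str, run=subprocess.run) -> None:
    """Analyse and resynthesize one utterance; stop at the first failure."""
    with_tones = Path(f"{stem}.tones").exists()
    for cmd in (analysis_cmd(stem, with_tones), synthesis_cmd(stem)):
        run(cmd, check=True)


if __name__ == "__main__":
    for stem in sys.argv[1:]:  # e.g. python roundtrip.py test utt2
        round_trip(stem)
```

Comparing STEM.new_f0 with the original STEM.f0 gives a quick check of how well the parametrization reproduces the contour.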


What is the difference between Painte, Vqpainte and Vqpainte_notobi?

The names refer to different versions of the parametrization model. Painte is the model that simply uses the six parameters (a1, a2, b, c1, c2, d) as described in Gregor Möhler & Alistair Conkie (1998). During TTS the parameters are predicted using six separate CART trees.

Vqpainte is the model that uses vector-quantized intonation events, i.e. there is only a fixed number of distinct pitch events. Currently this number ranges from 4 up to 64. During TTS the appropriate intonation event is predicted from the accent structure (in ToBI notation) using CART trees.

Vqpainte_notobi is the same model as Vqpainte. However, no ToBI-style information is used in TTS: the intonation events are predicted only from the position of the accents or boundaries (not from their type). Since it has less information to work with, it is inferior to the Painte model. However, it can be used when no ToBI accents or boundaries are available.
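How the Vqpainte codebook is built is not described in this FAQ. As a hedged illustration of vector quantization in general, the sketch below trains a codebook of six-dimensional PaIntE vectors with plain k-means (Lloyd's algorithm, a swapped-in technique that is not necessarily the one used in the toolkit) and maps a parameter vector to its nearest codeword. The codebook size plays the role of N in the voice names and of PAINTE_CBSIZE; the function names are hypothetical.

```python
import random


def nearest(codebook: list[list[float]], vec: list[float]) -> int:
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)))


def train_codebook(vectors: list[list[float]], size: int,
                   iters: int = 20, seed: int = 0) -> list[list[float]]:
    """Plain k-means (Lloyd's algorithm) over PaIntE parameter vectors."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]  # initial codewords
    for _ in range(iters):
        clusters: list[list[list[float]]] = [[] for _ in range(size)]
        for v in vectors:  # assignment step
            clusters[nearest(codebook, v)].append(v)
        for k, members in enumerate(clusters):  # update step
            if members:  # an empty cluster keeps its previous codeword
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook
```

In practice the six parameters live on different scales (F0 values versus time), so they would normally be normalized before distances are computed.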

How can I use the PaIntE model for TTS with Festival?

Choose the appropriate voice:

Painte:
(voice_german_de2_painte)


Vqpainte:
(voice_german_de2_vqpainte N)
with N=(4,6,8,16,32,64) being the number of intonation events used in the model. There are shortcuts of the form (voice_german_vqpainte16) for all possible values of N.


Vqpainte_notobi:
(voice_german_de2_vqpainte_notobi N)
with N=(4,6,8,16,32,64) being the number of intonation events used in the model.

How can I define the pitch range of a Painte voice in Festival?

For the Painte and Vqpainte voices the pitch range can be defined in two ways:

  1. As a default for the voice. Use the following command to define the upper and lower boundaries of the pitch range:
    (Param.set 'default_pr_base  50)
    (Param.set 'default_pr_top  70)
    
  2. Separately for each phrase, e.g. for a specific discourse model. In this case the features pr_top and pr_base of the corresponding node in the Phrase relation must be set to define the upper and lower boundaries of the pitch range.

Try it out:

festival> (voice_german_vqpainte16)
festival> (Param.set 'default_pr_base 50)
festival> (Param.set 'default_pr_top  70)
festival> (SayText "Guten Tag. Ich bin die neue Stimme und kann jetzt auch tief reden")
festival> (Param.set 'default_pr_base 180)
festival> (Param.set 'default_pr_top 220)
festival> (SayText "Guten Tag. Ich bin die neue Stimme und kann jetzt auch hoch reden")
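This FAQ does not say exactly how the PaIntE parameters are related to pr_base and pr_top. One plausible scheme, shown purely as a hypothetical sketch (and one way to read the PAINTE_NORM_PR option described under the developer questions), maps the absolute peak height d linearly onto the pitch range, scales the amplitudes c1 and c2 by the width of the range and leaves the time parameters a1, a2 and b untouched. The function names are illustrative only.

```python
def normalize_pr(params: dict[str, float], pr_base: float,
                 pr_top: float) -> dict[str, float]:
    """Hypothetical: express the F0-valued PaIntE parameters relative to a pitch range."""
    span = pr_top - pr_base
    if span <= 0:
        raise ValueError("pr_top must be above pr_base")
    out = dict(params)  # a1, a2 and b are time-domain parameters: unchanged
    out["d"] = (params["d"] - pr_base) / span  # peak height: pr_base -> 0, pr_top -> 1
    out["c1"] = params["c1"] / span  # amplitudes are F0 differences: scale only
    out["c2"] = params["c2"] / span
    return out


def denormalize_pr(params: dict[str, float], pr_base: float,
                   pr_top: float) -> dict[str, float]:
    """Inverse of normalize_pr for a (possibly different) target pitch range."""
    span = pr_top - pr_base
    if span <= 0:
        raise ValueError("pr_top must be above pr_base")
    out = dict(params)
    out["d"] = pr_base + params["d"] * span
    out["c1"] = params["c1"] * span
    out["c2"] = params["c2"] * span
    return out


low = {"a1": 10.0, "a2": 8.0, "b": 0.5, "c1": 5.0, "c2": 10.0, "d": 65.0}
high = denormalize_pr(normalize_pr(low, 50, 70), 180, 220)
```

Under this scheme the same contour shape is moved from the 50 to 70 range of the first SayText call into the 180 to 220 range of the second one.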

 


How do I get more background information?

Gregor Möhler & Alistair Conkie (1998). Parametric modeling of intonation using vector quantization. Proceedings of 3rd ESCA Workshop on Speech Synthesis, Jenolan Caves, Australia.

Gregor Möhler (2001). Improvements of the PaIntE model. Technical Report. IMS Universität Stuttgart.

Developer questions


What is the working directory for training the model?

/projekte/prosodie/Painte


What is the directory structure below the working directory?

bin
executable scripts and Scheme files to train the model

lib
mainly parameter files used for training

doc
documentation of PaIntE (currently this document)

e.g. nachrichten
one sub-directory per speech database (e.g. nachrichten for the IMS German News Corpus), containing:

    sd
    speech files

    f0
    original (ESPS) F0 files

    sf0
    smoothed F0 files

    lists
    lists of files that are used for processing

    relations
    files with Festival relations (for building Festival utterances)

    utts
    files with Festival utterances

    Painte
    sub-directory with the resulting files of the Painte method

    Vqpainte
    sub-directory with the resulting files of the Vqpainte method

    Vqpainte_notobi
    sub-directory with the resulting files of the Vqpainte_notobi method

Within the sub-directory of a speech database there is the data for that database (f0, utts, ...) as well as sub-directories for the models Painte, Vqpainte and Vqpainte_notobi.


What are the variants of the model?

The current variant of the model is determined by environment variables. They are usually set by sourcing the settings files in the base directory of Painte, e.g. with the command:

> cd /projekte/prosodie/Painte/; source painte.defs

Besides the three basic versions of the model, Painte, Vqpainte and Vqpainte_notobi (as explained above), the following options are currently available:

PAINTE_NORM_PR
If its value is 1, the model normalizes the PaIntE parameters to the current pitch range. If it is unset, the PaIntE parameters are used as they are.


PAINTE_SYLNORM
Three values are possible: nonorm, sylnorm and anchor. If the value is nonorm, the PaIntE parameters are calculated without any time normalization. This means, e.g., that the alignment parameter b is given in seconds (referring to the timing in the speech file). If the variable is unset or set to sylnorm (the default), time is normalized by the syllable length, i.e. the accented syllable ranges from 0 to 1, the preceding syllable from -1 to 0 and the post-accented syllable from 1 to 2. With anchor, three of the six parameters are mapped to a more suitable domain (cf. TechDoc): the slopes are described by their duration, and the position is anchored within the syllable structure.
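The sylnorm time scale described above is a piecewise-linear mapping. The sketch assumes that the boundaries of the preceding, accented and post-accented syllable are available as four strictly increasing times in seconds (e.g. taken from the .syl label file); the function names are hypothetical.

```python
import math
from bisect import bisect_right


def sylnorm(t: float, bounds: list[float]) -> float:
    """Map absolute time t (seconds) to syllable-normalized time.

    bounds = [start_pre, start_acc, start_post, end_post]: boundaries of the
    preceding, accented and post-accented syllable (strictly increasing).
    They map linearly onto [-1, 0], [0, 1] and [1, 2] respectively.
    """
    if not bounds[0] <= t <= bounds[3]:
        raise ValueError("t lies outside the three-syllable window")
    i = min(bisect_right(bounds, t), 3) - 1  # 0 = preceding, 1 = accented, 2 = post
    start, end = bounds[i], bounds[i + 1]
    return (i - 1) + (t - start) / (end - start)


def sylnorm_inverse(x: float, bounds: list[float]) -> float:
    """Map syllable-normalized time x in [-1, 2] back to seconds."""
    if not -1.0 <= x <= 2.0:
        raise ValueError("x lies outside [-1, 2]")
    i = min(math.floor(x) + 1, 2)  # which of the three syllables x falls into
    start, end = bounds[i], bounds[i + 1]
    return start + (x - (i - 1)) * (end - start)
```

With sylnorm, b = 0.5 places the peak in the middle of the accented syllable, whereas with nonorm b would be an absolute time in the speech file.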

What other parameters exist?

These are the most important environment variables recognized during the modeling process:

PAINTE_NORM_PR
See previous question.

PAINTE_SYLNORM
See previous question.

PAINTE_CBSIZE
The codebook size of the vector quantization (i.e. the number of intonation events).

PAINTE_OVERLAP
A constant used in the parametrization function describing the amount of overlap between the rising and the falling slope. Don't change it unless you know what you are doing. The default is 3.6.

There are other environment variables that define the directories and paths during the modeling. Refer to the file painte.defs in the base directory of the model. 
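Scripts that need to follow the same conventions can collect the documented semantics of these variables in one place. This is a hypothetical helper, not part of the PaIntE tools: PAINTE_NORM_PR counts as set only for the value 1, PAINTE_SYLNORM defaults to sylnorm, PAINTE_OVERLAP defaults to 3.6, and PAINTE_CBSIZE, for which no default is documented here, stays None when unset.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

SYLNORM_VALUES = ("nonorm", "sylnorm", "anchor")


@dataclass(frozen=True)
class PainteSettings:
    norm_pr: bool          # PAINTE_NORM_PR == "1": normalize to the pitch range
    sylnorm: str           # PAINTE_SYLNORM: nonorm | sylnorm | anchor
    cbsize: Optional[int]  # PAINTE_CBSIZE: VQ codebook size, None if unset
    overlap: float         # PAINTE_OVERLAP: slope overlap constant


def read_settings(env: Mapping[str, str] = os.environ) -> PainteSettings:
    """Hypothetical reader mirroring the defaults described in this FAQ."""
    sylnorm = env.get("PAINTE_SYLNORM") or "sylnorm"  # unset -> sylnorm
    if sylnorm not in SYLNORM_VALUES:
        raise ValueError(f"unknown PAINTE_SYLNORM value: {sylnorm!r}")
    cbsize = env.get("PAINTE_CBSIZE")
    return PainteSettings(
        norm_pr=env.get("PAINTE_NORM_PR") == "1",
        sylnorm=sylnorm,
        cbsize=int(cbsize) if cbsize else None,
        overlap=float(env.get("PAINTE_OVERLAP") or "3.6"),
    )
```

Called after sourcing painte.defs, read_settings() picks up whatever variant the shell environment describes.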

How do I re-train the PaIntE parameters on the Nachrichten database?

First, change to the base directory:

> cd /projekte/prosodie/Painte

Then source the environment settings (possibly after changing some of them as described above):

> source painte.defs

If you want to use pitch-range normalization, use: source painte.defs_prnorm

To prepare the database run all commands in the file

> bin/Make.prepare

To model Painte run all commands in the file

> bin/Make.painte

To model Vqpainte run all commands in the file

> bin/Make.vqpainte

To model Vqpainte_notobi run all commands in the file

> bin/Make.vqpainte_notobi
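If you prefer not to paste the commands by hand, a small driver can run them in order. This is a sketch under assumptions: each non-empty line of a Make.* file is taken to be one shell command, lines starting with # are treated as comments, and painte.defs has already been sourced in the shell that starts the script, so its variables are inherited. Because each line runs in a fresh shell, a cd on one line does not carry over to the next.

```python
import subprocess
import sys
from pathlib import Path

# Order as in this FAQ; Vqpainte builds on the results of Painte.
STEPS = ["Make.prepare", "Make.painte", "Make.vqpainte", "Make.vqpainte_notobi"]


def parse_commands(text: str) -> list[str]:
    """Non-empty lines of a Make.* file, skipping lines that start with '#'."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def run_steps(base: str, steps: list[str] = STEPS) -> None:
    """Run every command of every step in order; stop at the first failure."""
    for step in steps:
        for cmd in parse_commands((Path(base) / "bin" / step).read_text()):
            print(f"[{step}] {cmd}", flush=True)
            # Each line runs in a fresh /bin/sh started in `base`, with the
            # environment (e.g. from painte.defs) inherited from this process.
            subprocess.run(cmd, shell=True, check=True, cwd=base)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python run_painte.py /projekte/prosodie/Painte Make.prepare Make.painte
    run_steps(sys.argv[1], sys.argv[2:] or STEPS)
```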

How do I train the PaIntE parameters on a new database?

Your new database must contain all the information necessary for modeling, i.e. speech files (preferably ESPS sd files), word, syllable and segmental labels, and prosodic annotation. Adapt all scripts in bin/Make.prepare to your needs by updating the pointers to your database. Then run the scripts in /projekte/prosodie/Painte/bin/Make.painte, ... as described above.

What if I want to create a new variant of the model?

If you only intend to make minor changes to the model, you may introduce a new environment variable or add new values to existing variables. In this case, however, the results of the new variant will overwrite all earlier results in the database directory (e.g. nachrichten).

The best way to introduce a new variant is therefore probably to create a new directory in parallel to the directories Painte, Vqpainte and Vqpainte_notobi. Then collect all steps necessary to train the model in a file Make.newmodel (similar to Make.painte). The new model can, of course, use the results of other versions such as Painte (just as Vqpainte uses the results of Painte). However, all output should go to the newly created directory.

About this document ...

This document was generated using the LaTeX2HTML translator Version 99.1 release (March 30, 1999)


The translation was initiated by Gregor Moehler on 2001-09-21


Gregor Moehler
2001-09-21