- User questions
- What is PaIntE?
- How do I run a simple PaIntE analysis?
- Can I synthesize an F0 file from a PaIntE parametrization?
- What is the difference between Painte, Vqpainte and Vqpainte_notobi?
- How can I use the PaIntE model for TTS with Festival?
- How can I define the pitch-range of a Painte-voice in Festival?
- How do I get more background information?
- Developer questions
- What is the working directory for training the model?
- What is the directory structure below the working directory?
- What are the variants of the model?
- What other parameters exist?
- How do I re-train the PaIntE parameters on the Nachrichten database?
- How do I train the PaIntE parameters on a new database?
- What if I want to create a new variant of the model?
- About this document ...
What is PaIntE?
PaIntE stands for Parametric INTonation Events. It is a method for describing stretches of F0 (usually around a peak or an intonation boundary) with a set of six parameters, the PaIntE parameters. The parametrization, i.e. the extraction of the parameters from F0, is done by approximating F0 with a model function.
In speech synthesis we want to predict F0 from linguistic features of the input utterance. With PaIntE we can therefore predict the six parameters directly from the features, which is one of the methods described below. Another method reduces the possible F0 curves to a fixed number of distinct configurations using vector quantization.
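The model function itself is not spelled out in this FAQ. The following is a minimal Python sketch of the form commonly cited from Möhler & Conkie (1998): a peak height d from which a rising and a falling sigmoid are subtracted. The exact formula, the reading of a1/a2 as slope steepness, c1/c2 as slope amplitudes and d as peak height, and the function names are assumptions; only the six parameter names, the role of b as alignment parameter and the default of 3.6 for the overlap constant come from this document.

```python
import math

GAMMA = 3.6  # overlap constant; this FAQ gives 3.6 as the default


def _sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def painte_f0(x, a1, a2, b, c1, c2, d, gamma=GAMMA):
    """Evaluate the assumed PaIntE model function at position x.

    Assumed form (not stated in this FAQ):
        f(x) = d - c1 / (1 + exp(-a1 * (b - x) + gamma))
                 - c2 / (1 + exp(-a2 * (x - b) + gamma))
    """
    rise = c1 * _sigmoid(a1 * (b - x) - gamma)  # dominates left of the peak
    fall = c2 * _sigmoid(a2 * (x - b) - gamma)  # dominates right of the peak
    return d - rise - fall
```

Under these assumptions the curve starts near d - c1 far to the left of b, comes close to d around x = b, and settles near d - c2 far to the right.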
How do I run a simple PaIntE analysis?
Use:
painte_analysis test.f0 -syl test.syl -tobi test.tones
- test.f0 is the F0 file (usually in ESPS format) which you want to parametrize,
- test.syl is the (ESPS style) label file containing the syllable boundaries and
- test.tones is an optional input file containing tone labels in ToBI annotation (with intonation labels of type X* denoting accents and X% denoting phrase boundaries).
If a ToBI file is present, the algorithm only produces a parametrization for these labels. The output goes to STDOUT, or to a file if you use the -o option. painte_analysis and painte_synthesis (see next question) are programs built on the EST speech_tools library. If they are not in your path, you can usually find them in the directory speech_tools/bin within a Festival source tree (e.g. in /usr/local/Festival_1.4/speech_tools/bin).
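For batch processing, the call above can be wrapped in a small script. The following Python sketch only uses the options named in this FAQ (-syl, -tobi, -o). The helper names are hypothetical, and it assumes painte_analysis is on your PATH.

```python
import subprocess


def painte_analysis_cmd(f0_file, syl_file, tobi_file=None, out_file=None):
    """Build the argument list for a painte_analysis call (hypothetical helper)."""
    cmd = ["painte_analysis", f0_file, "-syl", syl_file]
    if tobi_file is not None:
        cmd += ["-tobi", tobi_file]  # optional tone label file
    if out_file is not None:
        cmd += ["-o", out_file]      # without -o the output goes to STDOUT
    return cmd


def run_painte_analysis(f0_file, syl_file, tobi_file=None, out_file=None):
    """Run painte_analysis; raises CalledProcessError on a non-zero exit status."""
    cmd = painte_analysis_cmd(f0_file, syl_file, tobi_file, out_file)
    subprocess.run(cmd, check=True)
```

For example, run_painte_analysis("test.f0", "test.syl", "test.tones", "test.painte") would write the parametrization to test.painte.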
Can I synthesize an F0 file from a PaIntE parametrization?
Yes, you can. Just use:
painte_synthesis test.painte -syl test.syl -o test.new_f0
- test.painte is an ESPS style label file with the PaIntE parameters (as produced by painte_analysis)
- test.syl is the (ESPS style) label file containing the syllable boundaries and
- test.new_f0 will be the F0 file with the synthesized F0 contour
What is the difference between Painte, Vqpainte and Vqpainte_notobi?
The names refer to different versions of the parametrization model.
Painte is the model that simply uses the 6 parameters (a1, a2, b, c1, c2, d) as described in Gregor Möhler & Alistair Conkie (1998). During TTS the parameters are predicted using 6 different CART trees.
Vqpainte is the model that uses vector-quantized intonation events, i.e. there is only a fixed number of distinct pitch events. Currently the number ranges from as low as 4 up to 64. During TTS the appropriate intonation event is predicted from the accent structure (in ToBI notation) using CART trees.
Vqpainte_notobi is the same model as Vqpainte, but no ToBI style information is used in TTS. The intonation events are predicted only from the place of the accents or boundaries (not their type). Since it has less information to evaluate, it is inferior to the Painte model. However, it can be used when no ToBI accents/boundaries are available.
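To illustrate what vector quantization means here: every parametrized intonation event is replaced by the closest entry of a small codebook, so only as many distinct events remain as the codebook has entries. The Python sketch below assumes a plain squared Euclidean distance over the six parameters. The actual distance measure, any parameter weighting and the codebook training are not described in this FAQ, and the codebook values are invented purely for illustration.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to vector.

    Uses the squared Euclidean distance (an assumption, see above).
    """
    def sq_dist(u, v):
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


# Hypothetical 4-entry codebook over (a1, a2, b, c1, c2, d); all values invented.
codebook = [
    (10.0, 10.0, 0.5, 40.0, 40.0, 200.0),  # rise-fall
    (10.0, 10.0, 0.5, 40.0, 0.0, 200.0),   # rise only
    (10.0, 10.0, 0.5, 0.0, 40.0, 200.0),   # fall only
    (10.0, 10.0, 0.5, 5.0, 5.0, 150.0),    # nearly flat
]

event = (10.0, 10.0, 0.6, 35.0, 3.0, 190.0)
index = nearest_codeword(event, codebook)
```

With these made-up numbers the event is mapped to the "rise only" entry, because its falling amplitude c2 is small.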
How can I use the PaIntE model for TTS with Festival?
Choose the appropriate voice:
- (voice_german_de2_vqpainte N)
with N=(4,6,8,16,32,64) being the number of intonation events used in the model. There are shortcuts of the style (voice_german_vqpainte16) for all possible values of N.
- (voice_german_de2_vqpainte_notobi N)
with N=(4,6,8,16,32,64) being the number of intonation events used in the model
How can I define the pitch-range of a Painte-voice in Festival?
For the Painte and Vqpainte voices the pitch-range can be defined in 2 ways:
- As a default for the voice. Use the following command to define the upper and lower boundaries of the pitch range:
(Param.set 'default_pr_base 50)
(Param.set 'default_pr_top 70)
- Different for every phrase, e.g. for a specific discourse model. The features pr_top and pr_base of the node in the Phrase relation must be set to define the upper and lower boundaries of the pitch range.
Try it out:
festival> (voice_german_vqpainte16)
festival> (Param.set 'default_pr_base 50)
festival> (Param.set 'default_pr_top 70)
festival> (SayText "Guten Tag. Ich bin die neue Stimme und kann jetzt auch tief reden")
festival> (Param.set 'default_pr_base 180)
festival> (Param.set 'default_pr_top 220)
festival> (SayText "Guten Tag. Ich bin die neue Stimme und kann jetzt auch hoch reden")
How do I get more background information?
Gregor Möhler & Alistair Conkie (1998). Parametric modeling of intonation using vector quantization. Proceedings of 3rd ESCA Workshop on Speech Synthesis, Jenolan Caves, Australia.
Gregor Möhler (2001). Improvements of the PaIntE model. Technical Report. IMS Universität Stuttgart.
What is the directory structure below the working directory?
- executable scripts and scheme files to train the model
- mainly parameter files used for training
- documentation of PaIntE. Currently this document.
- e.g. nachrichten
- sub-directories with speech databases (e.g. nachrichten for the IMS German News Corpus)
- speech files
- original (esps) F0 files
- smoothed F0 files
- lists of files that are used for processing
- files with festival relations (for building Festival utterances)
- files with Festival utterances
- sub-directory with resulting files from the Painte method.
- sub-directory with resulting files from the Vqpainte.
- sub-directory with resulting files from the Vqpainte_notobi.
The sub-directory of a speech database contains the data for that database (f0, utts, ...) as well as sub-directories for the models Painte, Vqpainte and Vqpainte_notobi.
What are the variants of the model?
The current variant of the model is determined by environment variables. They are usually set by sourcing the settings files in the base directory of Painte, e.g. with the command:
> cd /projekte/prosodie/Painte/; source painte.defs
Besides the three basic versions of the model (Painte, Vqpainte and Vqpainte_notobi, as explained above), there are currently the following options:
- If its value is 1, the model normalizes the painte parameters to the current pitch-range. If unset, the painte parameters are used as is.
- The three values nonorm, sylnorm or anchor are possible. If this value is set to nonorm, the painte parameters are calculated without any time normalization. This means, for example, that the alignment parameter b is given in seconds (referring to the timing in the speech file). If it is unset or set to sylnorm, the time is normalized by the syllable's length (default), i.e. the accented syllable ranges from 0 to 1, the preceding syllable from -1 to 0 and the post-accented syllable from 1 to 2. With the method anchor, three of the 6 parameters are mapped to a more suitable domain (cf. TechDoc): the slopes are described by their length in time, and the position is anchored within the syllable structure.
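The sylnorm time axis can be sketched in Python as follows. The sketch assumes that time is mapped linearly within each syllable, which this FAQ does not state explicitly. The function name and arguments are hypothetical.

```python
def sylnorm_time(t, prev_start, acc_start, acc_end, next_end):
    """Map a time t (in seconds) to the syllable-normalized axis.

    Assumed piecewise-linear mapping:
      preceding syllable     [prev_start, acc_start] -> [-1, 0]
      accented syllable      [acc_start, acc_end]    -> [0, 1]
      post-accented syllable [acc_end, next_end]     -> [1, 2]
    """
    if t < acc_start:
        return (t - acc_start) / (acc_start - prev_start)
    if t <= acc_end:
        return (t - acc_start) / (acc_end - acc_start)
    return 1.0 + (t - acc_end) / (next_end - acc_end)
```

For instance, with syllable boundaries at 0.8 s, 1.0 s, 1.3 s and 1.5 s, a peak at 0.9 s lies in the middle of the preceding syllable and maps to about -0.5, whereas under nonorm b would simply stay at 0.9.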
What other parameters exist?
These are the most important environment variables recognized within the modeling process:
- See previous question.
- See previous question.
- The codebook size of the vector quantization (i.e. the number of intonation elements)
- A constant used in the parametrization function describing the amount of overlap between the rising and the falling slope. Don't change unless you know what you are doing. Default is 3.6
There are other environment variables that define the directories and paths during the modeling. Refer to the file painte.defs in the base directory of the model.
How do I re-train the PaIntE parameters on the Nachrichten database?
First cd to the base directory:
> cd /projekte/prosodie/Painte
Then source the environment settings (possibly changing some of the settings as described above):
> source painte.defs
If you want to use pitch-range normalization use: source painte.defs_prnorm
To prepare the database run all commands in the file
To model Painte run all commands in the file
To model Vqpainte run all commands in the file
To model Vqpainte_notobi run all commands in the file
How do I train the PaIntE parameters on a new database?
Your new database has to contain all the information necessary for modeling, i.e. speech files (preferably ESPS sd-files), word, syllable and segmental labels, and prosodic annotation. Adapt all scripts in bin/Make.prepare to your needs (by updating the pointers to your database). Then run the scripts in /projekte/prosodie/Painte/bin/Make.painte, ... as described above.
What if I want to create a new variant of the model?
If you intend to make minor changes to the model, you may introduce a new environment variable or add new values to already existing variables. In this case the results of the new variant will overwrite all earlier results in the database directory (e.g. nachrichten).
Possibly the best way to introduce a new variant is, therefore, to create a new directory in parallel to the directories Painte, Vqpainte and Vqpainte_notobi. Then collect all steps necessary to train the model in a file Make.newmodel (similar to Make.painte). The new model can, of course, use the results of other versions like Painte (just as Vqpainte uses the results of Painte). However, all output should go to the newly created directory.
About this document ...
Frequently asked questions about PaIntE intonation modeling
This document was generated using the LaTeX2HTML translator Version 99.1 release (March 30, 1999)
The command line arguments were:
latex2html -split 0 -no_navigation -dir src FAQ-painte.tex
The translation was initiated by Gregor Moehler on 2001-09-21