PaIntE stands for Parametric INTonation Events. It is a method to
describe stretches of F0 (usually around a peak or an intonation
boundary) with a set of six parameters, the PaIntE parameters. The
parametrization, i.e. the extraction of the parameters from F0, is
done by approximating F0 with a model function.
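To make the six parameters concrete, here is a minimal Python sketch of a PaIntE-style model function. It assumes the form commonly given in the PaIntE literature: a rising and a falling sigmoid around a peak of height d at position b, with c1 and c2 the amplitudes of the rise and the fall, a1 and a2 their steepness, and a constant gamma controlling how much the two slopes overlap (assumed here to be the constant set by PAINTE_OVERLAP, default 3.6). The function name painte_f0 and the example values are hypothetical, and the exact formula used by painte_analysis may differ.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def painte_f0(x, a1, a2, b, c1, c2, d, gamma=3.6):
    """Evaluate a PaIntE-style contour at (normalized) time x.

    Assumed form (sketch):
        f(x) = d - c1 / (1 + exp(-a1*(b - x) + gamma))
                 - c2 / (1 + exp(-a2*(x - b) + gamma))
    Far left of b the curve approaches d - c1, far right d - c2,
    and near x = b it comes close to the peak value d.
    """
    rise = c1 * _sigmoid(a1 * (b - x) - gamma)  # lowering before the peak (rise not yet completed)
    fall = c2 * _sigmoid(a2 * (x - b) - gamma)  # lowering after the peak (fall already completed)
    return d - rise - fall


# Hypothetical accent: peak of about 200 Hz in the middle of the accented
# syllable (syllable-normalized time), rising by 40 Hz and falling by 60 Hz.
params = dict(a1=10.0, a2=10.0, b=0.5, c1=40.0, c2=60.0, d=200.0)
for x in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(f"x={x:+.1f}  f0={painte_f0(x, **params):6.1f} Hz")
```

Parametrization then amounts to searching for the six values that make this curve fit the measured F0 around an accent or boundary as closely as possible.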
In speech synthesis we want to predict F0 from linguistic features of
the input utterance. With PaIntE we can therefore predict the six
parameters from these features, which is one of the methods described
below. Another method reduces the possible F0 curves to a limited
number of distinct configurations using vector quantization.
test.f0 is the F0 file (usually in ESPS format) that you want to parametrize,
test.syl is the (ESPS-style) label file containing the syllable boundaries, and
test.tones is an optional input file containing tone labels in ToBI annotation
(intonation labels of type X* denote accents and X% denote phrase boundaries).
If a ToBI file is present, the algorithm produces a parametrization
only for these labels. The output goes to STDOUT, or to a file if you
use the -o option. painte_analysis and painte_synthesis (see below) are
programs written with the EST speech_tools library. If they are not in
your path, you can usually find them in the directory speech_tools/bin
within a Festival source tree (e.g. in
/usr/local/Festival_1.4/speech_tools/bin).
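The syllable file test.syl is an ESPS (xlabel) style label file. To illustrate what such a file carries, the following Python sketch parses one, assuming the usual xlabel layout: free-form header lines, a line containing only "#", and then one entry per line with the segment's end time, a colour code and the label. The functions read_xlabel and to_segments are hypothetical helpers, not part of speech_tools.

```python
from typing import Iterable, List, Tuple


def read_xlabel(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (end_time, label) pairs from ESPS/xlabel-style lines (sketch)."""
    entries = []
    in_body = False
    for raw in lines:
        line = raw.strip()
        if not in_body:
            # Everything up to the "#" separator line is header information.
            in_body = line == "#"
            continue
        if not line:
            continue
        fields = line.split(None, 2)  # end time, colour code, label (may be absent)
        label = fields[2] if len(fields) > 2 else ""
        entries.append((float(fields[0]), label))
    return entries


def to_segments(entries, start=0.0):
    """Convert end times into (start, end, label) triples."""
    segments = []
    for end, label in entries:
        segments.append((start, end, label))
        start = end
    return segments


example = """signal test
nfields 1
#
    0.120000  121
    0.350000  121 ga
    0.610000  121 ten
""".splitlines()
print(to_segments(read_xlabel(example)))
```

Because each xlabel entry only stores an end time, the start of a syllable is the end of the previous entry; to_segments makes that explicit.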
test.painte is an ESPS-style label file with the PaIntE parameters (as produced by painte_analysis),
test.syl is the (ESPS-style) label file containing the syllable boundaries, and
test.new_f0 will be the F0 file with the synthesized F0 contour.
What is the difference between Painte, Vqpainte and Vqpainte_notobi?
The names refer to different versions of the parametrization model.
Painte is the model that simply uses the six parameters
(a1, a2, b, c1, c2, d) as described in Gregor Möhler & Alistair Conkie
(1998). During TTS the parameters are predicted using six different
CART trees. Vqpainte is the model that uses vector-quantized intonation
events, i.e. only a fixed number of distinct pitch events is available.
Currently this number ranges from 4 up to 64. During TTS the
appropriate intonation event is predicted from the accent structure
(in ToBI notation) using CART trees. Vqpainte_notobi is the same model
as Vqpainte. However, in TTS no ToBI-style information is used: the
intonation events are predicted only from the position of the accents
or boundaries (not their type). Since it has less information to
evaluate, it is inferior to the Painte model. However, it can be used
when no ToBI accents/boundaries are available.
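The vector-quantized variants can be pictured as a codebook lookup. In the following Python sketch, a measured six-dimensional PaIntE vector (a1, a2, b, c1, c2, d) is mapped to the nearest codeword by squared Euclidean distance, and a codeword index (the kind of value a CART tree would predict in Vqpainte) is decoded back into parameters. The codebook values, the function names and the unweighted distance are assumptions for illustration; the actual system may scale the parameters or measure distance differently.

```python
from typing import List, Sequence

# Hypothetical 4-entry codebook; each row is (a1, a2, b, c1, c2, d).
CODEBOOK: List[Sequence[float]] = [
    (8.0, 8.0, 0.4, 30.0, 50.0, 190.0),   # early peak, large fall
    (6.0, 9.0, 0.9, 40.0, 20.0, 210.0),   # late peak
    (5.0, 5.0, 0.5, 5.0, 5.0, 150.0),     # almost flat
    (9.0, 4.0, 1.2, 60.0, 10.0, 230.0),   # peak in the post-accented syllable
]


def quantize(params: Sequence[float], codebook=CODEBOOK) -> int:
    """Index of the codeword closest to params (squared Euclidean distance)."""
    def dist(code):
        return sum((p - c) ** 2 for p, c in zip(params, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def decode(index: int, codebook=CODEBOOK) -> Sequence[float]:
    """Parameters of the intonation event with the given index."""
    return codebook[index]


measured = (7.5, 8.2, 0.45, 28.0, 47.0, 188.0)
idx = quantize(measured)
print(idx, decode(idx))
```

With unscaled parameters the Hz-valued c1, c2 and d dominate the distance, which is one reason a real system would likely normalize the dimensions before quantizing.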
Vqpainte:
(voice_german_de2_vqpainte N)
where N (4, 6, 8, 16, 32 or 64) is the number of intonation events used in the model.
Shortcuts of the form (voice_german_vqpainte16) exist for all possible values of N.
Vqpainte_notobi:
(voice_german_de2_vqpainte_notobi N)
where N (4, 6, 8, 16, 32 or 64) is the number of intonation events used in the model.
The pitch range can also be set differently for every phrase, e.g. for a specific discourse model. In that case the features pr_top and pr_base of the corresponding node in the Phrase
relation must be set to define the upper and lower boundaries of the pitch range.
Try it out:
festival> (voice_german_vqpainte16)
festival> (Param.set 'default_pr_base 50)
festival> (Param.set 'default_pr_top 70)
festival> (SayText "Guten Tag. Ich bin die neue Stimme und kann jetzt auch tief reden")
festival> (Param.set 'default_pr_base 180)
festival> (Param.set 'default_pr_top 220)
festival> (SayText "Guten Tag. Ich bin die neue Stimme und kann jetzt auch hoch reden")
How do I get more background information?
Gregor Möhler & Alistair Conkie (1998). Parametric modeling of intonation using vector quantization. Proceedings of the 3rd ESCA Workshop on Speech Synthesis, Jenolan Caves, Australia.
Gregor Möhler (2001). Improvements of the PaIntE model. Technical Report, IMS, Universität Stuttgart.
What is the working directory for training the model?
/projekte/prosodie/Painte
What is the directory structure below the working directory?
bin
    executable scripts and Scheme files to train the model
lib
    mainly parameter files used for training
doc
    documentation of PaIntE (currently this document)
e.g. nachrichten
    sub-directories with speech databases (e.g. nachrichten for the IMS German News Corpus), each containing:
    sd
        speech files
    f0
        original (ESPS) F0 files
    sf0
        smoothed F0 files
    lists
        lists of files that are used for processing
    relations
        files with Festival relations (for building Festival utterances)
    utts
        files with Festival utterances
    Painte
        sub-directory with resulting files from the Painte method
    Vqpainte
        sub-directory with resulting files from the Vqpainte method
    Vqpainte_notobi
        sub-directory with resulting files from the Vqpainte_notobi method
Within the sub-directory of a speech database there are the data for that database (f0, utts, ...) as well as sub-directories
for the models Painte, Vqpainte and Vqpainte_notobi.
What are the variants of the model?
The current variant of the model is determined by environment
variables. They are usually set by sourcing the settings files in the
base directory of Painte, e.g. with the command:
> cd /projekte/prosodie/Painte/; source painte.defs
Besides the three basic versions of the model, Painte, Vqpainte
and Vqpainte_notobi (as explained above),
the following options currently exist:
PAINTE_NORM_PR
If its value is 1, the model normalizes the
PaIntE parameters to the current pitch range. If it is unset, the PaIntE
parameters are used as is.
PAINTE_SYLNORM
Possible values are nonorm, sylnorm and
anchor. If this variable is set to nonorm, the
PaIntE parameters are calculated without any time
normalization. This means, e.g., that the alignment parameter b is
given in seconds (referring to the timing in the speech file). If it is
unset or set to sylnorm (the default), time is normalized by the
syllable's length, i.e. the accented syllable ranges from 0
to 1, the preceding syllable from -1 to 0 and the post-accented
syllable from 1 to 2. With the method anchor, three of the six
parameters are mapped to a more suitable domain
(cf. TechDoc): the slopes are described by their
length in time, and the position is anchored within the syllable
structure.
The codebook size of the vector quantization (i.e. the number of intonation events)
PAINTE_OVERLAP
A constant used in the parametrization function that describes
the amount of overlap between the rising and the falling
slope. Don't change it unless you know what you are doing. The default is 3.6.
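The time normalization selected by PAINTE_SYLNORM=sylnorm and the pitch-range normalization switched on by PAINTE_NORM_PR can be sketched in Python as follows. The syllable mapping follows the description of sylnorm (preceding syllable -1 to 0, accented syllable 0 to 1, post-accented syllable 1 to 2). For the pitch range, the document does not say which parameters are normalized or how, so the sketch assumes that the peak height d is mapped linearly onto the range between pr_base and pr_top and that the amplitudes c1 and c2 are divided by the range width. All function names are hypothetical.

```python
def to_sylnorm(t, prev_start, acc_start, acc_end, next_end):
    """Map absolute time t (s) to syllable-normalized time.

    Preceding syllable [prev_start, acc_start] -> [-1, 0],
    accented syllable  [acc_start, acc_end]   -> [0, 1],
    post-accented      [acc_end, next_end]    -> [1, 2].
    Times outside these three syllables are extrapolated linearly.
    """
    if t < acc_start:
        return (t - acc_start) / (acc_start - prev_start)
    if t <= acc_end:
        return (t - acc_start) / (acc_end - acc_start)
    return 1.0 + (t - acc_end) / (next_end - acc_end)


def from_sylnorm(x, prev_start, acc_start, acc_end, next_end):
    """Inverse of to_sylnorm: syllable-normalized time -> seconds."""
    if x < 0.0:
        return acc_start + x * (acc_start - prev_start)
    if x <= 1.0:
        return acc_start + x * (acc_end - acc_start)
    return acc_end + (x - 1.0) * (next_end - acc_end)


def normalize_pitch(c1, c2, d, pr_base, pr_top):
    """Assumed pitch-range normalization: Hz values -> fractions of the range."""
    width = pr_top - pr_base
    return c1 / width, c2 / width, (d - pr_base) / width


# Syllables: 0.10-0.30 s (preceding), 0.30-0.50 s (accented), 0.50-0.80 s (following)
sylls = (0.10, 0.30, 0.50, 0.80)
b_norm = to_sylnorm(0.40, *sylls)  # peak in the middle of the accented syllable
print(b_norm, from_sylnorm(b_norm, *sylls))
print(normalize_pitch(10.0, 5.0, 60.0, pr_base=50.0, pr_top=70.0))
```

With nonorm, b would simply stay at 0.40 s; with sylnorm the same peak becomes 0.5, which is comparable across syllables of different length.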
There are other environment variables that define the directories and
paths used during modeling. Refer to the file painte.defs in the base
directory of the model.
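How the options above select a variant can be summarized in a small Python sketch that reads the variables with the defaults stated in this document (PAINTE_SYLNORM falls back to sylnorm, PAINTE_OVERLAP to 3.6, and PAINTE_NORM_PR only has an effect when set to 1). The function painte_variant is hypothetical; the actual training scripts read these variables themselves, and painte.defs defines further variables not covered here.

```python
import os
from typing import Mapping, Optional

SYLNORM_VALUES = ("nonorm", "sylnorm", "anchor")


def painte_variant(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the PaIntE model options from environment variables (sketch)."""
    if env is None:
        env = os.environ
    sylnorm = env.get("PAINTE_SYLNORM") or "sylnorm"  # unset -> sylnorm (default)
    if sylnorm not in SYLNORM_VALUES:
        raise ValueError(f"unknown PAINTE_SYLNORM value: {sylnorm!r}")
    return {
        "norm_pr": env.get("PAINTE_NORM_PR") == "1",  # normalize to pitch range?
        "sylnorm": sylnorm,
        "overlap": float(env.get("PAINTE_OVERLAP") or "3.6"),
    }


print(painte_variant())
```

Passing an explicit mapping instead of os.environ makes it easy to compare variants without changing the shell environment.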
Your new database has to contain all the information necessary for
modeling, i.e. speech files (preferably ESPS sd files), word, syllable
and segment labels, and prosodic annotation. Adapt all scripts in
bin/Make.prepare to your needs by updating the pointers to
your database. Then run the scripts in
/projekte/prosodie/Painte/bin/Make.painte, ... as described
above.
If you intend to make minor changes to the model, you may introduce
a new environment variable or add new values to already existing
variables. In this case, however, the results of the new variant will
overwrite all earlier results in the database directory (e.g. nachrichten).
The best way to introduce a new variant is therefore probably to
create a new directory parallel to the directories Painte, Vqpainte
and Vqpainte_notobi. Then collect all steps necessary to train the
model in a file Make.newmodel (similar to Make.painte). The new model
can, of course, use the results of other versions such as Painte (just as
Vqpainte uses the results of Painte). However, all output should go to
the newly created directory.