Transcription of German Intonation

The Stuttgart System (WWW version, May 15, 1995)

Jörg Mayer, University of Stuttgart, Germany

joemayer@ims.uni-stuttgart.de

Many thanks to Ralf Benzmüller, Martine Grice (Saarbruecken) and Matthias Reyelt (Braunschweig) for discussions and data exchange!



1 Introduction

The Stuttgart System is an attempt to integrate the phonological analysis of German intonation done by C. Fery [1] and the ToBI labelling conventions [2], [3]. The system was developed as a tool within our overall aim, which is to create a prosodic module for Discourse Representation Theory (DRT) [4]. DRT is a model-theoretic approach to discourse semantics which describes the interpretation of discourses as a dynamic two-level process. Since our inventory of symbols is primarily motivated by phonological analysis, and since the domain of processing these symbols is DRT, the criterion emphasized in our system for describing fundamental frequency contours is that only those intonational events should be labeled which are distinctive, in the sense that one can assign them a function in the domain of discourse interpretation. A further consequence is that the system reflects phonological pitch-accent linking rules which introduce a set of 'allotonic' accents. Beyond that, a small set of default symbols enriches the standard ToBI notation. These default labels are filled in with phonetic content according to autosegmental spreading and alignment conventions.


2 Tone tier

2.1 Pitch Accents

2.1.1 Inventory

  • Standard accents
    • H*L fall
    • L*H rise
    • HH*L "early peak"
    • L*HL rise-fall/"late peak"
    • H*M stylized contour
  • Linking
    • H* high target on accented syllable
    • L* low target on accented syllable
    • ..H high trail tone
    • ..L low trail tone
  • Downstep
    • !H*L fall
  • Uncertainty
    • ? Uncertainty: accent type
    • *? Uncertainty: accentuation

2.1.2 Standard accents

There are two basic pitch accents, H*L and L*H.

H*L
a high target on the accented/tonic syllable followed by a falling pitch. If the accented syllable is the last syllable of an intonation unit (intonation phrase/intermediate phrase), the high target and the fall are realized on one syllable, namely the accented syllable. If there are syllables following the accented one within the same intonation unit, the high target is reached on the accented syllable, followed by the first part of the fall, which is continued on the next syllable. After H*L the F0 contour runs in the lower third of the speaker's range, parallel to the baseline, until just before the next accented syllable or a phrase boundary.


Fig. 1 Example for H*L; utterance den hast du nicht; the vertical line indicates the onset of the accented syllable nicht, which is the last syllable of the phrase.

 

L*H
a low target on the accented syllable followed by a rising pitch. If the accented syllable is the last syllable of an intonation unit (intonation phrase/intermediate phrase), the low target and the rise are realized on one syllable, namely the accented syllable. If there are syllables following the accented one within the same intonation unit, the low target is reached on the accented syllable, followed by the first part of the rise, which is continued on the next syllable. After L*H the F0 contour runs in the upper third of the speaker's range (mostly parallel to the baseline) until just before the next accented syllable or a phrase boundary.


Fig. 2 Example for L*H followed by H*L; utterance die Pension Berlin ist doch links; vertical lines indicate word boundaries.


Fig. 3 Example for L*H; utterance Wohnwagen; the vertical line indicates the boundary between Wohn (the accented syllable) and wagen.


Besides the two basic accents there are three additional 'special' pitch accents (cf. [1], ch. 3).

L*HL
(rise-fall/"late peak") a low target on the accented syllable followed by a rise and a fall to a low level. If the accented syllable is followed by at least two additional syllables within the same intonation unit, the low target (the starred tone) is realized on the accented syllable. The rise is expected to start on the accented syllable and to be continued on the next syllable. The fall starts on the first syllable after the accented one and ends somewhere in the second syllable. If the accented syllable is followed by only one syllable within the same intonation unit, the low target should be on the accented syllable, followed by a rise, which is realized partially on the accented syllable and partially on the postaccentual syllable, and a fall on the postaccentual syllable. In the third possible case, where the accented syllable is the last syllable in its intonation unit, both the rise and the fall are realized on one syllable, starting at a low target at the beginning of the accented syllable.


Fig. 4 Example for L*HL; utterance wieso links; the vertical line indicates the word boundary.

 

HH*L
("early peak") a high target on the preaccentual syllable followed by a fall on the accented syllable. This accent type must be realized on at least two syllables: an accented syllable and a preaccentual syllable, which must be weak (i.e. not stressable). If there is no syllable following the accented one, the fall is realized on the accented syllable. If there is a postaccentual syllable, the fall is divided into two parts, one part on the accented syllable and the other part on the postaccentual syllable. In any case a high target on the preaccentual syllable must precede the fall. There can be a downstepped target on the accented syllable, but in most cases it is a plain fall starting at the height of the preaccentual pitch (or even lower, but in the upper third of the speaker's range).


Fig. 5 Example for HH*L; utterance hab ich mir schon gedacht; the first vertical line indicates the beginning of ge, the second line the beginning of dacht; a high target is reached on the syllable ge (prefix), the fall is realized on dacht.


The last accent type covered by our labelling system is very restricted. It is designated for describing the so-called stylized contours, which are often used in vocatives (cf. [1], ch. 3.2.1).

H*M
('M' encodes a tone in the middle of the speaker's range) a high target on the accented syllable followed by a high level tone, followed by a fall to the middle of the speaker's range on the last syllable of the intonation unit. Words bearing this accent type are normally final in their intonation unit. H*M is the only accent type with a starred tone that can undergo spreading [1]. This means that if there are postaccentual syllables, these syllables are associated with the H instead of the trail tone M, and only the last postaccentual syllable in the intonation unit is associated with M. This results in the high level tone realized on the syllables between the accented syllable and the last syllable. If there is only one syllable to bear H*M, the nucleus of this syllable is often duplicated, so that a high target can be realized on its first part and a mid target on its second part.

Another (non-tonal) feature that may serve as an identifier of stylized contours is the considerable lengthening of the accented syllable and all postaccentual syllables until the boundary of the intonation phrase is reached.


Fig. 6 Example for H*M; utterance Angelika!; vertical lines indicate syllable boundaries; note the high level tone beginning on the accented syllable ge and continued on the postaccentual syllable li; the trail tone M is realized on the last syllable of the intonation phrase, ka.


Fig. 7 Example for H*M; utterance Joerg!; the vertical line indicates the boundary between the first half and the second half of the nuclear vowel.


2.1.3 Linking

Linking rules (cf. [1]) permit prenuclear pitch accents to split off their trail tone, which is then either associated with the syllable before the next accented syllable (i.e. partial linking) or omitted completely (i.e. complete linking). The application of these rules depends on speaking style, speaking rate, etc., and is claimed not to affect discourse interpretation. That is, the linked/omitted trail tone will not change the meaning of the intonational contour. The linking processes are thus allotonic. The reason for introducing symbols that reflect these rules is to enable the labeler to annotate the application of such a rule, so that there is not too much discrepancy between the labels and the actual contour.

 

H*
a high target on the accented syllable that is not followed by an immediate fall. The course of the F0 contour after the high target depends on the type of linking. In case of partial linking the contour should roughly be interpolated between the starred tone (the high target) and the partially linked trail tone L, which is associated with the syllable just before the next accented syllable. This results in a smooth fall starting on the accented or the postaccentual syllable and ending on the next preaccentual syllable. In case of complete linking the F0 course depends on the next accent. The contour should look roughly like an interpolation between the starred tone of the linked accent (H) and the starred tone of the next accent (H or L). So if H* is followed by, for example, H*L, F0 between the accented syllables runs in the upper third of the speaker's range; if H* is followed by L*H, the contour falls between the two accented syllables.


Fig. 8 Example for H* ... H*L (underlying H*L ... H*L); utterance ...desto unruhiger wurden die Leute; the contour between the two accented syllables is high but slightly falling due to declination. (There are some instances of strong laryngalization: on desto the F0 algorithm fails, on the second syllable of Leute no pitch is detected at all.)


Fig. 9 Example for H* ... L*H (underlying H*L ... L*H); utterance ...aber zuckte resigniert mit den... .

 

..L
partial linking: the low trail tone of an underlying H*L accent; to be assigned to the next preaccentual syllable following the linked accent.


Fig. 10 Example for H* ..LH*L (underlying H*L ... H*L); utterance den hattest du das letzte mal auch nicht; note the laryngalization at the beginning of auch.


Fig. 11 Example of H* ..LH*L; utterance am Samstag den zehnten... .

 

L*
a low target on the accented syllable that is not followed by an immediate rise. The course of the F0 contour after the low target depends on the type of linking. In case of partial linking the contour should roughly be interpolated between the starred tone (the low target) and the partially linked trail tone H, which is associated with the syllable just before the next accented syllable. This results in a smooth rise starting on the accented or the postaccentual syllable and ending on the next preaccentual syllable. In case of complete linking the F0 course depends on the next accent. The contour should look roughly like an interpolation between the starred tone of the linked accent (L) and the starred tone of the next accent (H or L). So if L* is followed by, for example, L*H, F0 between the accented syllables runs in the lower third of the speaker's range; if L* is followed by H*L, the contour rises between the two accented syllables.

 

..H
partial linking: the high trail tone of an underlying L*H accent; to be assigned to the next preaccentual syllable following the linked accent.
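The relation between underlying accents and the linked labels can be sketched in a few lines of Python. This is only an illustration of the rules described above: the function name `link` and the linking names "none", "partial" and "complete" are assumptions introduced here, not part of the Stuttgart System.

```python
from typing import List

# Trail tones of the two basic accents, read off their labels.
TRAIL_TONE = {"H*L": "L", "L*H": "H"}


def link(underlying: str, linking: str) -> List[str]:
    """Return the surface labels of a prenuclear accent (hypothetical helper).

    underlying: "H*L" or "L*H"
    linking:    "none", "partial" or "complete" (names assumed for this sketch)
    """
    if underlying not in TRAIL_TONE:
        raise ValueError(f"no linking rule sketched for {underlying!r}")
    starred = underlying[:2]            # "H*" or "L*"
    trail = TRAIL_TONE[underlying]      # "L" or "H"
    if linking == "none":
        return [underlying]
    if linking == "partial":
        # The trail tone splits off; its label goes on the syllable
        # just before the next accented syllable.
        return [starred, ".." + trail]
    if linking == "complete":
        # The trail tone is omitted altogether.
        return [starred]
    raise ValueError(f"unknown linking type: {linking!r}")


print(link("H*L", "partial"))   # surface labels for a partially linked fall
print(link("L*H", "complete"))  # surface label for a completely linked rise
```

The sketch only returns the labels; where the `..L` or `..H` label is placed in time is governed by the alignment conventions.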

 

2.1.4 Downstep

For downstepped falling accents the system provides the symbol !H*L. (Downstep should be labeled, although at present it is not certain whether it differs in meaning from H*L.)

 

!H*L
see H*L; the !H*L accent must be preceded by at least one pitch accent with a starred H tone (H*L, HH*L, H*). The high target of !H*L is still high compared with the surrounding lows, but it is lower than a preceding high target.
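The precedence condition on !H*L lends itself to a simple consistency check on a transcription. The following Python sketch assumes that the accents of one phrase are given as a list of label strings; the function name and the choice of the phrase as the checking domain are assumptions, since the text does not state the domain explicitly.

```python
from typing import List

# Accents with a starred H tone, exactly as listed in the text.
STARRED_H = {"H*L", "HH*L", "H*"}


def unlicensed_downsteps(accents: List[str]) -> List[int]:
    """Return the positions of !H*L labels with no preceding starred-H accent.

    Hypothetical helper: `accents` is assumed to hold the pitch accent
    labels of one phrase in temporal order.
    """
    seen_starred_h = False
    problems = []
    for position, accent in enumerate(accents):
        if accent == "!H*L" and not seen_starred_h:
            problems.append(position)
        if accent in STARRED_H:
            seen_starred_h = True
    return problems


print(unlicensed_downsteps(["H*L", "H*L", "!H*L"]))  # no problems expected
print(unlicensed_downsteps(["L*H", "!H*L"]))         # the !H*L is flagged
```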


Fig. 12 Example of !H*L; utterance als Lebensmittel nicht genuegend vorhanden waren; three falls in sequence, one not downstepped (on nicht) and one downstepped (on vorhanden).

 

2.1.5 Uncertainty

The system provides two symbols for different levels of uncertainty.

?
is a diacritic that may be applied to each accent type described above (it may also be applied to boundary tones; see below). ? should be added to a standard symbol if the presence of an accent (boundary) is clear but the accent (boundary) type is questionable.

 

*?
is used when the transcriber is uncertain whether there is a pitch accent at all.
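For automatic processing of transcriptions, the accent inventory and the two uncertainty symbols can be handled by a small parser. The Python sketch below assumes that the ? diacritic is written as a suffix (e.g. `H*L?`); the text only says that it is "added" to a symbol, so the position, like the function name and the status strings, is an assumption of this example.

```python
from typing import Optional, Tuple

# Pitch accent labels from the inventory in section 2.1.1.
PITCH_ACCENTS = {"H*L", "L*H", "HH*L", "L*HL", "H*M", "H*", "L*", "!H*L"}


def parse_accent(label: str) -> Tuple[Optional[str], str]:
    """Split an accent label into (accent type, certainty status).

    Hypothetical helper; the suffix position of "?" is an assumption.
    """
    if label == "*?":
        # Transcriber is unsure whether there is an accent at all.
        return (None, "accentuation uncertain")
    uncertain = label.endswith("?")
    base = label[:-1] if uncertain else label
    if base not in PITCH_ACCENTS:
        raise ValueError(f"unknown pitch accent label: {label!r}")
    return (base, "type uncertain" if uncertain else "certain")


for example in ["H*L", "L*HL?", "*?"]:
    print(example, parse_accent(example))
```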

 

2.1.6 Alignment

All pitch accents and the *? symbol are labeled approximately in the middle of the nucleus of the accented syllable (HH*L is also labeled in the nucleus of the accented syllable). Linked trail tones are labeled in the middle of the nucleus of the preaccentual syllable.


2.2 Boundary tones

Prosodic phrasing of utterances is described on two levels, intermediate phrases (ip) and intonation phrases (IP). These levels are ordered hierarchically: an intonation phrase contains at least one intermediate phrase, and an ip contains at least one pitch accent.

 

2.2.1 Intermediate phrase boundary

There is only one default label to transcribe the presence of an intermediate phrase boundary.

 

-
(hyphen) indicates an intermediate phrase boundary (combined with ToBI break indices 2 or 3 / VM break index B2); ip boundaries at the end of an IP are not indicated: in this case only the IP boundary tone is labeled, and the ip boundary is subsumed.

 

2.2.2 Intonation phrase boundary

For transcribing boundary tones at the end of intonation phrases three labels are provided.

 

H%
high boundary tone. Used to transcribe a clear rise on the last syllable of an intonation phrase. If the last syllable of an IP is accented, both the pitch accent and the boundary tone are realized on the same syllable: if the last syllable is assigned H*L and the IP is assigned H%, there should be a fall followed by a rise at the end of the IP. If the last syllable is assigned L*H and the IP is assigned H%, there is just a rise at the end of the IP. If the nuclear accent (i.e. the last accent) is not phrase-final, there are two possibilities: if the trail tone of the nuclear accent is L, the contour runs in the lower third of the speaker's range and rises on the last syllable. If the trail tone is H, the contour runs in the upper third and there should be an additional rise on the last syllable.


Fig. 13 Example for H*L H%; utterance ...auf einem Schild schon lesen koennen; rise on the last syllable, which is not accented.


Fig. 14 Example for L*HL H%; utterance ...dass diese Draengelei fuer ihn vielleicht ein Vergnuegen sei.


Fig. 15 Example for L*H H%; utterance ...ist schraeg nach rechts oben. Note the additional rise on the last (unaccented) syllable.

 

L%
low boundary tone. Used to transcribe a fall on the last syllable of an intonation phrase. If the last syllable of an IP is accented, both the pitch accent and the boundary tone are realized on the same syllable. But since the German intonation system allows only falling nuclear accents before a low boundary tone, this makes little difference. If the nuclear accent (only falling accents in front of L%) is not phrase-final, F0 is low until the end of the phrase, with an additional fall on the last syllable.


Fig. 16 Example for H*L L%; utterance ...und der Mann konnte jetzt von allen Seiten Schimpfwoerter hoeren. (Jitter at the end is due to laryngalization and breathiness.)

 

%
default boundary tone for transcription of IP boundaries with no or only slight tonal movement. It is interpreted by spreading of the nuclear accent's trail tone: the IP ends with a high or low plateau, depending on the previous trail tone, without considerable tonal movement on the last syllable.


Fig. 17 Example for L*H %; utterance wer spaeter kam, musste sich... .
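How the default boundary tone % receives its phonetic content can be illustrated with a short Python sketch. The trail tones in the table are read off the accent labels (the last tone of each label), which is an assumption of this example; H*M is left out because the description of % only mentions high and low plateaus.

```python
# Trail tone of each nuclear accent, read off the label (assumed mapping).
TRAIL_TONE = {
    "H*L": "L",
    "!H*L": "L",
    "HH*L": "L",
    "L*HL": "L",
    "L*H": "H",
}


def default_boundary_plateau(nuclear_accent: str) -> str:
    """Describe the phrase-final plateau implied by the label '%'.

    Hypothetical helper: the trail tone of the nuclear accent spreads
    up to the IP boundary, giving a high or a low plateau.
    """
    trail = TRAIL_TONE.get(nuclear_accent)
    if trail is None:
        raise ValueError(f"no trail tone known for {nuclear_accent!r}")
    return "high plateau" if trail == "H" else "low plateau"


print(default_boundary_plateau("L*H"))  # the L*H % case
print(default_boundary_plateau("H*L"))  # the H*L % case
```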

 

2.2.3 Initial IP boundary

By default, pitch at the beginning of intonation phrases is expected to start in the low or middle part of the speaker's range, which is not considered an (initial) boundary tone and is therefore not transcribed. Only if pitch starts decidedly high, and only when there is no other plausible explanation for an initial high pitch (e.g. an H accent on the first few syllables), is the following label used:

 

%H
phrase initial high pitch

For cases of disfluency there is an additional symbol (which should be used 'conservatively', i.e. not too frequently).

 

%r
for restarted intonation contours, when the previous contour was interrupted without being finished due to some disfluency.

 

2.2.4 Alignment

The phrase-final boundary tones are labeled at the very end of the phrase (ip or IP), i.e. at the end of the last syllable of the phrase. (So phrase-finally the boundary tone label, the transliteration label, and the break index label should have the same alignment point.) The initial boundary tone %H and the %r symbol are labeled at the beginning of the phrase, in front of the first syllable of the IP.


 

3 Break indices

On the 'nature' of the break index tier:
"Break indices represent a rating for the degree of juncture perceived between each pair of words and between the final word and the silence at the end of the utterance. They are to be marked after all words that have been transcribed in the orthographic tier. All junctures - including those after fragments and filled pauses - must be assigned an explicit break index value; there is no default juncture type." ([2]: 31)

Two equally valid models can be used to transcribe breaks in the Stuttgart System: the ToBI standard [2] and the Verbmobil conventions [5].

3.1 ToBI conventions (cf. [2]: 31-38)

The five basic break indices are the following:

0
for word boundaries in clitic groups


1
for 'normal' phrase-medial word boundaries


2
for mismatch between tonal marks and disjuncture marks:
"a strong disjuncture marked by a pause or virtual pause, but with no tonal marks; i.e. a well-formed tune continues across the juncture
OR
a disjuncture that is weaker than expected at what is tonally a clear intermediate or full intonation phrase boundary"
([2]: 35).

3
indicates typical boundary strength at intermediate phrase boundaries


4
indicates typical boundary strength at intonation phrase boundaries


The indices 1, 2, and 3 may also be combined with the diacritic p:

1p
"an abrupt cutoff before an actual repair, or as if stopping to permit a repair or restart of some kind" ([2]: 36)


2p
"a hesitation pause or prolongation of segmental material where there is no phrase accent [ = ip boundary tone; JM] perceived in the intonation contour" ([2]: 36)


3p
"a hesitation pause or a pause-like prolongation where there is a phrase accent in the tone tier" ([2]: 36)

 

3.2 Verbmobil conventions (cf. [5]: 3-5)

 

(B0
option for word boundaries in clitic groups; not used so far)

 

B1
for 'normal' phrase-medial word boundaries

 

B2
indicates typical boundary strength at intermediate phrase boundaries

 

B3
indicates typical boundary strength at intonation phrase boundaries

 

B9
for hesitations, prolongations, cutoffs; comparable with the p diacritic in ToBI.
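Because the two sets of break indices are described in parallel terms, a rough conversion from ToBI to Verbmobil indices can be sketched in Python. The correspondence below is only read off the matching descriptions in sections 3.1 and 3.2 and is not an official conversion table; ToBI index 2 (the mismatch category) has no described Verbmobil counterpart and is left unmapped here. The function name is hypothetical.

```python
from typing import Optional

# Correspondence inferred from the matching descriptions (an assumption).
TOBI_TO_VM = {"0": "B0", "1": "B1", "3": "B2", "4": "B3"}


def tobi_to_verbmobil(index: str) -> Optional[str]:
    """Map a ToBI break index to a Verbmobil break index, if one is described."""
    if index in {"1p", "2p", "3p"}:
        # B9 is described as comparable with the p diacritic.
        return "B9"
    if index == "2":
        # Mismatch between tonal and disjuncture marks: no counterpart given.
        return None
    if index in TOBI_TO_VM:
        return TOBI_TO_VM[index]
    raise ValueError(f"unknown ToBI break index: {index!r}")


for tobi in ["0", "1", "2", "3", "4", "2p"]:
    print(tobi, "->", tobi_to_verbmobil(tobi))
```

The mapping is lossy in the B9 direction: the three p-marked indices collapse into one label, so the conversion cannot be reversed without extra information.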

 

3.3 Alignment

Break indices are labeled at the end of the word before the break, i.e. they are aligned with the orthographic labels and, if there are any, with the tonal boundary labels.

 

4 Orthographic transcription

In the orthographic tier the words of the utterance are simply transcribed orthographically. The labels should be positioned exactly at the end of the word being transcribed. For various reasons we have added the following symbols:

 

<P>
for pauses; to be labeled at the end of the pause.

 

<slip/disfl>
for slips of the tongue and other disfluencies


 

5 Miscellaneous

According to the ToBI standard [2] the miscellaneous tier is designated for transcribing different types of disfluencies. In addition, the Stuttgart System provides some labels to mark phonation type/voice quality.

 

5.1 Inventory

 

silence
(sil) for silent pauses

 

breath
for audible breathing

 

laugh
for laughing

 

cough
for coughing

 

disfluent
(disfl) other disfluencies

 

laryngalization
(laryng) for instances of laryngalization

 

creaky_voice
(crk_voc) for creaky voiced sections

 

breathy_voice
(brth_voc) for breathy voiced sections

 

harsh_voice
(harsh_voc) for harsh voiced sections

 

whispery_voice
(whisp_voc) for whispered sections

 

5.2 Alignment

Unlike events in the tone tier, the break index tier, or the orthographic tier, all events in the miscellaneous tier must be marked for both their beginnings and their ends, because there is no strict succession of miscellaneous events (normally only a few intervals of the labeled signal will be marked in the miscellaneous tier, whereas in the other tiers the whole signal is transcribed successively). Two diacritics (both suffixes) serve to indicate the beginning or end of the transcribed event:

 

<
beginning

 

>
end

 

For example
silence< ... silence>

(The miscellaneous labels should be used as rough pointers to the transcribed event rather than as precise demarcations.)
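The begin/end marking of the miscellaneous tier can be turned into intervals with a few lines of Python. The sketch assumes that the tier is available as a list of (time, label) pairs with times in seconds, and that events of the same type do not overlap; both the data layout and the function name are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple


def pair_events(marks: List[Tuple[float, str]]) -> List[Tuple[str, float, float]]:
    """Pair 'name<' and 'name>' labels into (name, start, end) intervals."""
    open_at: Dict[str, float] = {}
    intervals = []
    for time, label in sorted(marks):
        name, marker = label[:-1], label[-1]
        if marker == "<":                      # beginning of an event
            if name in open_at:
                raise ValueError(f"{name!r} opened twice")
            open_at[name] = time
        elif marker == ">":                    # end of an event
            if name not in open_at:
                raise ValueError(f"{name!r} closed but never opened")
            intervals.append((name, open_at.pop(name), time))
        else:
            raise ValueError(f"label without begin/end suffix: {label!r}")
    if open_at:
        raise ValueError(f"unclosed events: {sorted(open_at)}")
    return intervals


marks = [(1.2, "silence<"), (0.4, "breath<"), (0.9, "breath>"), (1.8, "silence>")]
print(pair_events(marks))
```

Since the labels are meant as rough pointers, the resulting interval boundaries should be treated as approximate.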


 

6 References

[1] Caroline Fery (1993), German intonational patterns. Tuebingen: Niemeyer.

[2] Mary E. Beckman & Gayle M. Ayers (1994), Guidelines for ToBI labelling. Version 2.0, February 1994.

[3] Julia Hirschberg & Mary E. Beckman (1994), The ToBI annotation conventions.

[4] Hans Kamp & Uwe Reyle (1993), From discourse to logic. Dordrecht: Kluwer Academic Publishers.

[5] Matthias Reyelt & Anton Batliner (1994), Ein Inventar prosodischer Etiketten fuer VERBMOBIL. Verbmobil-Memo-33-94, Juli 1994.

[6] Martine Grice & Ralf Benzmueller (1994), Transcription of German using ToBI-tones - the Saarbruecken System. Ms., University of Saarbruecken.

[7] Joerg Mayer (1995), Transcription of German intonation - the Stuttgart System. Ms., University of Stuttgart.


 

7 Data reference

Data shown in figures 1, 2, 3, 4, 5, 10, 15: Saarbruecken MapTask Corpus
Data shown in figures 8, 9, 12, 13, 14, 16, 17: The Kiel Corpus of Read Speech, Vol. 1
Data shown in figure 11: German Verbmobil Corpus

jm