6. Layer 3: Phonological Representation of Intonation - ToBI scheme

6.1 Markup Declaration

Within the ToBI system (Silverman et al., 1992), the Tone Tier is the level used to transcribe intonation phenomena. The types of phenomena covered by ToBI are tones and downstepping, as defined by Pierrehumbert (1980). F0 range and peak delay are also considered. The system is mainly phonological: it labels intonation events according to their function as described in language-dependent intonation models, with explicit reference to prosodic units such as prominent syllables, words and prosodic phrases. Nevertheless, some direct reference to acoustic values is admitted: while intonation events are supposed to be associated with linguistic units (syllables, words), they may also be aligned to specific points in the raw f0 curve, possibly corresponding to the tone 'peak'. Such alignment may be given for each tone or, alternatively, only for those whose peak occurs too 'early' or too 'late' with respect to the stressed syllable. Moreover, a special marker may indicate the highest point in the f0 curve, to give an idea of the pitch range. These acoustic references look spurious in a phonological annotation and are required only when a true acoustic-phonetic representation of the curve is lacking. In the MATE meta-scheme such an intermediate level is present, so it can profitably be used instead of direct references to f0 points.

In our XML adaptation of ToBI, four elements have been defined:

  • <tobitone>, for the tones, distinguished according to their function as pitch accents, phrase accents or boundary tones and labeled according to a classification of their linguistically admissible types
  • <target>, to mark peak location when it occurs outside the scope of the accented syllable
  • <f0range>, to mark the highest f0 value in the curve
  • <repair>, to mark the restart of the intonation contour after a disfluency

The four elements are not hierarchically ordered. All may refer to the f0 curve, while only the two accessory elements <target> and <f0range> are necessarily linked to <f0>. The <tobitone> and <repair> elements can be linked to prosodic units and/or to phonetic descriptions of intonation, rather than to raw f0. This kind of reference is recommended. The other two elements are provided for completeness with respect to ToBI, but may be avoided.
     

    6.2 The <tobitone> element

    6.2.1 Description

    The <tobitone> element has been defined to adapt to XML format the ToBI labels defined in the Tone Tier for the description of tones. In this framework, a tone is a functionally simple prosodic event which may be phonetically complex, e.g. it may consist of an accent realized by reaching a low f0 target and immediately rising to a high f0 value. In fact, while the basic descriptive elements are the pitch levels H (high) and L (low), a tone can in some cases be described by a combination of levels, which amounts to describing it as a movement. ToBI notation presupposes a classification of the different types of tones admissible in a given language, so it is model- and language-dependent. In the following, the reference for the inventory of tones is the original ToBI model proposed for American English (Beckman & Ayers, 1994).

    Within the ToBI framework, two types of tones are considered:

    a) phrasal tones: pitch events associated with intonational boundaries.

    b) pitch accents: pitch events associated with accented syllables.

    Phrasal tones could be further distinguished into:
    a.a) phrase accents, events at intermediate phrase boundaries

    a.b) boundary tones, events at full intonation phrase boundaries

    Note that, in the prosodic structure, an intonation phrase is a sequence of intermediate phrases, so it will be marked both by the phrase accent of its last intermediate phrase and by the boundary tone.

    In our ToBI adaptation, we will use the <tobitone> element for all these classes of tones and distinguish them with the attribute class, which may assume the values pitch accent ("pitaccent"), phrase accent ("phraccent") or boundary tone ("boundtone"). For each class, a set of different tone types is defined, represented as values of the type attribute.

    For a more detailed description of these symbols, the user is referred to Price (1992), Silverman et al. (1992), Beckman & Ayers (1994) and Pitrelli et al. (1994).
     

    6.2.1.1 Pitch accents

    The inventory of pitch accents considered in ToBI is the following:
     

    H* ‘peak accent’, an apparent tone target on the accented syllable which is in the upper part of the speaker’s pitch range for the phrase.
    L* ‘low accent’, an apparent tone target on the accented syllable which is in the lowest part of the speaker’s pitch range
    L*+H ‘scooped accent’, a low tone target on the accented syllable which is immediately followed by a relatively sharp rise to a peak in the upper part of the speaker’s pitch range
    L+H* ‘rising peak accent’, a high peak target on the accented syllable which is immediately preceded by a relatively sharp rise from a valley in the lowest part of the speaker’s pitch range
    H+!H* a clear step down onto the accented syllable from a high pitch which itself cannot be accounted for by a H phrasal tone ending the preceding phrase or by a preceding H pitch accent in the same phrase

    6.2.1.2 Phrase accents

    The different types of phrase accents considered in ToBI are:
     

    L- Low phrase accent, which occurs at an intermediate phrase boundary
    H- High phrase accent, which occurs at an intermediate phrase boundary
    !H- Downstepped high phrase accent

    6.2.1.3 Boundary tones

    The different types of boundary tones are:
     

    L% Low (final) boundary tone, which occurs at every full intonation phrase boundary
    H% High (final) boundary tone, which occurs at every full intonation phrase boundary
    %H Initial boundary tone; marks a phrase that begins relatively high in the speaker’s pitch range; the default initial boundary is in the middle of the range or lower, and will be left unmarked in the transcription

    6.2.1.4 Uncertainty

    ToBI has a way of dealing with uncertainty, by using one or several of the following symbols, that we will consider as special cases of <tobitone> elements:
     

    * The pitch accent has not been transcribed yet
    *? Uncertainty about the presence of a pitch accent
    X*? Uncertainty about the type of pitch accent
    - The phrase accent has not been transcribed yet
    -? Uncertainty about the presence of a phrase accent
    X-? Uncertainty about the type of phrase accent
    % The boundary tone has not been transcribed yet
    %? Uncertainty about the presence of a boundary tone
    X%? Uncertainty about the type of boundary tone
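
The nine uncertainty symbols follow a regular pattern: an optional "X", one class marker ("*", "-" or "%"), and an optional "?". The following is a minimal sketch of how a tool might decode them; the function name `label_status` and the status strings are illustrative assumptions, not part of ToBI or MATE.

```python
import re

# Class marker -> value of the 'class' attribute used in this adaptation.
_MARK = {"*": "pitaccent", "-": "phraccent", "%": "boundtone"}

def label_status(label):
    """Return (class, status) for an uncertainty label, or None otherwise.

    Hypothetical helper: the status strings are illustrative only.
    """
    m = re.fullmatch(r"(X?)([*%-])(\??)", label)
    if m is None:
        return None  # an ordinary, fully specified tone label such as "H*"
    x, mark, question = m.groups()
    if x and question:
        status = "type uncertain"
    elif question:
        status = "presence uncertain"
    elif not x:
        status = "not yet transcribed"
    else:
        return None  # "X*" without "?" is not in the inventory
    return _MARK[mark], status
```

For instance, `label_status("X-?")` reports a phrase accent whose type is uncertain, while `label_status("L*")` returns `None` because "L*" is a regular pitch accent label.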

    6.2.1.5 'Phrase accent' + 'boundary tone' combinations

    As full intonation phrase boundaries will always have two final tones, a phrase accent tone plus a boundary tone, the possible set of allowable combinations at the end of an intonation unit is the following:
     

    L-L% a full intonation phrase with an L phrase accent ending its final intermediate phrase and an L% boundary tone falling to a point low in the speaker’s range (standard ‘declarative’ contour of American English)

    L-H% a full intonation phrase with an L phrase accent closing the last intermediate phrase, followed by an H boundary tone (‘continuation rise’)
    H-H% an intonation phrase with a final intermediate phrase ending in an H phrase accent and a subsequent H boundary tone (‘yes-no questions’)
    H-L% an intonation phrase in which the H phrase accent of the final intermediate phrase upsteps the L% to a value in the middle of the speaker’s range (final level ‘plateau’)
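
Since each of these combined labels is a phrase accent followed by a boundary tone, it can be decomposed into two <tobitone> descriptions, one per class. The sketch below assumes only the four combinations listed above; the function name `split_final_tones` is hypothetical.

```python
# The four combinations listed in this section.
FINAL_COMBINATIONS = ("L-L%", "L-H%", "H-H%", "H-L%")

def split_final_tones(label):
    """Split a combined label into phrase accent and boundary tone parts.

    Returns two dicts holding the 'class' and 'type' attribute values.
    Hypothetical helper, restricted to the four combinations above.
    """
    if label not in FINAL_COMBINATIONS:
        raise ValueError(f"not a phrase accent + boundary tone label: {label!r}")
    # All four labels have a two-character phrase accent ("L-" or "H-").
    phrase, boundary = label[:2], label[2:]
    return (
        {"class": "phraccent", "type": phrase},
        {"class": "boundtone", "type": boundary},
    )
```

Keeping the two tones as separate items matches the way a full intonation phrase boundary is described here, with one phrase accent and one boundary tone.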

    6.2.2 Data Source

    A ToBI transcription is usually carried out taking the raw f0 as basic representation. Alternatively, it can rely on a phonetic representation of the f0 curve. In any case it should be aligned with linguistic units: phones or syllables or words or phrases or all of them.

    6.2.3 Segmentation/selection

    Tones are identified by inspecting the intonation curve (raw or stylized) and the aligned syllables and prosodic phrases. Depending on the annotation purposes, tones may be linked to points in the f0 curve or to linguistic units. ToBI annotation is not intended as a segmentation of the intonation curve; rather, it is a selection of its relevant events, driven by the underlying linguistic structure of the utterance. ToBI tones are originally intended to be associated with syllables (stressed syllables or phrase-final syllables), with some loose suggestion as to their precise alignment with the f0 curve: the 'peak' or 'valley' of the tone is intended to occur within the scope of the associated syllable, unless otherwise specified by the symbols ">" and "<" (see the <target> element in par. 6.3.1). In current implementations of the system, such as the one in the ESPS-Waves+ environment (‘http://www.ling.ohio-state.edu/phonetics/ToBI/ToBI.0.html’, ‘http://www.entropic.com/products&services/esps/esps.html’), the link with the f0 curve is made explicit, with tones aligned with their target point in the f0 curve.

    Different options are available in MATE for <tobitone>'s alignment. A very simple one could be to have a display of the f0 raw curve aligned with <word>'s and define each <tobitone> by selecting the word on which it occurs. One could proceed in a similar way with <syllable>'s instead of words and select the syllable (or the <phone> sequence) to be associated with the tone.

    In a more sophisticated approach, one may build up <tobitone>'s starting from the stylized curve, thus giving a phonetic correlate to the phonological ToBI label. To this end one may rely on <pitmove>'s or <intone>'s. The latter are perhaps more suitable as components of a <tobitone>, because of their underlying interpretation as pitch levels or target points. Based on the synchronized display of the <intone> stylized curve, the <phone> transcription and the prosodic phrasing (<breakindex>), a <tobitone> will be defined by selecting its target points on the stylized curve: e.g. a simple <tobitone> like H* will be associated with a single <intone>, a complex one with a sequence of <intone>'s. Time attributes will be inherited from the selected <intone>'s. Time alignment will always be available in order to find out (via query or window synchronization) the corresponding <syllable>.
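
The text does not spell out how time attributes are inherited when a <tobitone> is built from several <intone>'s. One plausible reading, sketched below as an assumption, is that the tone spans from the earliest start to the latest end of its components; the function name `inherit_times` is hypothetical.

```python
def inherit_times(components):
    """Derive (start, end) for a <tobitone> from its selected components.

    components: list of (start, end) pairs taken from <intone> elements.
    Assumption: the tone covers the whole span of its components.
    """
    if not components:
        raise ValueError("a <tobitone> needs at least one component")
    start = min(s for s, _ in components)
    end = max(e for _, e in components)
    return start, end
```

With a single component, as for a simple H*, the tone simply takes over that component's start and end.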

    If an explicit association with f0 values is needed, one could code the f0 curve at level 2 in the <f0> element, and then associate <tobitone>'s with <f0> elements, or, alternatively, keep the link with <syllable> (or <word>) but set the time attribute to a time value corresponding to the f0 peak (that will then be retrievable via query).

    6.2.4 Assignment

    The attributes considered here for the <tobitone> element are the following:

  • type: one of the labels defined in ToBI for tonal transcription.
  • class: one of the class of tones defined in the ToBI system: "pitaccent" (pitch accent), "phraccent" (phrase accent), "boundtone" (boundary tone).
  • href: <f0> or <closecopy> or <momel> or <intone> or <syllable> or <word>
  • start: time start of the tone (inherited)
  • end: time end of the tone (inherited)

    6.3 The <target> element

    6.3.1 Description

    This element is used to indicate the location of the f0 peak or valley of a pitch accent, when it does not coincide with the stressed syllable. In the original ToBI notation, such early or late target points are marked with the symbols ">" and "<", respectively.

    6.3.2 Data Source

    The raw or stylized f0 contour (in one of the following representations: <f0>, <closecopy>, <momel>, <intone>).

    6.3.3 Segmentation/selection

    The target position is located by visual inspection of the f0 contour. The corresponding <f0> (or stylized) element is selected and marked as ‘EarlyF0’, if it precedes the stressed syllable, or ‘LateF0’, if it follows it.
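
The choice between the two values can be stated as a simple comparison between the time of the selected f0 point and the boundaries of the stressed syllable. The sketch below assumes that both are available as times on a common scale; the function and parameter names are hypothetical.

```python
def target_type(peak_time, syl_start, syl_end):
    """Classify an f0 peak or valley relative to the stressed syllable.

    Returns "EarlyF0" if the point precedes the syllable, "LateF0" if it
    follows it, and None if it falls inside the syllable, in which case
    no <target> element is needed.
    """
    if peak_time < syl_start:
        return "EarlyF0"
    if peak_time > syl_end:
        return "LateF0"
    return None
```

The result would be used as the value of the 'type' attribute of the <target> element.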

    6.3.4 Assignment

    The attributes considered here for the <target> element are the following:

  • type: "EarlyF0" or "LateF0"
  • href: <f0> or <closecopy> or <momel> or <intone>
  • start: time start of the f0 peak (inherited)
  • end: time end of the f0 peak (inherited)

    6.4 The <f0range> element

    6.4.1 Description

    This element has been included to represent the ‘f0 range’ annotation symbol, which is used to indicate the f0 maximum in the speaker’s range for a given phrase.

    6.4.2 Data Source

    The raw or stylized f0 contour (in one of the following representations: <f0>, <closecopy>, <momel>, <intone>).

    6.4.3 Segmentation/selection

    The location of the maximum of the f0 range position is determined by visual inspection of the f0 contour and the corresponding element (<f0> or <momel> or <closecopy> or <intone>) is selected to be associated with the <f0range> element.
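
Although the text prescribes visual inspection, a tool could propose a candidate point for the annotator to confirm. The sketch below is such an aid under stated assumptions: f0 is given as (time, value) pairs for one phrase, and unvoiced frames carry the value 0 or None. The function name `highest_f0` is hypothetical.

```python
def highest_f0(samples):
    """Return the (time, f0) pair with the highest f0, or None if unvoiced.

    samples: list of (time, f0) pairs for one phrase; frames whose f0 is
    0 or None are treated as unvoiced and ignored (an assumption).
    """
    voiced = [(t, f) for t, f in samples if f]
    if not voiced:
        return None
    return max(voiced, key=lambda pair: pair[1])
```

The element of the f0 representation located at the returned time is then the one to associate with <f0range>.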

    6.4.4 Assignment

    The attributes considered here for the <f0range> element are the following:

  • type: ToBI symbol for f0 maximum: "HiF0"
  • href: <f0> or <closecopy> or <momel> or <intone>
  • start: time start of the f0 peak (inherited)
  • end: time end of the f0 peak (inherited)

    6.5 The <repair> element

    6.5.1 Description

    This element has been included to represent the ‘repair’ annotation symbol "%r", defined in ToBI for the restart of an intonation contour when the preceding contour was interrupted by a disfluency before being completed. Such a restart can be considered an intonation event aligned with a specific point in the f0 curve or with the corresponding prosodic unit.

    6.5.2 Data Source

    The raw or stylized f0 contour (in one of the following representations: <f0>, <closecopy>, <momel>, <intone>) and the phonetic transcription, with <phone> and <syllable> elements.

    6.5.3 Segmentation/selection

    Both listening to the signal and inspecting the f0 curve, aligned with the phonetic transcription, are necessary.

    As in the case of <tobitone>, two main options are available: 1) select the <syllable> element on which the intonation restart occurs, 2) select an element in a phonetic representation of intonation: <f0>, <intone>, etc.

    6.5.4 Assignment

    The attributes considered here for the <repair> element are the following:

  • type: ToBI symbol for repair ("%r")
  • href: <f0> or <closecopy> or <momel> or <intone> or <syllable>
  • start: time start of the intonation restart (inherited)
  • end: time end of the intonation restart (inherited)

    6.5 Example

    The following example shows the ToBI annotation of the English utterance "Show me the cheapest fare from Philadelphia to Dallas excluding restriction VU slash one" (obtained from the TOBI-TRAINING material), using the elements <tobitone> and <repair>. Note that in this case tones are linked (by means of the 'href' attribute) to <word> elements, in addition to time values.
     
     

    tobitone.xml
    <tobitone id="tbtn_001" type="H*"   class="pitaccent" href="word.xml#id(wrd_001)" start="2052" end="2052"/>
    <tobitone id="tbtn_002" type="L+H*" class="pitaccent" href="word.xml#id(wrd_004)" start="2579" end="2579"/>
    <tobitone id="tbtn_003" type="!H*"  class="pitaccent" href="word.xml#id(wrd_005)" start="3065" end="3065"/>
    <tobitone id="tbtn_004" type="L-"   class="phraccent" href="word.xml#id(wrd_005)" start="3315" end="3315"/>
    <tobitone id="tbtn_005" type="L%"   class="boundtone" href="word.xml#id(wrd_005)" start="3315" end="3315"/>
    <tobitone id="tbtn_006" type="L+H*" class="pitaccent" href="word.xml#id(wrd_009)" start="4470" end="4470"/>
    <tobitone id="tbtn_007" type="!H*"  class="pitaccent" href="word.xml#id(wrd_009)" start="4771" end="4771"/>
    <tobitone id="tbtn_008" type="L-"   class="phraccent" href="word.xml#id(wrd_009)" start="5015" end="5015"/>
    <tobitone id="tbtn_009" type="H*"   class="pitaccent" href="word.xml#id(wrd_011)" start="5388" end="5388"/>
    <tobitone id="tbtn_010" type="L-"   class="phraccent" href="word.xml#id(wrd_011)" start="5855" end="5855"/>
    <tobitone id="tbtn_011" type="L%"   class="boundtone" href="word.xml#id(wrd_011)" start="5855" end="5855"/>
    <tobitone id="tbtn_012" type="L+H*" class="pitaccent" href="word.xml#id(wrd_012)" start="6984" end="6984"/>
    <tobitone id="tbtn_013" type="L-"   class="phraccent" href="word.xml#id(wrd_012)" start="7399" end="7399"/>
    <tobitone id="tbtn_014" type="L%"   class="boundtone" href="word.xml#id(wrd_012)" start="7399" end="7399"/>
    <tobitone id="tbtn_015" type="H*"   class="pitaccent" href="word.xml#id(wrd_013)" start="8154" end="8154"/>
    <tobitone id="tbtn_016" type="L-"   class="phraccent" href="word.xml#id(wrd_013)" start="8585" end="8585"/>
    <tobitone id="tbtn_017" type="L%"   class="boundtone" href="word.xml#id(wrd_013)" start="8585" end="8585"/>
    <tobitone id="tbtn_018" type="H*"   class="pitaccent" href="word.xml#id(wrd_014)" start="8711" end="8711"/>
    <tobitone id="tbtn_019" type="!H*"  class="pitaccent" href="word.xml#id(wrd_015)" start="8928" end="8928"/>
    <tobitone id="tbtn_020" type="L-"   class="phraccent" href="word.xml#id(wrd_015)" start="9114" end="9114"/>
    <tobitone id="tbtn_021" type="H*"   class="pitaccent" href="word.xml#id(wrd_016)" start="9353" end="9353"/>
    <tobitone id="tbtn_022" type="H*"   class="pitaccent" href="word.xml#id(wrd_017)" start="9694" end="9694"/>
    <tobitone id="tbtn_023" type="L-"   class="phraccent" href="word.xml#id(wrd_017)" start="9880" end="9880"/>
    <tobitone id="tbtn_024" type="L%"   class="boundtone" href="word.xml#id(wrd_017)" start="9880" end="9880"/>

     
    repair.xml
    <repair id="rpr_001" type="%r" start="4149" end="4149"/>
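
As a rough illustration of how such a file can be queried, the sketch below groups tones by the word they point to. It reuses three elements from the tobitone.xml listing; the enclosing <tobitones> root and the function name `tones_by_word` are assumptions, added only so that the fragment parses as a single XML document.

```python
import re
import xml.etree.ElementTree as ET

# Three elements copied from the listing; the <tobitones> wrapper is assumed.
SAMPLE = """<tobitones>
<tobitone id="tbtn_003" type="!H*" class="pitaccent" href="word.xml#id(wrd_005)" start="3065" end="3065"/>
<tobitone id="tbtn_004" type="L-" class="phraccent" href="word.xml#id(wrd_005)" start="3315" end="3315"/>
<tobitone id="tbtn_005" type="L%" class="boundtone" href="word.xml#id(wrd_005)" start="3315" end="3315"/>
</tobitones>"""

# Matches references of the form file.xml#id(element_id).
HREF = re.compile(r"^(?P<file>[^#]+)#id\((?P<id>[^)]+)\)$")

def tones_by_word(xml_text):
    """Map each referenced element id to its (class, type, start) tones."""
    grouped = {}
    for tone in ET.fromstring(xml_text).iter("tobitone"):
        match = HREF.match(tone.get("href", ""))
        if match is None:
            continue  # tone without a link to a linguistic unit
        entry = (tone.get("class"), tone.get("type"), float(tone.get("start")))
        grouped.setdefault(match.group("id"), []).append(entry)
    return grouped
```

Calling `tones_by_word(SAMPLE)` yields one entry for wrd_005 holding its pitch accent, phrase accent and boundary tone, which shows how the phrase accent and boundary tone of a full intonation phrase share the same word and time value.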

     

    6.6 Coding Procedure

    Different procedures may be followed to obtain a ToBI annotation of intonation. As mentioned above, a simple procedure is to look at the shape of the raw f0 curve and align <tobitone>'s with <word>'s. Alternatively, tones may be linked to stylized curves or to other linguistic units.

    A recommended procedure, in line with the multilevel integrated MATE approach, is the following (where <intone>'s could be replaced with <closecopy>'s):

    1) open the following synchronized windows: <intone> (with the <momel> graphical display of the stylized f0 curve), <phone>, <syllable>, <breakindex>

    2) look for stressed syllables in the <syllable> sequence, listen to the signal and inspect the corresponding f0 contour to judge if it is a pitch accent

    3) if it is, select its components (<intone> elements) and create a corresponding <tobitone> (that will inherit time attributes from <intone>'s); for each detected intonation event, select the <syllable> on which it occurs, create the corresponding (linked) <tobitone> or <repair> element and assign it the proper class attribute; time values will be inherited from <syllable> or set explicitly to be aligned with the f0 peak

    4) label it with class = pitaccent and type = the appropriate ToBI label

    5) (alternatively link the tone to the corresponding stressed <syllable>, or to the corresponding <word>)

    6) look for phrase boundaries in the <breakindex> stream, listen to the signal and inspect the corresponding f0 contour to recognize the type of phrasal accent

    7) if the <breakindex> value is 4 (i.e. it corresponds to a full intonation boundary) decompose the contour into phrase accent and boundary tone

    8) select the <intone> elements composing the phrase accent and create a corresponding <tobitone> element (that will inherit time attributes from <intone>'s) with class = phraccent and type = the appropriate ToBI label

    9) if there is a boundary tone (break index 4), select its <intone> element and create the corresponding <tobitone> with class = boundtone and type = the appropriate ToBI label

    10) (alternatively link the tone to the corresponding <syllable>)

    With the explicit link to the phonetic representation of f0, the <target> and <f0range> elements may be unnecessary. If desired, they may be introduced by selecting the corresponding <intone> element.
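
Steps 6 to 9 can be sketched as follows. The code assumes the ToBI break-index convention in which 3 marks an intermediate phrase boundary and 4 a full intonation phrase boundary, and it takes the tone labels as already chosen by the annotator. The function name and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

def phrasal_tobitones(break_index, phrase_type, boundary_type, href, time):
    """Build the <tobitone> elements for one phrase boundary.

    Assumes break index 3 = intermediate phrase boundary and
    4 = full intonation phrase boundary (ToBI convention).
    """
    if break_index not in (3, 4):
        return []  # no phrasal tone below an intermediate phrase boundary
    common = {"href": href, "start": str(time), "end": str(time)}
    tones = [
        ET.Element("tobitone", {"type": phrase_type, "class": "phraccent", **common})
    ]
    if break_index == 4:
        # A full intonation phrase boundary also carries a boundary tone.
        tones.append(
            ET.Element("tobitone", {"type": boundary_type, "class": "boundtone", **common})
        )
    return tones
```

Each returned element can be serialized with `ET.tostring(element, encoding="unicode")`; the 'id' attribute is left out here because the text does not say how identifiers are assigned.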

    6.7 Markup Table
     

    <tobitone>
    id [ASCII]
    type L-, H-, !H-, -, -?, X-?, L-L%, L-H%, H-H%, H-L%, %, %?, X%?, H*, !H*, L*, L*+H, L*+!H, L+H*, L+!H*, H+!H*, *, *?, X*?
    class pitaccent, phraccent, boundtone
    href <f0> or <closecopy> or <momel> or <intone> or <syllable> or <word>
    start [FLOAT]
    end [FLOAT]

    The set of symbols defined for the attribute ‘type’ includes the allowable combination of pitch accents, phrase accents, boundary tones and/or uncertainty symbols, as defined in the ToBI guidelines.

    The value for the attribute 'type' should be consistent with the attribute 'class', according to the semantics of the different labels described in the tables in 6.2.1.
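
This consistency requirement lends itself to an automatic check. The sketch below groups the simple labels by class following 6.2.1; the combined labels of 6.2.1.5 (such as L-L%) are left out because the text does not say which class value they take. The names `TYPES_BY_CLASS` and `is_consistent` are hypothetical.

```python
# Simple labels grouped by class, following the inventories in 6.2.1.
# Combined labels (L-L%, L-H%, H-H%, H-L%) are deliberately omitted.
TYPES_BY_CLASS = {
    "pitaccent": {"H*", "!H*", "L*", "L*+H", "L*+!H", "L+H*", "L+!H*",
                  "H+!H*", "*", "*?", "X*?"},
    "phraccent": {"L-", "H-", "!H-", "-", "-?", "X-?"},
    "boundtone": {"L%", "H%", "%H", "%", "%?", "X%?"},
}

def is_consistent(tone_class, tone_type):
    """Return True if 'type' is an admissible label for the given 'class'."""
    return tone_type in TYPES_BY_CLASS.get(tone_class, set())
```

A checker could run `is_consistent` over every <tobitone> and report the elements for which it returns False.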
     

    <target>
    id [ASCII]
    type EarlyF0, LateF0
    href <f0> or <closecopy> or <momel> or <intone>
    start [FLOAT]
    end [FLOAT]

     
     
    <f0range>
    id [ASCII]
    type HiF0
    href <f0> or <closecopy> or <momel> or <intone>
    start [FLOAT]
    end [FLOAT]

     
    <repair>
    id [ASCII]
    type %r
    href <f0> or <closecopy> or <momel> or <intone> or <syllable>
    start [FLOAT]
    end [FLOAT]



    7. Layer 4: Prosodic Phrasing - ToBI scheme