4. Labelling of voice quality


As in any phonetic description, the description of voice needs appropriate, unambiguous and distinctive labels. Abercrombie (1967) and later Laver (1991:173) distinguish three strands, all present simultaneously and continuously: segmental features, voice quality features and voice dynamics features. Describing a voice therefore involves labelling any of these strands. Laver (ibid.) distinguished impressionistic from phonetic labels of voice. An impressionistic label, such as "flat", "thin", "bird-like" or "velvety" (or another so-called "imitation label"), cannot be interpreted accurately until the listener has heard a demonstration of the type of voice it refers to. A phonetic label, by contrast, should belong to a well-organized vocabulary and should have an exact definition agreed upon by a group of trained phoneticians. Sets of phonetic labels cover all aspects of voice production, assuming standard anatomy and physiology; in effect, they act as instructions for producing a certain articulation with a certain voice quality (e.g. loud, slow, nasalized, harsh, whispery, creaky, falsetto). Unfortunately, no standardized labelling system for voice quality exists as yet, and phonetic labels are neither mutually exclusive nor always unambiguous.

In the linguistic literature, voice quality is generally considered from two perspectives, depending on whether it is phonemic or non-phonemic.

Phonemic voice quality has a contrastive function in the phonological system of a language. In most languages, contrasts between segments are achieved on an articulatory basis rather than by different phonation types (defined in section 5.1), although breathiness, for example, is phonemic for vowels in Gujarati and for stops in Igbo (Ladefoged & Maddieson, 1996:47, 304). Examples of languages that use phonation contrasts are given in Table 1.

Table 1: Examples of languages which use phonation types distinctively (after Ladefoged & Maddieson, 1996)

Languages | Contrastive phonation types | Domain of contrast
most languages | voiced vs. voiceless | obstruents only
 | voiced vs. voiceless | nasals¹
Ik, Dafla, Amerindian languages of the Plains and Rockies, Bantu languages of the Congo basin, Indo-Iranian languages of the border region | voiced vs. voiceless | vowels
Gujarati, !Xóõ | modal vs. breathy voice | vowels
Indo-Aryan languages | modal vs. breathy voice | voiced stops
 | modal vs. stiff (slightly creaky) voice | tones
 | slightly breathy vs. slightly stiff voice | tones
Jalapa Mazatec | modal vs. breathy vs. creaky voice | vowels
 | stiff vs. modal voice | voiced stops
 | stiff vs. slack voice | voiced stops

¹ Jessen & Pétursson (1997)

Changes in voice source behavior may be associated with segmental or suprasegmental elements of the linguistic layer of communication. Of the different phonation types (see section 5.1), modal, creaky (laryngealized), breathy and harsh voice are used linguistically (Ní Chasaide & Gobl, 1997:452). It is rather striking that the tense/lax opposition (in the sense of the degree of overall muscular tension) is also used linguistically (Maddieson & Ladefoged, 1985). In a segmental context, voice quality is used contrastively for vowels and consonants in South African, South East Asian and native North American languages, as shown in Table 1 (Ladefoged & Maddieson, 1996; Ní Chasaide & Gobl, 1997). Although the laryngeal differences are associated with voice quality distinctions between consonants, their effects are located primarily at the onset or offset of an adjacent vowel. In the breathy nasals of Tsonga, for example, the acoustic effects fall mostly on the vowel onset, while vocal fold abduction for the breathy voiced nasal already begins during the nasal consonant (Ní Chasaide & Gobl, 1997:454). Suprasegmental properties such as intonation, tone and stress also affect voice production, and how the resulting voice characteristics are perceived depends on the language. Studies have shown that listeners with different native languages judge voice quality differently (Hurme & Sonninen, 1986); in other words, judgements of voice quality are affected by the listener's phonological system (Lin, 1995:18).

An interesting but still largely unresearched property of voice quality is that it is perceived unconsciously. Regardless of what is said, a voice can be perceived as friendly, curious, vicious, off-putting, etc. Helmholtz (1863) called this direct perception of emotions from voice quality 'unbewußtes Schließen' ("unconscious inference").

Another issue concerning voice quality is its contribution to what is commonly called pathological voice. As mentioned above, the labelling of different voices is not unambiguous, and the perception of voice quality is not universal, since it depends both on cultural differences in general and on the phonological system of the listener's native language. The description of pathological voice, however, aims to be universal and is based primarily on more abstract laryngeal functions.

Among the various systems for describing pathological voice, the most common ones concentrate on the degree of "hoarseness" (Hirano, 1981; Nawka & Anders, 1996). Hoarseness is a term that attributes a perceived voice abnormality to the voice source rather than to abnormalities of vocal tract configuration; perceptually, it is related to the noise generated during phonation. Perceived hoarseness can be graded, provided that a detailed and language-independent description of voice quality is available. Hirano (ibid.) proposes a scale of voice judgements with quantifiable perceptual dimensions related to a set of descriptive parameters for acoustic phenomena (Lin, 1995:20). The factors involved in the classification are:

- G (grade): overall degree of hoarseness
- R (roughness)
- B (breathiness)
- A (asthenia, i.e. weakness)
- S (strain)

Each of these dimensions is graded on a four-point scale from 0 (normal) to 3. This labelling system is known as the GRBAS classification (Isshiki & Takeuchi, 1970; Hirano, 1981, 1989).
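To make the grading scheme concrete, here is a minimal Python sketch of a single GRBAS rating. The class name `GRBAS`, the validation rule and the compact string form (e.g. "G2R1B2A0S1") are illustrative assumptions rather than part of Hirano's specification; the only property taken from the text is that each of the five dimensions is graded from 0 to 3.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GRBAS:
    """One perceptual GRBAS rating (hypothetical helper for illustration)."""

    G: int  # grade: overall degree of hoarseness
    R: int  # roughness
    B: int  # breathiness
    A: int  # asthenia (weakness)
    S: int  # strain

    def __post_init__(self) -> None:
        # Every dimension must be an integer grade on the 0-3 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 3:
                raise ValueError(f"{f.name} must be an integer from 0 to 3, got {value!r}")

    def code(self) -> str:
        # Assumed compact notation: letter followed by grade, in G-R-B-A-S order.
        return "".join(f"{f.name}{getattr(self, f.name)}" for f in fields(self))


rating = GRBAS(G=2, R=1, B=2, A=0, S=1)
print(rating.code())  # G2R1B2A0S1
```

Validating in `__post_init__` means an out-of-range grade is rejected as soon as a rating is recorded, rather than surfacing later when ratings from several listeners are compared.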

The GRBAS system is widely used in the US and Japan. In Europe, however, the asthenia (A) rating has been criticized for being highly correlated with breathiness, and judgements of voice tenseness (strain) diverge considerably. For this reason a simpler system, the so-called RBH system (Wendler et al., 1986; Nawka & Anders, 1996:8), has come into use; it is based on only three perceptual dimensions: roughness (R), breathiness (B) and hoarseness (H).

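The RBH notation packs three grades into a short string such as R3B2H3. The following Python sketch parses such a string, assuming (as in GRBAS) a 0-3 scale per dimension. The function name `parse_rbh` is hypothetical, and the consistency check that H is not rated lower than the higher of R and B is an assumption based on common clinical usage of the RBH scale, not something stated in the sources cited above.

```python
import re

# A rating is R, B and H in that order, each followed by one digit 0-3.
RBH_PATTERN = re.compile(r"R([0-3])B([0-3])H([0-3])")


def parse_rbh(rating: str) -> dict[str, int]:
    """Parse a compact RBH rating like 'R3B2H3' (hypothetical helper)."""
    match = RBH_PATTERN.fullmatch(rating.strip().upper())
    if match is None:
        raise ValueError(f"not a valid RBH rating: {rating!r}")
    r, b, h = (int(grade) for grade in match.groups())
    # Assumed consistency rule: overall hoarseness H is not rated lower
    # than the stronger of roughness R and breathiness B.
    if h < max(r, b):
        raise ValueError(f"H={h} is lower than max(R, B)={max(r, b)}")
    return {"R": r, "B": b, "H": h}


print(parse_rbh("R3B2H3"))  # {'R': 3, 'B': 2, 'H': 3}
```

Rejecting inconsistent ratings at parse time keeps invalid combinations out of later statistics; the check can simply be removed if a given rating protocol does not impose it.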

In Laver's (1991) framework it is possible to describe non-pathological voice qualities in a relatively objective manner.

Perceived voice quality can be described using phonetic settings (Table 2). The settings are grouped into:

- supralaryngeal settings
- laryngeal settings
- overall muscular tension settings

A particular setting is usually described in terms of its degree of deviation from a neutral setting, which is defined as a normal position relative to the possible adjustments (Laver, 1991:186). Within this description, voice quality is regarded as a superposition of a setting and an "organic component", which to a large extent characterizes the baseline of the speaker's voice, i.e. its neutral setting.

Table 2: Phonetic settings of voice quality (from Laver, 1991:227)

Supralaryngeal settings
  Longitudinal axis settings: labial protrusion; raised larynx; lowered larynx
  Latitudinal axis settings: close rounding; open rounding; lingual tip/blade settings (tip articulation, blade articulation, retroflex articulation); close jaw position; open jaw position; protruded jaw position; retracted jaw position
  Velopharyngeal settings
Laryngeal settings
  Simple phonation types: modal voice
  Compound phonation types: whispery voice; whispery falsetto; creaky voice; creaky falsetto; whispery creak; whispery creaky voice; whispery creaky falsetto; breathy voice; harsh voice; harsh falsetto; harsh whispery voice; harsh whispery falsetto; harsh creaky voice; harsh creaky falsetto; harsh whispery creaky voice; harsh whispery creaky falsetto
Overall muscular tension settings: tense voice; lax voice