Phonation types and glottal flow

7.5. Phonation types and the glottal flow signal.

As mentioned earlier, the phonation type strongly influences the properties of the generated sound. The differences arise from changes in the excitation pulse, so each phonation type has its own characteristic glottal waveform.

The characteristics of the glottal waveform for each phonation type are described below in comparison to modal phonation, following the approach of Ní Chasaide and Gobl (1997), Stevens (1994), Trask (1996) and Zemlin (1988):

breathy voice:
- the fall phase is longer and the flow cut is more gradual
- the glottal pulse is more symmetrical
- high Open Quotient (OQ)
- high peak glottal flow
- lower pitch
whispery voice:
- high OQ, but lower than for breathy voice
- pulses more skewed than for breathy voice, but more symmetrical than for normal voice
- a high peak glottal flow, but lower than for breathy voice, which implies that H1 is lower in the source spectrum
creaky voice:
- extremely low fundamental frequency
- irregular pulses
- low OQ and low glottal flow rate, resulting in weaker low frequency components, particularly H1
- impulses are relatively symmetrical, with a relatively short rise time which dampens the low frequency components
falsetto:
- very high pitch
- rather low glottal peak flow
- often with glottis slightly open, thus, the effect of turbulent flow is observed in the spectrum
- pulses quite symmetrical
tense voice:
- sharp cutting off of the glottal flow, boosting the higher spectral components; very high skewness of the glottal pulse following a longer rise time
- small Open Quotient
- low frequency components of spectrum attenuated in comparison to the higher components
lax voice:
- comparable to breathy voice
- long rise time of the glottal pulse
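
The contrasts listed above can be made concrete with a small numerical sketch. The following Python fragment is an illustration only and not the procedure of the cited authors. It assumes a Rosenberg-type pulse shape and the common definitions OQ = open phase / period and SQ = opening time / closing time. The function names and the parameter values for the "breathy-like" and "tense-like" pulses are hypothetical, chosen only to mirror the qualitative descriptions.

```python
import math


def rosenberg_pulse(n_period, open_quotient, speed_quotient):
    """One period of a Rosenberg-type glottal flow pulse (illustrative helper).

    open_quotient  : open phase duration / period
    speed_quotient : opening (rise) duration / closing (fall) duration
    """
    n_open = round(open_quotient * n_period)
    n_rise = round(n_open * speed_quotient / (1.0 + speed_quotient))
    n_fall = n_open - n_rise
    pulse = []
    for n in range(n_period):
        if n < n_rise:
            # Opening phase: raised-cosine rise from 0 towards the peak.
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n_rise)))
        elif n < n_open:
            # Closing phase: quarter-cosine fall from the peak towards 0.
            pulse.append(math.cos(0.5 * math.pi * (n - n_rise) / n_fall))
        else:
            # Closed phase: no flow.
            pulse.append(0.0)
    return pulse


def measure_quotients(pulse, threshold=1e-9):
    """Estimate (OQ, SQ) from one period of glottal flow starting at glottal opening."""
    open_samples = [i for i, v in enumerate(pulse) if v > threshold]
    t_open = max(open_samples[0] - 1, 0)   # last zero-flow sample before opening
    t_close = open_samples[-1] + 1         # first zero-flow sample after closure
    t_peak = max(range(len(pulse)), key=lambda i: pulse[i])
    oq = (t_close - t_open) / len(pulse)
    sq = (t_peak - t_open) / (t_close - t_peak)
    return oq, sq


# Hypothetical parameter values, not measurements.
breathy_like = rosenberg_pulse(200, open_quotient=0.8, speed_quotient=1.0)
tense_like = rosenberg_pulse(200, open_quotient=0.4, speed_quotient=3.0)

for name, pulse in (("breathy-like", breathy_like), ("tense-like", tense_like)):
    oq, sq = measure_quotients(pulse)
    print(f"{name}: OQ = {oq:.2f}, SQ = {sq:.2f}")
```

In this sketch the symmetrical, long-open pulse yields a high OQ with SQ near 1, whereas the skewed pulse with a short closing phase yields a small OQ and a high SQ. Real glottal flow estimates are noisy and rarely reach exactly zero, so a fixed threshold as used here would need to be adapted.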

7.6. Additional characterizations of pathological voice qualities.

Current descriptions of pathological voice involve both specific methods and common measurements that are also used for healthy speakers. A broad class of parameters describes the "roughness" of the voice, i.e. its fluctuations in time and in amplitude. The stability of the fundamental frequency in particular has been investigated, and various indicators have been proposed in the literature (Koike, 1973; Kitajima et al., 1975; Gubrynowicz et al., 1980; Davis, 1978; Pinto & Titze, 1990; Titze & Liang, 1993; Baken, 1987). They include pitch perturbation factors over long and short periods of time (jitter), as well as the excess, which describes the distribution of the pitch period lengths in relation to the normal distribution (Hays, 1988). Pathological voices are characterized by higher values of these parameters. Other parametrizations use the autocorrelation function to characterize the fluctuation of the RMS value averaged over pitch periods (shimmer) (Davis, 1978). Cepstral measurements are also used (Gerull et al., 1992).
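
As an illustration of the perturbation measures mentioned above, the following sketch computes a short-term jitter, a short-term shimmer and the excess of the period distribution. It assumes the widely used cycle-to-cycle ("local") definitions, i.e. the mean absolute difference between consecutive values divided by the mean value, which is not the autocorrelation-based procedure attributed to Davis (1978). The excess is taken as the fourth standardized moment minus 3. The function names and the sample values are invented for illustration.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods):
    """Cycle-to-cycle perturbation of the pitch period lengths."""
    return local_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle perturbation of the per-period amplitudes."""
    return local_perturbation(amplitudes)


def excess(values):
    """Excess kurtosis: 0 for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


# Invented example data: pitch periods in ms and per-period amplitudes.
periods_ms = [10.0, 10.2, 9.8, 10.1, 9.9]
amplitudes = [1.00, 0.95, 1.05, 0.98, 1.02]

print(f"jitter  = {100 * local_jitter(periods_ms):.2f} %")
print(f"shimmer = {100 * local_shimmer(amplitudes):.2f} %")
print(f"excess  = {excess(periods_ms):.2f}")
```

A perfectly periodic signal gives zero jitter and shimmer. In practice the reliability of such values depends mainly on how accurately the individual pitch periods are marked, which is itself difficult for strongly perturbed voices.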

In the frequency domain, "hoarseness" is investigated. "Hoarseness" is generally perceived as the level of noise in the produced speech (see also section 4). It is often assessed by visual inspection of spectrograms or by means of a long-term average spectrum (LTAS) (Frokjaer-Jensen & Prytz, 1976; Gauffin & Sundberg, 1977, 1989). The latter technique distinguishes between certain types of voices in particular frequency bands. The spectral flatness of the residue signal (Markel & Gray, 1978) also depends on the spectral noise level. More sophisticated methods enable researchers to compare the energy generated by voicing (harmonic component) with that generated by turbulent flow (friction or breath noise) (Teager Energy Operator; Gavidia-Ceballos et al., 1996; Cairns et al., 1996). The harmonic-to-noise ratio (Davis, 1978) is often used to characterize pathological voices.
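
Two of the measures named above can be sketched in a few lines. The fragment below is a minimal illustration under stated assumptions, not the analysis chain of the cited studies. It uses the standard discrete form of the Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], and defines spectral flatness as the ratio of the geometric to the arithmetic mean of the power spectrum. The flatness is computed here for an arbitrary input signal; the inverse filtering needed to obtain an LPC residue is not shown. The function names, the naive DFT and the small floor value are choices made for this example.

```python
import cmath
import math


def teager_energy(x):
    """Discrete Teager Energy Operator for the interior samples of x."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def spectral_flatness(x, floor=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (1 = flat, near 0 = tonal)."""
    n = len(x)
    power = []
    for k in range(1, n // 2):  # skip the DC and Nyquist bins
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2 + floor)  # floor avoids log(0) for empty bins
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


# A pure tone: the Teager energy is constant and equals A^2 * sin(omega)^2.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n + 0.1) for n in range(50)]
print(teager_energy(tone)[:3])

# A unit impulse has a flat spectrum; a sinusoid has a strongly peaked one.
impulse = [1.0] + [0.0] * 63
sinusoid = [math.sin(2.0 * math.pi * 4 * t / 64) for t in range(64)]
print(spectral_flatness(impulse), spectral_flatness(sinusoid))
```

The naive DFT keeps the example free of external dependencies; for real recordings an FFT routine and frame-wise averaging would be the usual choice.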

It should be mentioned, however, that the existing parametrizations of pathological voices often fail in practice, primarily due to the high complexity and multidimensionality of pathological voice quality.

All methods presented above are either invasive or relatively complex. Hardly any of them allow an objective and robust description of the crucial parameters (the Open and Speed Quotients) of the glottal waveform which appear to correlate best with various types of voice quality.

A method which attempts to overcome these problems will be described in the following chapter.