12. Proposed description of the EGG waveform.

In order to obtain a quantitative description of the signal, a model based on the idealized waveform proposed by Rothenberg (1983) and Baken (1992) was introduced (Marasek, 1995a, 1996) (Fig.22). The sharp rise of the waveform was taken as the start of the pitch period. The idealized waveform has a flat peak (segment (b)), although in the original signal the conductance varies in this region: it typically has a parabolic shape and can also fluctuate at random due to changes in capacitance which are not the result of vocal fold adduction (Esling, 1984). To reflect only the effect of the adduction, a limit was set at 90% of the peak-to-peak amplitude. This threshold value follows Esling's recommendation, and an additional investigation was conducted to confirm it. The maximum contact phase (the closed phase in Fig.22) lasts between the intersections of the waveform with the 90% threshold. The instant and the value of the maximum amplitude, as well as those of the minimum, are also stored.
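The 90% criterion described above can be illustrated with a minimal sketch. It assumes that one pitch period is available as a list of samples and that the threshold crossings are located by linear interpolation between samples; the function name `closed_phase` and the interpolation step are illustrative assumptions, not part of the original procedure.

```python
def closed_phase(period, fraction=0.9):
    """Return (start, end) of the maximum contact phase in fractional samples.

    The threshold is placed at `fraction` of the peak-to-peak amplitude.
    `start` is the first upward crossing, `end` the last downward crossing.
    Either value is None if no such crossing exists in the period.
    """
    lo, hi = min(period), max(period)
    level = lo + fraction * (hi - lo)
    start = end = None
    for i in range(len(period) - 1):
        a, b = period[i], period[i + 1]
        if a < level <= b and start is None:
            # rising through the threshold: interpolate the crossing point
            start = i + (level - a) / (b - a)
        elif a >= level > b:
            # falling through the threshold: keep the last such crossing
            end = i + (level - a) / (b - a)
    return start, end


# Hypothetical toy period, used only to exercise the function.
toy_period = [0, 0, 10, 10, 10, 5, 0, 0]
start, end = closed_phase(toy_period)
relative_duration = (end - start) / len(toy_period)
```

Dividing the distance between the two crossings by the period length gives the duration of the contact phase relative to the pitch period.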

Figure 22. The model of the EGG waveform with annotated vocal folds movement phases: a) Description of the EGG signal shape using 6 straight segments. Dotted line connects the closing phase markers. b) Timing of the EGG signal phases. Black thin lines show the original waveform.

The fall of the waveform is split into two phases at the opening instant. As stated by numerous authors (Childers & Krishnamurthy, 1985; Baken, 1992; Bear et al., 1983), this point is very difficult to define in the EGG domain because of the limitations of the method and because of disturbing effects such as mucus bridging. Even when other methods of observing laryngeal behavior are used, it is difficult to localize the opening instant precisely (Koreman, 1996:54).

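In the present work, three techniques for the determination of the opening instant in the EGG waveform were tested (cf. Table 5): the minimum of the first derivative of the signal (DERIV), the point at which the signal returns to the amplitude it had at the start of the waveform period (BASE), and the crossing of the waveform with the line connecting the starts of neighbouring periods (EQUAL).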

The quality of the estimation was evaluated using recordings of two males and two females producing a sustained /a:/. The ratio of the measured mean value to the variance was examined using t-tests and then applied as a quality criterion. Several hundred EGG periods were used. The experiment did not compare true and measured values; only the consistency of the results was checked. The results are summarized in Table 5. In general, none of the methods yields acceptable results in the sense of statistical significance. Nevertheless, the baseline threshold crossing criterion (BASE) was used in the subsequent experiments because it gives the smallest variance of the estimate. This means that the dispersion of the results will be smaller and the results will be more consistent.
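The three opening-instant criteria named in the caption of Table 5 (DERIV, BASE, EQUAL) can be sketched as follows. This is only an illustration under stated assumptions: the input `x` is taken to hold the samples from the start of one period up to and including the start of the next, the BASE and EQUAL searches are restricted to the part of the period after the peak, and all function names are hypothetical.

```python
def opening_deriv(x):
    """DERIV: index at which the first difference of the signal is minimal."""
    diffs = [x[i + 1] - x[i] for i in range(len(x) - 1)]
    return min(range(len(diffs)), key=diffs.__getitem__)


def first_fall_to(x, line):
    """First index after the peak where x[i] is at or below line(i)."""
    peak = max(range(len(x)), key=x.__getitem__)
    for i in range(peak + 1, len(x)):
        if x[i] <= line(i):
            return i
    return None  # the waveform never reaches the reference line


def opening_base(x):
    """BASE: the signal returns to its amplitude at the start of the period."""
    return first_fall_to(x, lambda i: x[0])


def opening_equal(x):
    """EQUAL: crossing with the line joining the starts of neighbouring periods.

    Assumption: x[0] is the start of this period, x[-1] the start of the next.
    """
    n = len(x) - 1
    return first_fall_to(x, lambda i: x[0] + (x[-1] - x[0]) * i / n)


# Hypothetical toy period, used only to exercise the three criteria.
toy = [1, 8, 10, 9, 4, 1.5, 0, 2]
estimates = (opening_deriv(toy), opening_base(toy), opening_equal(toy))
```

Applying each function to every period of a recording and comparing the variances of the resulting estimates corresponds to the consistency check described above.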

As a result, the first measure of the Open Quotient (OQI) is defined as:

The 90%-of-the-peak-to-peak-amplitude criterion used for the determination of the closed phase of the signal is reflected in the determination of the open phase (no-contact) of the EGG, since in this case the criterion of 10% of the peak-to-peak amplitude is used. If the signal level falls and remains below this threshold, full abduction of the folds is assumed. This leads to the other definition of the waveform duty ratio (the so-called Open Quotient II):

The estimate of the Open Quotient will of course be biased (towards values that are too low), but on the other hand it can be expected that the variance of the measure will be smaller than for the OQI, since determination of the opening instant is not necessary.

Due to the zippering and rolling action of the vocal folds, it is anticipated that the upper and lower edges of the folds close with a time delay. In the EGG domain (according to Rothenberg's model) this manifests itself as a gradual increase in the conductance at the beginning of adduction, followed by a sharp rise in the electrical current flow (when the second edge starts to close). This gradual rise is modelled here as the start of a closing phase (segment (f) in Fig.22).

Table 5: The results of the statistical analysis of the opening instant estimation using three methods: DERIV (minimum of first signal derivative), BASE (the same signal amplitude as at the start of the waveform period) and EQUAL (the crossing of the waveform with the line connecting the starts of neighbouring periods). Recordings of 2 modal female and 2 modal male voices producing a sustained vowel /a:/ were used.
                 DERIV      EQUAL       BASE
N OF CASES         368        367        367
MEAN           115.016    115.853    112.278
VARIANCE      9138.294   9078.066   8643.338
STANDARD DEV    95.594     95.279     92.970
MEDIAN         105.500    104.000    103.000

PAIRED SAMPLES T-TESTS (367 CASES, DF = 366)

COMPARISON       MEAN DIFFERENCE   SD DIFFERENCE        T    PROB
DERIV VS EQUAL            -0.523          77.131   -0.130   0.897
DERIV VS BASE              3.052          90.638    0.645   0.519
EQUAL VS BASE              3.575          61.231    1.118   0.264
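As a cross-check of the paired-samples results in Table 5, the t statistic can be recomputed from the reported mean difference, the standard deviation of the differences and the number of cases, using the standard relation t = mean / (sd / sqrt(n)). The sketch below uses only the values printed in the table; the function name `paired_t` is illustrative.

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """t statistic of a paired-samples t-test from summary statistics."""
    return mean_diff / (sd_diff / math.sqrt(n))


# (mean difference, SD of the differences) as reported in Table 5
reported = {
    "DERIV vs EQUAL": (-0.523, 77.131),
    "DERIV vs BASE": (3.052, 90.638),
    "EQUAL vs BASE": (3.575, 61.231),
}

recomputed = {name: paired_t(m, sd, 367) for name, (m, sd) in reported.items()}
for name, t in recomputed.items():
    print(f"{name}: t = {t:.4f}")
```

The recomputed values agree with the tabulated T values to within the rounding of the reported inputs.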

Each of the specified phases is modelled as a straight line. The slopes and intercepts of the lines and the durations of the segments are stored. For the horizontally defined segments the slopes are not computed. The durations are expressed relative to the duration of the pitch period. For every phase, the deviation of the original waveform from the straight line is also stored (more precisely, the area enclosed between the two).
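The per-segment parameters listed above can be sketched as follows. The sketch assumes that each straight line is the chord joining the first and last sample of the segment, and that the enclosed area is approximated by the trapezoidal rule applied to the absolute deviation; both choices, and the name `describe_segment`, are assumptions made for illustration.

```python
def describe_segment(x, start, end, period_len):
    """Describe samples x[start..end] by the chord joining the end points.

    Returns the slope and intercept of the chord, the segment duration
    relative to the period length, and the approximate area enclosed
    between the waveform and the chord.
    """
    n = end - start
    slope = (x[end] - x[start]) / n
    intercept = x[start] - slope * start
    # Absolute deviation of the waveform from the chord at every sample.
    dev = [abs(x[i] - (slope * i + intercept)) for i in range(start, end + 1)]
    # Trapezoidal rule on the deviation; this is only approximate where the
    # waveform crosses the chord between two samples.
    area = sum((dev[j] + dev[j + 1]) / 2 for j in range(n))
    return {
        "slope": slope,
        "intercept": intercept,
        "duration": n / period_len,
        "area": area,
    }


# Hypothetical curved segment occupying half of an 8-sample period.
params = describe_segment([0, 1, 4, 9, 16], start=0, end=4, period_len=8)
```

A large area for a given phase indicates that the waveform departs strongly from the straight-line model in that phase.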

The Open Quotient and the Speed Quotient (skewness) are also computed. The definition of the Speed Quotient used here is not directly related to the definition given for the glottal flow signal (section 7.1). In this study the rise time is related only to the fall time, as proposed by Esling (1984). Both quotients depend on the estimation of the opening instant which is, as stated above, prone to errors. In the proposed model, the durations of the rising and falling slopes of the EGG waveform can be defined in various ways because both slopes are divided into two segments. In the broader sense skewness (SQII) can be defined in the terms given in Fig.22

with t'e denoting te of the preceding EGG period, and in the narrower sense (SQ) it can be defined as

Both ratios were extensively tested in the conducted experiments. The results are presented in subsequent chapters.

The proposed description of the EGG is based on thresholds. The threshold segmentation was compared with an automated method which is insensitive to threshold variation. This approach achieves an optimal subdivision of the waveform period into 6 segments in the least-squares sense. A detailed description of this method is given below, together with the results of the comparison of the two methods.

The algorithm used for EGG waveform segmentation was proposed by Allerhand (1987:107) and is called hierarchical linear regression.

For a given waveform of length n, the subdivision into k segments is carried out by clustering singletons (chains of waveform samples) while minimizing a cost function. The cost depends on the distance between the singleton currently under examination and its neighbour, and can be defined as the error of the least-squares fit of the regression line:

               (17)

where i is the index of a sample and yi is its value. The segments for which the sum (17) is smallest are merged. The cost function can include additional weights to specify segment prominence. The algorithm ends when the predefined number of segments or the given level of the cost function has been reached. The complexity of the algorithm can be substantially reduced by storing intermediate results. The adaptation of the algorithm to the waveform can result in a hierarchical structure of segments. At a given level of segmentation k the singletons S1,..., Sn-k+1 are the most prominent segments. They are also the most linear segments over a range of waveform values and waveform curvatures (ibid.: 114).
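The merging procedure can be illustrated with a simplified bottom-up sketch. It is not a reproduction of Allerhand's algorithm: it assumes that the cost of a merge is the sum of squared residuals of a least-squares line fitted to the merged span, it uses no prominence weights, it stops only on the number of segments, and it recomputes all costs in every pass instead of storing intermediate results. The names `fit_error` and `hlr_segment` are hypothetical.

```python
def fit_error(y, lo, hi):
    """Sum of squared residuals of the least-squares line through y[lo..hi]."""
    idx = range(lo, hi + 1)
    n = hi - lo + 1
    mean_i = (lo + hi) / 2
    mean_y = sum(y[i] for i in idx) / n
    sxx = sum((i - mean_i) ** 2 for i in idx)
    sxy = sum((i - mean_i) * (y[i] - mean_y) for i in idx)
    slope = sxy / sxx if sxx else 0.0
    return sum((y[i] - mean_y - slope * (i - mean_i)) ** 2 for i in idx)


def hlr_segment(y, k):
    """Merge neighbouring segments until k remain; return the breakpoints.

    Initially every sample is a breakpoint. In each pass the interior
    breakpoint whose removal gives the smallest fit error is deleted.
    """
    bounds = list(range(len(y)))
    while len(bounds) - 1 > k:
        # Cost of merging the two segments that meet at breakpoint j.
        costs = [
            fit_error(y, bounds[j - 1], bounds[j + 1])
            for j in range(1, len(bounds) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__) + 1
        del bounds[best]
    return bounds


# Hypothetical piecewise-linear test signal: rise, plateau, fall.
signal = [0, 1, 2, 3, 4, 4, 4, 4, 4, 3, 2, 1, 0]
breakpoints = hlr_segment(signal, 3)
```

For the EGG model described in this section, `k` would be set to 6, and the returned breakpoints would delimit the segments of one period.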

The hierarchical linear regression method (HLR) was applied to confirm the threshold settings of the previously proposed signal subdivision. The segmentation of the EGG waveform was done twice: first using the 10% and 90% criteria, and then by hierarchical linear regression. The mean durations and slopes of selected EGG period phases (those of special importance) were compared using t-statistics. The pitch periods were determined for both methods using the algorithm previously introduced by Vieira et al. (1996). The detailed results are presented in Table 6.

According to the analysis described above, there is no difference between the methods as far as the measures of the steepness of the slopes are concerned. This indirectly validates the use of threshold values for EGG waveform segmentation in order to capture the most important features of the signal (the phases are not substantially different). There is, however, a discrepancy in the durations of the individual phases (and in the duration-related parameters), but it is difficult to arrive at an appropriate physical interpretation of the phases defined using hierarchical linear regression. If a knee exists in the opening phase, the HLR method will find it and mark it as the opening instant. Moreover, it can be observed that for both methods the duration parameters undergo similar changes across voice qualities.

An example of partitioning is given in Fig.23.

Figure 23. An example of EGG partitioning using thresholds (light grey line) and hierarchical linear regression (dark grey). The original waveform is plotted in black.

The proposed description of the signal contains most of the important features of electroglottographic waveforms while remaining relatively simple. Simplicity was one of the main aims of the description. The description may also be directly related to the two-mass and body-cover models of vocal fold vibration which are used to validate the findings of the experimental part of this study. The presented model contains the minimal number of segments needed to distinguish the EGG signals of various phonation types (see section 13). The deviation of the modelled waveform from the original one reflects its irregularities (for example the notches observed for pathological voices by Motta et al., 1990) or the "roundings" in the waveform (as in Fig.21 for tense voice). The slopes of the contact phase may be suitable for investigations of the excitation strength, since they reflect the velocity of the closing and opening gestures and also comprise the features of the amplitude domain (Orlikoff, 1991; Marasek, 1996). It should be noted, however, that all measures are relative and only indirectly related to the actual movement of the folds and to the aeroacoustic properties of sound production. A careful validation of the description method will be included in the last part of this study.