Relative Power of Cues: FO Shift Versus Voice Timing.

Number 507
Year 1985
Drawer 9
Entry Date 11/19/1999
Authors Abramson, Arthur S. & Lisker, Leigh
Contact
Publication In V. A. Fromkin (ed.), Phonetic Linguistics: Essays in Honor of Peter Ladefoged (pp. 25-31).
url http://www.haskins.yale.edu/Reprints/HL0507.pdf
Abstract [Background] The acoustic features that provide information on the identity of phonetic segments are commonly called “cues to speech perception.” These cues do not typically have one-to-one relationships with phonetic distinctions. Indeed, research usually shows more than one cue to be pertinent to a distinction, although all such cues may not be equally important. Thus, if two cues, x and y, are relevant for a distinction, it may turn out that for any value of x, a variation of y will effect a significant shift in listeners’ phonetic judgments but that there will be some values of y for which varying x will have negligible effect on phonetic judgments. We say, then, that y is the more powerful cue. A good deal of evidence now exists to show that the timing of the valvular action of the larynx relative to supraglottal articulation is widely used in languages to distinguish homorganic consonants. The detailed properties of the distinctions thus produced depend on glottal shape and concomitant laryngeal impedance or stoppage of airflow, as well as on the phonatory state of the vocal folds. Such acoustic consequences as the presence or absence of audible glottal pulsing during consonant closures or constrictions, the turbulence called aspiration between consonant release and onset or resumption of pulsing, and damping of energy in the region of the first formant have all been subsumed (Lisker & Abramson 1964, 1971) under a general mechanism of voice timing. In utterance-initial position, the phonetic environment in which consonant distinctions based on differences in the relative timing of laryngeal and supraglottal action have been most often studied, this phonetic dimension has commonly been referred to as voice onset time (VOT).
Although the acoustic features just mentioned, and perhaps some others, may be said to vary under the control of the single mechanism of voice timing, it is of course possible, by means of speech synthesis, to vary them one at a time to learn which of them are perceptually more important. We must not forget, however, that such experimentation involves pitting against one another acoustic features that are not independently controlled by the human speaker. A relevant feature not yet mentioned is the fundamental frequency (Fo) of the voice. If we assume a certain Fo contour as shaped by the intonation or tone of the moment, there is a good correlation between the voicing state of an initial consonant and the Fo height and movement at the beginning of the contour (House & Fairbanks 1953; but see also O’Shaughnessy 1979 for complications). After a voiced stop, Fo is likely to be lower and shift upward, while after a voiceless stop it will be higher and shift downward (Lehiste & Peterson 1961). Although the phenomenon has not been fully explained, it is at least apparent that it is a function of physiological and aerodynamic factors associated with the voicing difference. The data derived from the acoustic analysis of natural speech can be matched by experiments with synthetic speech that demonstrate that Fo shifts can influence listeners’ judgments of consonant voicing (Fujimura 1971; Haggard, Ambler, & Callow 1970; Haggard, Summerfield, & Roberts 1981). Of further interest in this connection is the claim that phonemic tones have developed in certain language families through increased awareness of these voicing-induced Fo shifts and their consequent promotion to distinctive pitch features under independent control in production (Hombert, Ohala, & Ewan 1979; Maspero 1911). Our motivation for the present study was to put Fo into proper perspective as one of a set of potential cues to consonant voicing coordinated by laryngeal timing.
After all, our own earlier synthesis (Abramson & Lisker 1970) yielded quite satisfactory voicing distinctions without Fo as a variable. In addition, Haggard et al. (1970) may have exaggerated its importance in the perception of natural speech by their use of a frequency range of 163 Hz, one very much greater than, for example, the range of less than 40 Hz found for English stop productions by Hombert (1975). We set out to test the hypothesis that the separate perceptual effect of Fo is small and dependent upon voice timing, while the dependence of the voice timing effect on Fo is virtually nil. We used native speakers of English as test subjects.
Notes
