Talking Heads: Speech Production
Measuring and Analyzing Speech Production
A paradigm for speech research
Using the techniques described in the previous section, researchers have conducted a multitude of articulatory studies. Although these have addressed many different issues (see Levelt, 1989, for review), the invariance issue has been fundamental. Many studies have attempted to identify the articulatory characteristics of different phonemes using experimental designs that manipulate stress and speaking rate (e.g., Kuehn & Moll, 1976; Gay, 1981) or phonetic context (e.g., Sussman et al., 1973). Stress-rate studies have been used to identify articulatory attributes of the phoneme that are independent of the changes induced when non-phonetic factors are varied. Experiments that vary the phonetic context (e.g., /aba, ibi, ubu/) are similar in that they allow the perturbing effects of the context to be distinguished from the inherent, and presumably stable, characteristics of the target phoneme (/b/ in this example).
Another direction taken in the search for invariance has been to demonstrate that the articulators are flexibly, but task-specifically, coordinated to achieve specific phonetic goals. These studies used mechanical perturbations either to reduce the number of articulators involved in a task, as in bite-block and braking studies in which the mandible is held in a fixed position (e.g., Lindblom et al., 1979; Folkins & Abbs, 1975), or to severely limit an articulator's contribution, as in dynamic perturbation of the lips or jaw (e.g., Abbs & Gracco, 1983; Kelso et al., 1984; Gracco & Abbs, 1985; Saltzman et al., 1995). Normal production always involves multiple articulators, even if some of them do not move. Although the relative contributions of the two lips and the jaw to bilabial productions vary (e.g., for the final /b/ in /bæb/), the result, lip closure, is roughly the same.
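The claim that the lips and jaw share the work of closure in varying proportions while the outcome stays the same can be checked with a simple variability comparison. The sketch below is illustrative only: the displacement values are invented, the last token stands in for a hypothetical bite-block condition in which the jaw cannot move, and closure is approximated as the sum of upper-lip, lower-lip (relative to the jaw), and jaw closing displacements. If the components trade off against one another, the spread of their sum is much smaller than the spread of any single component.

```python
from statistics import mean, pstdev

# Hypothetical closing displacements (mm) for six bilabial tokens.
# The last token mimics a bite-block condition: the jaw contributes nothing.
upper_lip = [3.0, 2.5, 4.0, 3.5, 5.0, 7.0]
lower_lip = [5.0, 4.5, 6.0, 5.5, 7.0, 9.0]  # lower lip measured relative to the jaw
jaw = [8.0, 9.0, 6.0, 7.0, 4.2, 0.0]


def closure_summary(ul, ll, jw):
    """Return per-token total closing displacement and the population SD
    of each component and of the total (assumes contributions simply add)."""
    total = [u + l + j for u, l, j in zip(ul, ll, jw)]
    spreads = {
        "upper_lip": pstdev(ul),
        "lower_lip": pstdev(ll),
        "jaw": pstdev(jw),
        "total": pstdev(total),
    }
    return total, spreads


total, spreads = closure_summary(upper_lip, lower_lip, jaw)
for name, sd in spreads.items():
    print(f"{name:>9}: SD = {sd:.2f} mm")
print(f"mean total closing displacement = {mean(total):.2f} mm")
```

Treating the three displacements as additive vertical components is a simplification; real analyses must decide how lower-lip motion is separated from the jaw motion it rides on.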
Perturbing the system by removing the contribution of an articulator such as the jaw, and examining the effects on the kinematics and physiology, demonstrates the articulatory system's ability to compensate flexibly and rapidly for the perturbation without loss of intelligibility in the speech signal. Because the data channels (including the perturbation delivery signal) are synchronized, the timing of articulatory events at both the neuromuscular and kinematic levels of observation can be examined precisely (e.g., Tuller et al., 1982, 1983).
Factors such as stress and speaking rate have fairly consistent laryngeal and supralaryngeal articulatory correlates; however, variability both within and across speakers is still quite high. One way to factor out much of this variability is to focus on the relations among kinematic variables instead of on the individual measures (e.g., Ostry et al., 1983; Kelso et al., 1985). For example, the relation between an articulator's peak velocity and movement amplitude is quite stable and linear across almost the entire range of its motion. At the same time, changes of stress and speaking rate are clearly marked by local changes in the slope of that relation. Borrowing heavily from the study of other biological movement systems such as limb motion, researchers have begun to model such patterns of behavior in terms of second-order mechanical systems, whose dynamic parameters (e.g., mass, stiffness, and viscosity) can be inferred from the relations among kinematic observables. Among other things, the approach promises the possibility of adducing stable (invariant) values of dynamic parameters from the variable kinematics, values that can be compared across articulatory structures and many speaking contexts, including different languages (Vatikiotis-Bateson & Kelso, 1993).
Figure 5 exemplifies a qualitative method for representing the continuous behavior and interrelation of position and velocity as trajectories in phase space (e.g., position vs. velocity). Much can be learned about the dynamic characteristics of a movement system simply by observing the shapes of its trajectories, or "phase portraits" (Abraham & Shaw, 1982, 1987). Such qualitative assessment can also be used to direct subsequent quantitative analysis. In the figure, for instance, even though gross effects of stress can be seen in the roughly alternating sequence of larger and smaller movements, it is quite difficult to interpret the individual waveforms for articulator position and instantaneous velocity (left side), much less any relation between them. But when the two variables are plotted in phase space (right side), their continuous correlation can be seen in the stability of the trajectory shapes associated with the repetitive syllable sequence. Certain aspects of their variability also become readily apparent. For example, the phase portraits show motion of the articulators to be less variable during production of the consonant (top half) than during production of the vowel (bottom half). They also show that the correlation between velocity and position differs for the two articulators, particularly during the closing phase of the movement cycle (right half). Movement amplitude and peak velocity covary even more strongly for the jaw alone than for the lower lip (whose motion includes that of the jaw).
This mapping of observables onto phase-space representations also provides a means for considering distinctive differences in interarticulator timing. In Figure 6, for example, the time at which the upper lip begins to move toward closure for the second /b/ in /bapab/ is expressed as a phase angle relative to the vowel-to-vowel movement cycle of the jaw (Kelso & Tuller, 1984; cf. Nittrouer et al., 1988).
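The link between the peak-velocity/amplitude relation and second-order models can be illustrated numerically. This is a minimal sketch, not the analysis used in the cited studies. It assumes an undamped, linear mass-spring that moves from rest to rest in half a cycle, so that x(t) = (A/2)(1 - cos wt) and peak velocity is Aw/2. Under that assumption the slope of peak velocity against amplitude is w/2, and the mass-normalized stiffness is k/m = w^2 = (2 * slope)^2. The movements are simulated rather than measured. The amplitudes, w values, and 1 kHz sampling rate are arbitrary, and all function names are hypothetical.

```python
import math


def half_cycle(amplitude, omega, fs=1000.0):
    """Sample a rest-to-rest movement of an undamped mass-spring system,
    x(t) = (A / 2) * (1 - cos(omega * t)) for 0 <= t <= pi / omega."""
    n = int(math.pi / omega * fs) + 1
    return [0.5 * amplitude * (1.0 - math.cos(omega * i / fs)) for i in range(n)]


def velocity(x, fs=1000.0):
    """Central-difference velocity; end samples use one-sided differences."""
    v = [0.0] * len(x)
    for i in range(1, len(x) - 1):
        v[i] = (x[i + 1] - x[i - 1]) * fs / 2.0
    v[0] = (x[1] - x[0]) * fs
    v[-1] = (x[-1] - x[-2]) * fs
    return v


def slope_through_origin(xs, ys):
    """Least-squares slope of the model y = slope * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def estimate_k_over_m(amplitudes, peak_velocities):
    """Mass-normalized stiffness implied by Vp = (omega / 2) * A."""
    return (2.0 * slope_through_origin(amplitudes, peak_velocities)) ** 2


amplitudes = [2.0, 4.0, 6.0, 8.0, 10.0]  # mm (hypothetical movement sizes)
for label, omega in [("stiffer", 30.0), ("less stiff", 20.0)]:  # rad/s (hypothetical)
    peaks = [max(velocity(half_cycle(a, omega))) for a in amplitudes]
    slope = slope_through_origin(amplitudes, peaks)
    print(f"{label}: Vp/A slope = {slope:.2f} 1/s, "
          f"k/m = {estimate_k_over_m(amplitudes, peaks):.0f} 1/s^2 "
          f"(simulated with {omega ** 2:.0f})")
```

A steeper peak-velocity/amplitude slope corresponds to a stiffer model system, which is the sense in which slope changes can index stress or rate. With damping, the constant relating slope to stiffness changes. Plotting `x` against `velocity(x)` for any one simulated movement traces half an ellipse, the idealized counterpart of the measured phase portraits discussed above.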
Across the full range of data (i.e., changes of speaking rate and stress), the latency of upper lip movement onset for the medial consonant was linearly proportional (Kelso et al., 1986a,b), or nearly so (Nittrouer et al., 1988), to the period of the jaw motion cycle associated primarily with production of the preceding vowel. Furthermore, when the data were reanalyzed in terms of phase angle, reliable differences in phase angle were observed for the medial consonants (Kelso et al., 1986a). Thus, phase angle analysis provides a precise means of transforming a high-degrees-of-freedom articulatory database into a lower-dimensional, functionally relevant form. Similar approaches are applicable across all articulator systems (Saltzman & Munhall, 1989; Löfqvist, 1990; Tuller & Kelso, 1995). Examples include the coordination of laryngeal and supralaryngeal structures such as the lips (Munhall et al., 1986; Munhall & Löfqvist, 1992) or the tongue (Manuel & Vatikiotis-Bateson, 1988), and functional coordination among supralaryngeal articulators, as in tongue-lip (Faber, 1989) and tongue-jaw coupling (Stone & Vatikiotis-Bateson, 1995).