Rubin and Best
Understanding Speech
Speaking and listening seem as natural as breathing to us, but that is because we have gotten so used to using language. It takes us a couple of years before we can really speak, and seven or more before we talk like adults. Because we are not aware of the steps involved, the whole process seems effortless. It is amazing, though, that children become fluent so easily. Multiple efforts to teach language to other species have produced ambiguous results at best, even after years of training. Speech seems easy to us because we are biologically primed for it.

At Haskins Laboratories, we study the most basic aspects of speaking and listening. We have adults listen to manipulations of natural speech, or synthetic versions of speech. We even have them listen to the very weird sine-wave speech, which takes the sounds down to their most basic elements. These tests let us see what it is that is important in identifying the consonants and vowels of language. The way we look when we talk also influences what we hear. If you change the sound that accompanies a video clip of someone talking, you often hear what is seen, even though you thought you were reporting what you heard. This is the McGurk effect, which is one of the most intriguing findings of recent years.

All of this may seem a bit abstract, but we are currently using the McGurk effect to look for an early diagnosis for autism. This syndrome affects more and more children every year, and its causes are not yet known. But we do know that early treatment is the best treatment. We are hoping that this new test will allow us to detect autism as early as one year of age. (Current diagnosis is only considered complete at age three.) This blending of the visual and the auditory may have seemed like a curiosity at first, but it could have large implications.

Our early work on dichotic listening, where each ear receives a different speech sound at the same time, helped determine what is going on in the brain. We are now using imaging techniques like functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) to look at signatures of brain activity more directly. We have found that the areas used for speech are linked to those used for reading. Now the exploration will help us determine what kinds of treatments are best for both perceptual problems and reading problems. It will also help map recovery of perceptual abilities in aphasia.

For our listening tests, we often use synthetic speech to manipulate the sounds in precise ways, but this kind of test also helps us improve automatic synthesis (or text-to-speech systems). The first synthesizer usable for research purposes, the Pattern Playback, was developed at Haskins in the 1950s. We have continued to explore new synthesis domains with the Articulatory Synthesis program, and sine-wave synthesis. These results improve artificial speech, and continue to feed back into other applications. Sine-wave speech, for example, has been tested as a possible way to enhance speech for the hearing-impaired.
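To make the idea of sine-wave speech concrete, here is a minimal sketch in Python. It assumes the usual description of the technique: each formant (resonance) of the voice is replaced by a single pure tone whose frequency follows that formant over time. The function name `sine_wave_speech`, its parameters, and the frequency values in the example are illustrative assumptions, not measurements or code from the Haskins synthesis programs.

```python
import math

def sine_wave_speech(tracks, amps, frame_rate=100, sample_rate=16000):
    """Sum one sinusoid per formant track (hypothetical helper).

    tracks: one list of frequencies (Hz) per formant, one value per frame
    amps:   one constant amplitude per formant (a simplification)
    """
    n_frames = len(tracks[0])
    samples_per_frame = sample_rate // frame_rate
    n_samples = (n_frames - 1) * samples_per_frame
    out = [0.0] * n_samples
    for freqs, amp in zip(tracks, amps):
        phase = 0.0
        for n in range(n_samples):
            frame, offset = divmod(n, samples_per_frame)
            t = offset / samples_per_frame
            # Linearly interpolate the frequency between neighboring frames.
            f = freqs[frame] * (1 - t) + freqs[frame + 1] * t
            # Accumulate phase so the tone glides smoothly without clicks.
            phase += 2 * math.pi * f / sample_rate
            out[n] += amp * math.sin(phase)
    return out

# Made-up formant glides, roughly in the range of a vowel-to-vowel transition.
signal = sine_wave_speech(
    tracks=[[700, 500, 300], [1200, 1800, 2300], [2500, 2800, 3000]],
    amps=[1.0, 0.5, 0.25],
)
```

The result is a list of samples that could be written to an audio file. Real sine-wave stimuli would use frequency and amplitude tracks measured from a recorded utterance rather than invented values.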

Speaking is also a complicated act that we take for granted. Only when we are unable to speak normally does it show itself to be the involved process it is. Everyone starts out being unable to speak, and the process of learning to manipulate our mouths is an intriguing topic (see, e.g., our research on babbling). Some children have delayed speech, and it is only through knowing what is typical that we can tell when there is a problem. Stuttering is another challenge that can last a lifetime; Haskins researchers have a long history of comparing speakers with this baffling malady to more typical speakers. Although we have an intuitive sense of what stuttering is, its degree of difference from typical speech is now better understood, guiding us in new directions for treatment.

Every speech utterance is different, even if it comes from the same person. Knowing which kinds of differences make people hear something different, and which ones we ignore, is a continuing challenge in both normal and disordered speech. Variability comes from differences in anatomy, habits, speaking style, and a host of other complications. As we refine our methods of measuring articulation (see Measuring Speech), we begin to get a better handle on this topic, which matters a great deal for knowing when speech is unusual but still normal and when it is unusual in a way that signals a problem.