A-154 Collaborative research: Landmark-based robust speech recognition using prosody-guided models of speech variability
Louis Goldstein, NSF

Research Goals. This is a subcontract within a large, multi-site speech recognition project whose aim is to develop an automatic speech recognition (ASR) model that, for the first time, integrates realistic models of syntax, prosody, lexical structure, and speech production and perception into a graphical model (GM) framework and applies this knowledge to large-vocabulary databases. More specifically, the project will develop large-vocabulary models of speech acoustics, pronunciation variability, prosody, and syntax by (1) deriving knowledge representations based on studies of human speech production and perception, (2) training component probabilities using a wide variety of corpora with transcriptions most appropriate to each component of the recognizer, and (3) integrating and testing using the formalism of a dynamic Bayesian network (DBN).
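To give a flavor of the DBN formalism mentioned in step (3): in such a network, several hidden streams (for example, parallel articulatory gestures) evolve with their own transition probabilities while a shared acoustic observation couples them at each frame. The toy sketch below is not the project's actual model; the two binary streams, the transition matrices, and all probabilities are invented for illustration. It runs a forward pass over the factored state to compute the probability of an observation sequence.

```python
import numpy as np

# Hypothetical toy DBN: two binary articulatory streams (say, a lip gesture
# and a tongue gesture) transition independently; the acoustic observation
# depends jointly on both. All numbers are invented for illustration.
A_lip    = np.array([[0.9, 0.1],
                     [0.2, 0.8]])   # P(lip_t | lip_{t-1})
A_tongue = np.array([[0.8, 0.2],
                     [0.3, 0.7]])   # P(tongue_t | tongue_{t-1})
# P(obs | lip, tongue): rows index (lip, tongue) pairs in row-major order,
# columns index 3 discrete acoustic symbols.
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6],
              [0.1, 0.3, 0.6]])

def forward(obs):
    """Forward pass over the factored (lip, tongue) state.

    Returns P(observation sequence) under the toy DBN, starting from a
    uniform prior over the 2x2 joint state.
    """
    alpha = np.full((2, 2), 0.25)          # joint belief over (lip, tongue)
    for o in obs:
        # Each stream transitions independently (the factored structure)...
        alpha = A_lip.T @ alpha @ A_tongue
        # ...then the shared acoustic observation couples the two streams.
        alpha *= B[:, o].reshape(2, 2)
    return alpha.sum()
```

Factoring the hidden state this way is what lets a DBN encode articulatory knowledge compactly: each stream needs only its own small transition model, rather than one monolithic transition matrix over every state combination.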

The work to be performed at Haskins focuses on development of the pronunciation model, which will incorporate an articulatory phonology lexicon, the coupled oscillator model of speech production planning (the source of predictions about speech variability), and the task dynamic model of articulatory coordination. We will also work extensively with the X-ray microbeam database, generating lexical coupling graphs, gestural scores, and prosodic transcriptions for all utterances in the database, so that these can be used in training the overall system.
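The core idea of the coupled oscillator model is that each gesture is associated with a planning oscillator, and coupling between oscillators drives their relative phase toward a stable target (e.g., in-phase or anti-phase), which determines gestural timing. The sketch below is a generic two-oscillator illustration of that phase-locking behavior, not the Haskins implementation; the coupling form (sinusoidal), the parameter values, and the function name are all invented for illustration.

```python
import math

def relative_phase(phi0=1.0, omega=2 * math.pi, k=2.0, dt=0.001, steps=20000):
    """Euler-integrate two phase oscillators with sinusoidal coupling.

    dphi1/dt = omega + k*sin(phi2 - phi1)
    dphi2/dt = omega + k*sin(phi1 - phi2)

    With k > 0 the relative phase phi1 - phi2 is driven toward 0 (in-phase);
    with k < 0 it is driven toward pi (anti-phase). Returns the final
    relative phase in radians. All parameter values are illustrative.
    """
    phi1, phi2 = phi0, 0.0
    for _ in range(steps):
        d = phi1 - phi2
        phi1 += dt * (omega - k * math.sin(d))
        phi2 += dt * (omega + k * math.sin(d))
    return phi1 - phi2
```

The attraction to a small set of stable relative phases is what makes the model predictive: perturbations to one gesture's timing are absorbed as the system relaxes back toward the target phase, yielding systematic (rather than arbitrary) patterns of speech variability.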

Current Status. This project was funded for three years beginning 6/1/07. Total costs for the first year are $298,000.