Pointers to research and references on speechreading can be found at
the pioneering UCSC PSL Speechreading (Lipreading) webpage.
Additional information about speechreading is available at a number of sites,
including:
NATO Advanced Study Institute: Speechreading by Man and Machine
"This [1995 NATO Advanced Study Institute was] the first forum on the interdisciplinary study of speechreading (lipreading) -- production, perception and learning by
both humans and machines. The central aim [was] to explore and promote the incorporation of visual information into acoustic speech
recognizers for improved recognition accuracy (especially in noisy environments), while drawing on and further elucidating
knowledge of the psychology of speechreading by humans."
UCSC PSL Speech Perception by Ear and Eye
Includes information about the
NSF Challenge Grant, which includes
researchers from the
Center for Spoken Language Understanding
(Oregon Graduate Institute),
Perceptual Science Laboratory
(U. C., Santa Cruz),
Interactive Systems Labs
(Carnegie Mellon University), and
Tucker Maxon Oral School.
There are also links to
Dominic Massaro's new book,
"Perceiving Talking Faces: From Speech Perception to a Behavioral Principle",
and other links to the research of this group and others.
Hearing by Eye II.
Advances in the Psychology of Speechreading and Auditory-visual Speech.
Edited by Ruth Campbell,
Barbara Dodd, and
Denis Burnham.
"This book comprises fifteen invited chapters by leading international
researchers in psychology, psycholinguistics, experimental and clinical speech science and computer engineering. It gives intriguing
answers to theoretical questions (what are the mechanisms by which heard and seen speech combine?) and practical ones (what makes a
good speechreader? Can machines be programmed to recognize seen, and seen-and-heard speech?) The book is written in a non-technical
way and starts to articulate a behaviorally-based but cross-disciplinary program of research in understanding how natural language can be
delivered by different modalities."
Speechreading Demo Page
(Juergen Luettin,
Machine Vision Group at
IDIAP)
"The performance of most state-of-the-art speech recognition systems drops considerably in the presence of noise.
This limits their use in real world applications, which are basically all subject to some interference from noise.
Attempts to reduce the effect of noise in the speech signal have only shown limited success, particularly when the
noise was due to crosstalk (cocktail party effect).
Humans on the other hand use lip-reading (speechreading) as supplementary information for speech perception,
especially in noisy conditions. The main benefit of visual information stems from its complementarity to the
acoustic signal, i.e. phonemes that are difficult to distinguish acoustically are easier to distinguish visually.
We have developed a speechreading system which locates and tracks the lips of a speaker over an image sequence
to extract visual speech information."
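The complementarity described in the quote above is commonly exploited by late (decision-level) fusion: each stream scores the candidate classes independently, and a reliability weight combines the two streams. A minimal sketch of that idea, with hypothetical class scores and a fixed stream weight (neither taken from the IDIAP system):

```python
import math

def log_softmax(scores):
    """Turn raw classifier scores into log-probabilities."""
    m = max(scores)
    z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - z for s in scores]

def fuse(acoustic_scores, visual_scores, alpha=0.6):
    """Weighted decision-level fusion of two streams.

    alpha weights the acoustic stream; lowering it in noisy
    conditions shifts reliance toward the visual stream.
    """
    la = log_softmax(acoustic_scores)
    lv = log_softmax(visual_scores)
    return [alpha * a + (1.0 - alpha) * v for a, v in zip(la, lv)]

# Hypothetical scores for the classes /b/ and /d/: place of
# articulation is hard to hear in noise but easy to see on the lips.
acoustic = [1.0, 0.9]   # nearly tied acoustically
visual = [2.5, 0.2]     # closed lips clearly indicate /b/
fused = fuse(acoustic, visual, alpha=0.5)
print(fused.index(max(fused)))  # the fused decision picks /b/ (index 0)
```

This is only one of several fusion strategies; the systems listed on this page also integrate the streams earlier, e.g. by feeding joint audio-visual features into a single network.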
Visual Acoustic Speech Recognition (Computer Lipreading)
(Chris Bregler)
"In the frame of this project we investigate the utility of using visual information (lip-movements) to improve speech recogntion.
This research was initiated by the Neural Net Speech Group (UKA and CMU) where we showed significant recognition improvement
using a visual acoustic MS-TDNN architecture.
At ICSI we specifically focus on conditions where state-of-the-art recognition systems perform poorly, for example in car
environments with background noise, or office environments with cross-talk. Using a visual acoustic MLP/HMM architecture
developed at the ICSI Speech Recognition Group we showed significant improvement over pure acoustic performance.
Currently we are extending the interactive spontaneous speech system "BeRP" to make use of the additional visual speech modality."
Interactive Systems Labs NLips
"What are we working on?
We're using neural networks for lipreading. As [our] task we use speaker dependent continuous spelling of the German alphabet.
Why are we doing lipreading?
We want to improve the recognition rate of acoustical speech recognizers, especially in [non-]optimal conditions (cross-talking ...).
The goal is to get an on-line lipreader that is robust against all on-line conditions like illumination, translation and size without using
some additional things like ... lip-markers ... etc."
SYS Research Audio-Visual Speech Recognition
"Speechreading (rather than just lipreading) is increasingly being investigated by researchers to enhance current speech recognisers
by adding visual information from the face of a talker. It is well known that acoustic-only speech recognisers fail in noisy conditions
while a human 'listener' is still able to understand the speech. We are able to do this by exploiting all of the available information -
for example our knowledge of language and visual cues such as body and facial gestures including lipreading. Engineers cannot
replicate human understanding of speech but we can improve acoustic speech recognition by including visual speech information."
Speechreading Bibliography