Talking Heads:
Facial Animation
The late 1980s also saw the first attempts at visual speech synthesis, including the work of Dominic Massaro and Michael Cohen (both at the UCSC Perceptual Science Laboratory) and a number of other researchers (see the bibliography). Massaro has recently published a book, Perceiving Talking Faces, that summarizes the issues in visual speech synthesis and shows how computational auditory-visual models can be used to explore issues in speech perception and pattern recognition. If you are interested in seeing the wide variety of work being done in facial animation, you should start at the UCSC PSL Speech Perception by Ear and Eye / Facial Animation webpage and the UCSC PSL Facial Animation webpage.
A summary of the field and its future can be found at the SIGGRAPH 97 Panel on Facial Animation: Past, Present and Future.
This section also features an interactive version of a recent research paper entitled Kinematics-Based Synthesis of Realistic Talking Faces, by Eric Vatikiotis-Bateson, Takaaki Kuratate, Mark Tiede, and Hani Yehia of the ATR Human Information Processing Research Laboratories.
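The paper's central idea, recovering facial motion from measured vocal-tract kinematics, can be illustrated with a plain linear least-squares estimator. The sketch below (Python with NumPy) runs on synthetic data; the channel counts, and the assumption that a single linear map suffices, are illustrative stand-ins rather than the authors' actual setup.

import numpy as np

# Illustrative sketch: estimate facial marker trajectories from
# vocal-tract marker trajectories with a linear least-squares map.
# All shapes and channel counts are assumptions for illustration.

rng = np.random.default_rng(0)
T = 500            # time samples (e.g., motion-capture frames)
n_tract = 12       # vocal-tract channels (e.g., lip/jaw/tongue coordinates)
n_face = 54        # facial channels (e.g., 18 markers x 3 coordinates)

tract = rng.normal(size=(T, n_tract))          # measured articulation
true_map = rng.normal(size=(n_tract, n_face))  # stand-in ground truth
face = tract @ true_map + 0.01 * rng.normal(size=(T, n_face))

# Fit W minimizing ||tract @ W - face||^2 (ordinary least squares).
W, *_ = np.linalg.lstsq(tract, face, rcond=None)

# Drive the face channels from articulatory data and check the fit.
face_predicted = tract @ W
rmse = np.sqrt(np.mean((face_predicted - face) ** 2))
print(f"reconstruction RMSE: {rmse:.4f}")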
Additional information about facial animation and modeling is available at a number of sites, including:
Ananova
The world's first virtual newsreader on the internet has been launched in London.
Computer-generated Ananova is programmed to deliver news 24 hours a day.
Here are some newspaper reports.
digital CRL Facial Animation Project (Keith Waters)
"CRL's talking synthetic face is essentially the visual complement of the speech synthesizer DECtalk. Where DECtalk provides synthesised speech, CRL's Face provides a synthetic face. By combining the audio functionality of a speech synthesiser with the graphical functionality of a computer-generated face, a variety of new applications can be developed. For example, a synthetic character can give a multimedia presentation, or a synthetic character can monitor a system and report anomalies as a feedback agent. One of the more intriguing possibilities is the construction of an interactive face agent capable of assisting and conversing with the user."
Computer Facial Animation (book by Parke and Waters)
"This book is about computer facial models, computer generated facial images, and facial animation. In particular
it concerns the principles of creating face models and the manipulation or control of computer generated facial
attributes. In addition, various sections in the book describe and explain the development of specific computer
facial animation techniques over the past twenty years, as well as those expected in the near future. "
Demetri Terzopoulos' homepage (with papers & animations)
UCSC Perceptual Science Laboratory (D. Massaro & M. Cohen)
"The Perceptual Science Laboratory is engaged in a variety of experimental and theoretical
inquiries in perception and cognition. A major research area concerns speech perception by
ear and eye, and facial animation. We also have tested a general fuzzy logical model of
perception in a variety of domains, including perception and understanding of language,
memory, object, shape and depth perception, learning, and decision making. Research is also
being carried out in reading."
"Three-dimensional modelisation of the different organs involved in speech production: lips, jaw, tongue for the
vocal tract and skin.
The animation of the different parts of the face model can be done, either by video analysis of a speaker's face,
or from text by means of a rule-based system."
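As a rough illustration of the "from text by means of a rule-based system" pathway, the sketch below maps words to phonemes, phonemes to viseme targets, and emits timed keyframes for a face model. The toy lexicon, the viseme parameter values, and the fixed per-phoneme duration are invented placeholders, not any published rule set.

# Minimal sketch of rule-based facial animation from text:
# text -> phonemes -> viseme targets -> timed keyframes.

PHONEMES = {"hello": ["HH", "EH", "L", "OW"]}  # toy lexicon (assumption)

# Each viseme target is a (lip_opening, lip_rounding, jaw_drop) triple;
# the values are illustrative, not taken from any published model.
VISEMES = {
    "HH": (0.3, 0.1, 0.2),
    "EH": (0.6, 0.1, 0.5),
    "L":  (0.4, 0.2, 0.3),
    "OW": (0.5, 0.9, 0.4),
}

def text_to_keyframes(text, frame_ms=80):
    """Return (time_ms, face_params) keyframes, one per phoneme."""
    t, keyframes = 0, []
    for word in text.lower().split():
        for ph in PHONEMES.get(word, []):
            keyframes.append((t, VISEMES[ph]))
            t += frame_ms
    return keyframes

for time_ms, params in text_to_keyframes("hello"):
    print(time_ms, params)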
FaceView (Alex Pentland, Media Lab, MIT)
"The FaceView project is concerned with observing, understanding, and synthesizing actions of the face and head. The current work on this project is focused on the areas of head-tracking, facial expression recognition, and non-rigid deformation of head models for animation."
MikeTalk (Tony Ezzat and Tomaso Poggio, MIT Center for Biological and Computational Learning)
"The goal of this project is to create a videorealistic text-to-audiovisual speech synthesizer. The system should take as input any typed sentence, and produce as output an audio-visual movie of a face enunciating that sentence. By videorealistic we mean that the final audiovisual output should look like it was a videocamera recording of a talking human subject."
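The image-based idea behind such a videorealistic synthesizer can be suggested in a few lines: keep one photograph of the face per viseme and generate the frames in between. MikeTalk itself morphs between viseme images; the plain cross-dissolve below, run on synthetic grayscale images, is only a simplified stand-in for that morphing.

import numpy as np

# Sketch: blend between per-viseme key images to get in-between
# frames. A real system would use image morphing, not a dissolve.

H, W = 64, 64
viseme_images = {            # stand-in grayscale mouth-region "photos"
    "M":  np.full((H, W), 0.2),
    "AA": np.full((H, W), 0.8),
}

def dissolve(img_a, img_b, n_frames):
    """Linearly blend img_a into img_b over n_frames frames."""
    return [(1 - a) * img_a + a * img_b
            for a in np.linspace(0.0, 1.0, n_frames)]

frames = dissolve(viseme_images["M"], viseme_images["AA"], n_frames=8)
print(len(frames), frames[0].shape)   # 8 (64, 64)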
Multimodal Speech Synthesis (KTH)
"Our approach to audio-visual speech synthesis is based on parametric descriptions of both the acoustic and visual speech modalities, in a text-to-speech framework. The visual speech synthesis uses 3D polygon models, that are parametrically articulated and deformed. Currently, we are working with two different parametric models for visual synthesis: "Holger", which is an extended version of a face model developed by F. Parke (1982), and "Olga", which was developed in the Olga project. The auditory synthesis is based on a source-filter formant-based generation model. Parameter trajectories for both modalities are calculated by a text-to-speech rule system. In the near future, we are hoping to improve naturalness and intelligibility of the visual synthesis with the help of data obtained by optical analysis of a real speaker's articulation."
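A minimal sketch of what "parameter trajectories ... calculated by a text-to-speech rule system" might look like for one visual parameter, assuming invented per-phoneme targets and durations: each phoneme's target is held for its duration, and the resulting step function is smoothed as a crude stand-in for coarticulation. This is not KTH's actual rule system.

import numpy as np

# Build a control trajectory for one articulation parameter
# (say, jaw opening) from per-phoneme targets, then smooth it.
# Targets, durations, and frame rate are invented for illustration.

targets = [0.1, 0.7, 0.3, 0.6]       # per-phoneme jaw-opening targets
durations_ms = [90, 120, 80, 150]    # per-phoneme durations
frame_ms = 10                        # control frame period

trajectory = []
for value, dur in zip(targets, durations_ms):
    trajectory.extend([value] * (dur // frame_ms))
trajectory = np.array(trajectory, dtype=float)

# Moving-average smoothing blurs each target into its neighbors,
# loosely mimicking coarticulatory overlap between phonemes.
kernel = np.ones(7) / 7
smooth = np.convolve(trajectory, kernel, mode="same")
print(smooth.round(2))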
MIAMI report (Schomaker et al.)
A taxonomy of multimodal interaction in the human information processing system; a report of ESPRIT Project 8579.
Video Rewrite: Driving Visual Speech with Audio (Bregler, Covell & Slaney, Interval Research Corp.)
"Video Rewrite uses existing footage to create automatically new video of a person mouthing words that she did not speak in the original footage. This technique is useful in movie dubbing, for example, where the movie sequence can be modified to sync the actors' lip motions to the new soundtrack."
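Video Rewrite's segment-selection step can be suggested with a toy triphone index: label the original footage by each phoneme together with its left and right neighbors, then assemble new video by looking up the triphones of the new soundtrack. The index contents below are invented, and the fallback is simplified; the actual system finds and blends the closest available match rather than returning nothing.

# Toy sketch of triphone-based video segment selection.

footage_index = {   # triphone -> (start_frame, end_frame) in footage
    ("SIL", "HH", "EH"): (10, 14),
    ("HH", "EH", "L"):   (14, 19),
    ("EH", "L", "OW"):   (19, 23),
}

def select_segments(phonemes):
    """Greedy triphone lookup; missing contexts yield None here
    (a real system would substitute the closest available match)."""
    padded = ["SIL"] + phonemes + ["SIL"]
    segments = []
    for i in range(1, len(padded) - 1):
        tri = tuple(padded[i - 1 : i + 2])
        segments.append(footage_index.get(tri))
    return segments

print(select_segments(["HH", "EH", "L"]))  # last triphone is missing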
ATR: Speech Synchronized Human Facial Image Synthesis (Takaaki Kuratate and Eric Vatikiotis-Bateson)
RED TED Headcase Technology (commercial facial animation software for Windows)
"Headcase Technology provides a real time 3D graphics system which displays a realistic animated human face on your desktop. The face can gesture and automatically mouths the current sound being played by your computer."
"CRL's talking synthetic face is essentially the visual complement of the speech synthesizer DECtalk.
Where DECtalk provides synthesised speech, CRL's Face provides a synthetic face. By combining
the audio functionality of a speech synthesiser with the graphical functionality of a
computer-generated face, a variety of new applications can be developed. For example, a synthetic
character can give a multimedia presentation, or a synthetic character can monitor a system and report
anomalies as a feedback agent. One of the more intriguing possibilities is the construction of an
interactive face agent capable of assisting and conversing with the user."
An online article by Barbra Rodriguez for Science Notes, Summer 1996, focusing on Baldy, the UCSC PSL 3D computerized talking head model.
"When someone talks, you pick up clues about what they're saying from their facial maneuvers. Scientists are using a computerized talking image of a human head to learn about their visual language clues. Such talking heads will also allow new ways of communicating in the future."

