
Talking Heads:
Facial Animation

The pioneering work on facial animation was done by Frederic I. Parke in the 1970s. Renewed interest in the topic in the mid-1980s included Keith Waters's muscle model approach to facial expression. Parke and Waters have written the definitive text on Computer Facial Animation.
A website (see below) provides an overview of the book.

The late 1980s also saw the first attempts at visual speech synthesis, including the work of Dominic Massaro and Michael Cohen (who are both at the UCSC Perceptual Science Laboratory) and a number of other researchers (see the bibliography). Massaro has recently published a book, Perceiving Talking Faces, that summarizes the issues in visual speech synthesis and shows how computational auditory-visual models can be used to explore issues in speech perception and pattern recognition. If you are interested in seeing the wide variety of work being done in facial animation, you should start at the UCSC PSL Speech Perception by Ear and Eye / Facial Animation webpage and the UCSC PSL Facial Animation webpage.

A summary of the field and its future can be found at
SIGGRAPH 97 Panel on Facial Animation: Past, Present and Future.

This section also features an interactive version of a recent research paper entitled Kinematics-Based Synthesis of Realistic Talking Faces
by Eric Vatikiotis-Bateson, Takaaki Kuratate, Mark Tiede, and Hani Yehia, of the ATR Human Information Processing Research Laboratories.

Additional information about facial animation and modeling is available at a number of sites, including:

Ananova
The world's first virtual newsreader on the internet has been launched in London. Computer-generated Ananova is programmed to deliver news 24 hours a day.

Here are some newspaper reports.

digital CRL Facial Animation Project (Keith Waters)
"CRL's talking synthetic face is essentially the visual complement of the speech synthesizer DECtalk. Where DECtalk provides synthesised speech, CRL's Face provides a synthetic face. By combining the audio functionality of a speech synthesiser with the graphical functionality of a computer-generated face, a variety of new applications can be developed. For example, a synthetic character can give a multimedia presentation, or a synthetic character can monitor a system and report anomalies as a feedback agent. One of the more intriguing possibilities is the construction of an interactive face agent capable of assisting and conversing with the user."

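The coupling described above amounts to a phoneme-to-viseme mapping driven by the synthesizer's timing information. As a rough sketch of the idea in Python (the phoneme labels, mapping table, and timing format are invented for illustration and are not CRL's or DECtalk's actual interface):

    # Sketch: driving a face from a speech synthesizer's phoneme timings.
    # The phoneme-to-viseme table below is illustrative only.
    PHONEME_TO_VISEME = {
        "AA": "open_jaw",            # as in "father"
        "IY": "spread_lips",         # as in "see"
        "UW": "rounded_lips",        # as in "you"
        "M":  "closed_lips",
        "F":  "lower_lip_to_teeth",
        "sil": "neutral",
    }

    def schedule_visemes(phoneme_timings):
        """Turn (phoneme, start_sec, end_sec) triples, as a synthesizer
        might report them, into a viseme schedule for a face renderer."""
        return [(PHONEME_TO_VISEME.get(p, "neutral"), start, end)
                for p, start, end in phoneme_timings]

    # Hypothetical timings for the word "my":
    print(schedule_visemes([("M", 0.00, 0.08), ("AA", 0.08, 0.25)]))
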
Computer Facial Animation (book by Parke and Waters)

"This book is about computer facial models, computer generated facial images, and facial animation. In particular it concerns the principles of creating face models and the manipulation or control of computer generated facial attributes. In addition, various sections in the book describe and explain the development of specific computer facial animation techniques over the past twenty years, as well as those expected in the near future. "

Demetri Terzopoulos homepage (with papers & animations)

UCSC Perceptual Science Laboratory (D. Massaro & M. Cohen)

"The Perceptual Science Laboratory is engaged in a variety of experimental and theoretical inquiries in perception and cognition. A major research area concerns speech perception by ear and eye, and facial animation. We also have tested a general fuzzy logical model of perception in a variety of domains, including perception and understanding of language, memory, object, shape and depth perception, learning, and decision making. Research is also being carried out in reading."

Speech: A Sight to Behold

An online article by Barbra Rodriguez for Science Notes, Summer 1996, focusing on Baldy, the UCSC PSL 3D computerized talking head model.
"When someone talks, you pick up clues about what they're saying from their facial maneuvers. Scientists are using a computerized talking image of a human head to learn about their visual language clues. Such talking heads will also allow new ways of communicating in the future."

ICP Visual Speech Synthesis

"Three-dimensional modelling of the different organs involved in speech production: lips, jaw, tongue for the vocal tract and skin. The animation of the different parts of the face model can be done either by video analysis of a speaker's face, or from text by means of a rule-based system."

FaceView (Alex Pentland, MediaLab, MIT)

"The FaceView project is concerned with observing, understanding, and synthesizing actions of the face and head. The current work on this project is focused on the areas of head-tracking, facial expression recognition, and non-rigid deformation of head models for animation."

MikeTalk (Tony Ezzat and Tomaso Poggio, MIT Center for Biological and Computational Learning)

"The goal of this project is to create a videorealistic text-to-audiovisual speech synthesizer. The system should take as input any typed sentence, and produce as output an audio-visual movie of a face enunciating that sentence. By videorealistic we mean that the final audiovisual output should look like it was a videocamera recording of a talking human subject."

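Ezzat and Poggio's published approach builds the visual stream by morphing between recorded viseme images. The sketch below substitutes a plain cross-dissolve for their optical-flow-based morph, so it is only a toy stand-in for the technique:

    import numpy as np

    def crossfade(img_a, img_b, n_frames):
        """Toy stand-in for viseme-to-viseme morphing: a linear
        cross-dissolve between two viseme images. MikeTalk itself warps
        pixels along optical-flow correspondences rather than blending
        intensities, which avoids ghosting."""
        ts = np.linspace(0.0, 1.0, n_frames)
        return [((1.0 - t) * img_a + t * img_b).astype(img_a.dtype)
                for t in ts]

    # Two fake 4x4 grayscale "viseme" images:
    neutral = np.zeros((4, 4))
    open_jaw = np.ones((4, 4))
    print(len(crossfade(neutral, open_jaw, n_frames=5)))  # 5 frames
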
Multimodal Speech Synthesis (KTH)

"Our approach to audio-visual speech synthesis is based on parametric descriptions of both the acoustic and visual speech modalities, in a text-to-speech framework. The visual speech synthesis uses 3D polygon models, that are parametrically articulated and deformed. Currently, we are working with two different parametric models for visual synthesis : "Holger", which is an extended version of a face model developed by F. Parke (1982), and "Olga", which was developed in the Olga-project. The auditory synthesis is based on a source-filter formant-based generation model. Parameter trajectories for both modalities are calculated by a text-to-speech rule system. In the near future, we are hoping to improve naturalness and intelligibility of the visual synthesis with the help of data obtained by optical analysis of a real speaker's articulation."

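To make "parameter trajectories calculated by a text-to-speech rule system" concrete, here is one way such trajectories can be generated: store an articulation target per phone and interpolate between targets over the phone durations. The parameter names, target values, and timing scheme below are invented for the example and are not KTH's actual rules:

    import numpy as np

    # Hypothetical per-phone targets for two face parameters,
    # jaw opening and lip rounding, each scaled to [0, 1].
    TARGETS = {
        "sil": (0.0, 0.0),
        "AA":  (0.9, 0.1),
        "UW":  (0.3, 0.9),
        "M":   (0.0, 0.2),
    }

    def trajectories(phones, durations, frame_rate=50):
        """Piecewise-linear parameter trajectories through the target of
        each phone (placed at the phone's midpoint), sampled at
        frame_rate frames per second."""
        times, points, t = [], [], 0.0
        for phone, dur in zip(phones, durations):
            times.append(t + dur / 2.0)
            points.append(TARGETS[phone])
            t += dur
        points = np.array(points)
        frames = np.arange(0.0, t, 1.0 / frame_rate)
        jaw = np.interp(frames, times, points[:, 0])
        lips = np.interp(frames, times, points[:, 1])
        return frames, jaw, lips

    frames, jaw, lips = trajectories(["sil", "AA", "UW", "sil"],
                                     [0.1, 0.2, 0.2, 0.1])
    print(jaw.round(2))
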
MIAMI report (Schomaker et al.)

A taxonomy of multimodal interaction in the human information processing system.

A report of the ESPRIT PROJECT 8579.

Video Rewrite: Driving Visual Speech with Audio
(Bregler, Covell & Slaney, Interval Research Corp.)

"Video Rewrite uses existing footage to create automatically new video of a person mouthing words that she did not speak in the original footage. This technique is useful in movie dubbing, for example, where the movie sequence can be modified to sync the actors' lip motions to the new soundtrack."

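In outline, Video Rewrite labels both the original footage and the new soundtrack with phoneme context units (triphones) and re-sequences mouth-region clips from the footage to match the new audio. A toy version of the selection step, with an invented clip library, might look like this:

    # Toy re-sequencing in the spirit of Video Rewrite: label the new
    # audio as triphones, then pick matching mouth-video clips from a
    # library built from the original footage. The labels and clip IDs
    # are made up; real systems also warp and blend the chosen clips.
    VIDEO_LIBRARY = {
        ("sil", "HH", "EH"): "clip_017",
        ("HH", "EH", "L"):   "clip_018",
        ("EH", "L", "OW"):   "clip_042",
        ("L", "OW", "sil"):  "clip_043",
    }

    def pick_clips(phones):
        """Return one library clip per triphone of the new utterance,
        falling back to None when no exact match exists (a real system
        would instead search for the closest available unit)."""
        padded = ["sil"] + phones + ["sil"]
        return [VIDEO_LIBRARY.get(tuple(padded[i:i + 3]))
                for i in range(len(padded) - 2)]

    # New audio labeled as "hello": HH EH L OW
    print(pick_clips(["HH", "EH", "L", "OW"]))
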
ATR: Speech Synchronized Human Facial Image Synthesis
(Takaaki Kuratate and Eric Vatikiotis-Bateson)

RED TED Headcase Technology
(commercial facial animation software for Windows)

"Headcase Technology provides a real time 3D graphics system which displays a realistic animated human face on your desktop. The face can gesture and automatically mouths the current sound being played by your computer."
