Q:
Please provide a brief overview of your lip modeling work and other audiovisual research.

A:
The 3D lip model that I'm currently developing is a parametric surface
defined by polynomial interpolation of a set of control points. The
model is purely geometrical, and the control points correspond to
features common to every human lip shape: the lip corners, the Cupid's
bow, the inner and outer contours, and so on.
A nice feature of the approach is that the construction of the control
points allows the model to be applied flexibly -- e.g., to different
speakers and head orientations (camera views). Statistical studies on
two speakers have shown that the model can track the lips accurately
enough for synthesis with only two orthogonal parameters driving its
deformation.
Although the original purpose of this work was to devise a 3D model for
lip tracking regardless of head orientation, it has recently also been
used for the lip animation component of a talking head model at ATR.
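
To illustrate the interpolation idea, here is a minimal Python sketch
using made-up 2D coordinates for a handful of anatomical control points;
the actual model is a 3D surface, so this is purely illustrative:

    # Minimal sketch with hypothetical control points -- not the actual
    # model, which is a 3D parametric surface.
    import numpy as np

    # Hypothetical (x, y) control points for the upper outer contour,
    # running from the left lip corner through the Cupid's bow to the
    # right lip corner.
    ctrl = np.array([
        [-1.00, 0.00],   # left lip corner
        [-0.50, 0.35],
        [-0.15, 0.45],
        [ 0.00, 0.40],   # Cupid's bow dip
        [ 0.15, 0.45],
        [ 0.50, 0.35],
        [ 1.00, 0.00],   # right lip corner
    ])

    # One polynomial through all control points (degree = n_points - 1).
    coeffs = np.polyfit(ctrl[:, 0], ctrl[:, 1], deg=len(ctrl) - 1)

    # Evaluating it densely yields a smooth contour; moving one control
    # point and refitting deforms the whole curve smoothly.
    x = np.linspace(-1.0, 1.0, 100)
    contour = np.polyval(coeffs, x)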

Q:
What drives your work, theoretically?

A:
The overall problem of lip tracking is to recover a complex shape from
noisy images, since the color of the lips is difficult to separate from
that of the surrounding skin.
As a consequence, the more precisely you define a lip model, the better
you regularize the tracking problem. This geometrical model is a
baseline to which higher-level considerations about lip physiology,
kinematics, muscle control, and so on can be added. The simplicity of
the geometrical definition allows the model to be used as a
'multi-purpose tool' for lip shape measurement.
I try to keep in focus the idea that analysis and synthesis of speaking
lips are two closely related tasks.

Q:
What is the most difficult issue, or issues, that you face when doing your research?

A:
The biggest issue in the lip tracking work is finding the right
trade-off between constraints so strong that they overly reduce the
generality of the model and constraints so weak that they no longer
prevent tracking errors.
For example, I've focused on a very low-dimensional subspace
parameterization of the lip model's deformation, with only two
parameters. This two-parameter control brings robustness to tracking
and seems sufficient for realistic animation. Nevertheless, it is far
from able to represent the detailed motion of the lips.
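
As a rough sketch of what such a parameterization could look like in
code (the mean shape and the two deformation modes below are random
placeholders; in the actual work they would come from the statistical
studies mentioned above):

    # Assumed formulation, not the author's actual code: a lip shape
    # written as a mean shape plus a linear combination of two
    # deformation modes, so the tracker only has to estimate (p1, p2)
    # instead of every control point.
    import numpy as np

    n_points = 30                                    # points on the lips
    rng = np.random.default_rng(0)
    mean_shape = rng.standard_normal((n_points, 3))  # placeholder mean
    mode_a = rng.standard_normal((n_points, 3))      # e.g., mouth opening
    mode_b = rng.standard_normal((n_points, 3))      # e.g., lip rounding

    def lip_shape(p1, p2):
        """Reconstruct a full 3D lip shape from only two parameters."""
        return mean_shape + p1 * mode_a + p2 * mode_b

    # A tracker searches over (p1, p2) alone, which regularizes the fit.
    candidate = lip_shape(0.8, -0.2)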
Another big problem is controlling the recording conditions: differences
in camera view and in lighting can have a huge impact on the results.

Q:
What are your visions for the future of this sort of research?

A:
Compared to vocal tract studies, only a few studies of the lips have
been reported in speech analysis. Most of the work on lip tracking
comes from the computer vision community. Despite the high quality of
this work, it remains focused on image processing techniques, with no
particularly detailed lip modeling.
A better understanding of the motor control of the lips could bring
important improvements both in lip tracking and in realistic animation.

Q:
Do you have any comments on related work by others that you consider to be exciting?

A:
A 3D lip model is currently being developed by S. Basu at the
MIT Media Lab. His model is based on a finite element model (FEM) that
captures the elasticity of the lip surface's deformation. Although no
intelligibility tests have been reported so far, it seems to be an
interesting approach for analysis/synthesis modeling.