
Talking Heads:
Bibliography

     Articulatory synthesis | Avatars | Facial animation | McGurk
Speech - general | Speech synthesis | Speech production | Speechreading


 
Speechreading:

Benoît, C. (1995). On the production and perception of audio-visual speech by man and machine. In Y. Wang, et al. (Eds.), Multimedia & Video Coding, Plenum Press, NY.

Benoît, C., Guiard-Marigny, T., Le Goff, B., & Adjoudani, A. (1996). Which components of the face do humans and machines best speechread? In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 315-328.

Bernstein, L. E. & Auer, E. T., Jr. (1996). Word Recognition in Speechreading. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 17-26.

Bernstein, L. E. & Demorest, M. E. (1993). A general theory of speech perception must account for speech perception without audition (lipreading/speechreading). 34th Annual Meeting of the Psychonomic Society, 645.

Bernstein, L. E. & Demorest, M. E. (1993). Speech perception without audition. Journal of the Acoustical Society of America, 94, 1887.

Bernstein, L. E., Demorest, M. E., & Tucker, P. E. (1998). What makes a good speechreader? First you have to find one. In R. Campbell, B. Dodd & D. Burnham (Eds.) (1998). Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech. Psychology Press Ltd., East Sussex, UK, 211-227.

Bernstein, L. E. & Eberhardt, S. P. (1986). Johns Hopkins Lipreading Corpus Videodisk Set. The Johns Hopkins University, Baltimore, MD.

Bernstein, L. E., Eberhardt, S. P., & Demorest, M. E. (1989). Single-channel vibrotactile supplements to visual perception of intonation and stress. Journal of the Acoustical Society of America, 85, 397-405.

Boothroyd, A., Hnath-Chisolm, T., Hanin, L., & Kishon-Rabin, L. (1988). Voice fundamental frequency as an auditory supplement to the speechreading of sentences. Ear and Hearing, 9, 306-312.

Braida, L. D. (1991). Crossmodal integration in the identification of consonant segments. Quarterly Journal of Experimental Psychology, A: Human Experimental Psychology, 43(3), 647-677.

Breeuwer, M. & Plomp, R. (1984). Speechreading supplemented with frequency-selective sound-pressure information. Journal of the Acoustical Society of America, 76, 686-691.

Breeuwer, M. & Plomp, R. (1986). Speechreading supplemented with auditorily presented speech parameters. Journal of the Acoustical Society of America, 79, 481-499.

Bregler, C. & Konig, Y. (1994). "Eigenlips" for Robust Speech Recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Adelaide, Australia.

Bregler, C., Manke, S., Hild, H., & Waibel, A. (1993). Bimodal Sensor Integration on the Example of "Speech-Reading". In Proc. of IEEE Int. Conf. on Neural Networks, San Francisco.

Bregler, C. & Omohundro, S. (1994). Surface Learning with Applications to Lipreading. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in Neural Information Processing Systems 6. Morgan Kaufmann Publishers, San Francisco, CA.

Bregler, C., Omohundro, S. M., Shi, J., & Konig, Y. (1996). Towards a Robust Speechreading Dialog System. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 409-423.

Brooke, N. M. (1989). Visible speech signals: Investigating their analysis, synthesis, and perception. In M. M. Taylor, F. Neel, & D. G. Bouwhuis (Eds.), The Structure of Multimodal Dialogue. Elsevier Science Publishers, Holland.

Brooke, N.M. (1992). Mouth shapes and speech. In V. Bruce & M. Burton (Eds.) Processing Images of Faces. Ablex Publishing Corporation, Norwood, NJ, 20-40.

Brooke, N. M. (1996). Talking heads and speech recognisers that can see: The computer processing of visual speech signals. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 351-371.

Brooke, N. M. (1998). Computational aspects of visual speech: machines that can speechread and simulate talking faces. In R. Campbell, B. Dodd & D. Burnham (Eds.) (1998). Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech. Psychology Press Ltd., East Sussex, UK, 109-122.

Brooke, N. M. & Petajan, E. D. (1986). Seeing Speech: Investigations into the Synthesis and Recognition of Visible Speech Movements Using Automatic Image Processing and Computer Graphics. Proceedings of the International Conference on Speech Input and Output: Techniques and Applications, 24-26.

Brooke, N. M. & Summerfield, A. Q. (1983). Analysis, synthesis, and perception of visible articulatory movements. Journal of Phonetics, 11, 63-76.

Brooke, N. M. & Templeton, P. D. (1990). Classification of lip shapes and their association with acoustic speech events. Proceedings of the ESCA Workshop on Speech Synthesis (Autrans, France), 245-248.

Brooke, N. M. & Templeton, P. D. (1990). Visual speech intelligibility of digitally processed facial images. Proceedings of the Institute of Acoustics (Autumn Conference, Windermere), 12(1), 483-490.

Campbell, R. (1986). The lateralisation of lipread sounds: A first look. Brain and Cognition, 5, 1-21.

Campbell, R. (1988). Tracing lip movements: Making speech visible. Visible Language, 22, 32-57.

Campbell, R. (1989). Lipreading. In A. W. Young & H. D. Ellis (Eds.), Handbook of Research on Face Processing, Elsevier, North-Holland, Amsterdam.

Campbell, R. (1992). The neuropsychology of lipreading. Philosophical Transactions of the Royal Society of London, 335, 39-45.

Campbell, R. (1994). Audiovisual speech: Where, what, when, how? Current Psychology of Cognition, 13, 76-80.

Campbell, R. (1996). Seeing Brains Reading Speech: A Review and Speculations. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 115-133.

Campbell, R. (1998). How brains see speech: The cortical localisation of speechreading in hearing people. In R. Campbell, B. Dodd & D. Burnham (Eds.) (1998). Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech. Psychology Press Ltd., East Sussex, UK, 177-193.

Campbell, R., Dodd, B., & Burnham, D. (Eds.) (1998). Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech. Psychology Press Ltd., East Sussex, UK.

Cohen, M. M., Walker, R. L., & Massaro, D. W. (1996). Perception of Synthetic Visual Speech. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 153-168.

Coianiz, T., Torresani, L., & Caprile, B. (1996). 2D Deformable Models for Visual Speech Analysis. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 391-398.

Dalton, B., Kaucic, R., & Blake, A. (1996). Automatic Speechreading Using Dynamic Contours. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 373-382.

de Gelder, B., Vroomen, J., & Bachoud-Lévi, A.-C. (1998). Impaired speechreading and audio-visual speech integration in prosopagnosia. In R. Campbell, B. Dodd & D. Burnham (Eds.) (1998). Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech. Psychology Press Ltd., East Sussex, UK, 195-207.

Demorest, M. E., & Bernstein, L. E. (1991). Computational explorations of speechreading. Journal of the Academy of Rehabilitative Audiology, 24, 97-111.

Dodd, B. (1979). Lipreading in infants: Attention to speech presented in and out of synchrony. Cognitive Psychology, 11, 478-484.

Dodd, B. & Burnham, D. (1988). Processing speechread information. Volta Review: New Reflections on Speechreading, 90, 45-60.

Dodd, B. & Campbell, R. (Eds.) (1987). Hearing by Eye: The Psychology of Lip-Reading. Lawrence Erlbaum Associates, Hillsdale, NJ.

Driver, J. (1996). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature, 381, 66-68.

Erber, N. P. (1969). Interaction of audition and vision in the recognition of oral speech stimuli. Journal of Speech and Hearing Research, 12, 423-425.

Erber, N. P. (1972). Auditory, visual and auditory-visual recognition of consonants by children with normal and impaired hearing. Journal of Speech and Hearing Research, 15, 413-422.

Erber, N. P. & De Filippo, C. L. (1978). Voice/mouth synthesis and tactual/visual perception of /pa, ba, ma/. Journal of the Acoustical Society of America, 64, 1015-1019.

Erber, N. P. (1992). Effects of a question-answer format on visual perception of sentences. Journal of the Academy of Rehabilitative Audiology, 25, 113-122.

Ewertsen, H. W. & Nielsen, H. B. (1971). A comparative analysis of the audiovisual, auditive and visual perception of speech. Acta Otolaryngologica, 71, 201-205.

Finn, E. K. & Montgomery, A. A. (1988). Automatic optically based recognition of speech. Pattern Recognition Letters, 8(3), 159-164.

Gagne, J.-P., Dinon, D., & Parsons, J. (1991). An evaluation of CAST: A Computer-Aided Speechreading Training program. Journal of Speech and Hearing Research, 34, 213-221.

Garcia, O., Goldschen, A. J., & Petajan, E. D. (1992). Feature extraction for optical automatic speech recognition or automatic lipreading. George Washington University: IIST-92-32, November.

Goldschen, A. J. (1993). Continuous Automatic Speech Recognition by Lipreading. Ph.D. Dissertation, George Washington University, Washington, D.C., September 1993.

Greenwald, A. B. (1984). Lipreading Made Easy. Alexander Graham Bell Association for the Deaf, Washington, DC.

Hennecke, M.E., Stork, D. G., & Prasad, K. V. (1996). Visionary Speech: Looking Ahead to Practical Speechreading Systems. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 331-349.

Hiki, S. & Fukuda, Y. (1996). Multiphasic Analysis of the Basic Nature of Speechreading. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 239-246.

Le Goff, B., Guiard-Marigny, T., & Benoît, C. (1996). Analysis-synthesis and intelligibility of a talking face. In J. P. H. van Santen, R. W. Sproat, J. P. Olive, & J. Hirschberg (Eds.), Progress in Speech Synthesis, Springer-Verlag, New York, 235-246.

Luettin, J., Thacker, N. A., & Beet, S. W. (1996). Active Shape Models for Visual Speech Feature Extraction. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 383-390.

Mase, K. & Pentland, A. (1991). Automatic lipreading by optical-flow analysis. Systems and Computers in Japan, 22(6).

Massaro, D. W. (1987). Speech perception by ear and eye: A paradigm for psychological inquiry. Lawrence Erlbaum Associates, Hillsdale, NJ.

Massaro, D. W. (1996). Bimodal Speech Perception: A Progress Report. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 79-101.

Massaro, D. W., & Cohen, M. M. (1983). Evaluation and integration of visual and auditory information in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 9, 753-771.

Massaro, D. W., & Cohen, M. M. (1990). Perception of synthesized audible and visible speech. Psychological Science, 1, 55-63.

Massaro, D. W., Cohen, M., & Gesi, A. (1993). Long-term training, transfer, and retention in learning to lipread. Perception and Psychophysics, 53, 549-562.

Massaro, D. W., Tsuzaki, M., Cohen, M., Gesi, A., & Heredia, R. (1993). Bimodal speech perception: An examination across languages. Journal of Phonetics, 21, 445-478.

Massaro, D. W., Cohen, M., & Thompson, L. A. (1988). Visible language in speech perception: Lipreading and reading. Visible Language, 22, 9-31.

McGrath, M., Summerfield, A. Q., & Brooke, N. M. (1984). Roles of lips and teeth in lipreading vowels. Proceedings of the Institute of Acoustics (Autumn Meeting, Windermere), 6(4), 401-408.

Montgomery, A. A. (1980). Development of a model for generating synthetic animated lip shapes. Journal of the Acoustical Society of America, 68, S58 (abstract).

Montgomery, A. & Jackson, P. (1983). Physical characteristics of the lips underlying vowel lipreading performance. Journal of the Acoustical Society of America, 73(6), 2134-2144.

Munhall, K. G. & Vatikiotis-Bateson, E. (1998). The moving face during speech communication. In R. Campbell, B. Dodd & D. Burnham (Eds.) (1998). Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech. Psychology Press Ltd., East Sussex, UK, 123-139.

Petajan, E. D. (1985). Automatic lipreading to enhance speech recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 19-23, 40-47.

Petajan, E. D., Bischoff, B., Bodoff, D., & Brooke, N. M. (1988). An improved automatic lipreading system to enhance speech recognition. In E. Soloway, D. Frye & S. B. Sheppard (Eds.), CHI '88 Conference Proceedings: Human Factors in Computing Systems (Washington, D.C.), 19-25 (Association for Computing Machinery, New York).

Petajan, E. D., Brooke, N. M., Bischoff, B., & Bodoff, D. A. (1988). Experiments in automatic visual speech recognition. In W.A. Ainsworth & J.N. Holmes (Eds.) Proceedings of the 7th Symposium of the Federation of Acoustical Societies of Europe (FASE), 1163-1170 (Institute of Acoustics, Edinburgh).

Petajan, E. & Graf, H. P. (1996). Robust Face Feature Analysis for Automatic Speechreading and Character Animation. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 425-436.

Prasad, K. V., Stork, D. G., & Wolff, G. (1993). Preprocessing video images for neural learning of lipreading. Ricoh California Research Center, Technical Report CRC-TR-93-26.

Reed, C. M., Durlach, N. I., Braida, L. D., & Schultz, M. C. (1989). Analytic study of the Tadoma method: effects of hand position on segmental speech perception. Journal of Speech and Hearing Research, 32(4), 921-929.

Reed, C. M., Rabinowitz, W. M., Durlach, N. I., Delhorne, L. A., Braida, L. D., Pemberton, J. C., Mulcahey, B. D., & Washington, D. L. (1992). Analytic study of the Tadoma method: improving performance through the use of supplementary tactual displays. Journal of Speech and Hearing Research, 35(2), 450-465.

Robert-Ribes, J., Piquemal, M., Schwartz, J.-L., & Escudier, P. (1996). Exploiting sensor fusion architectures and stimuli complementarity in AV speech recognition. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 193-210.

Rönnberg, J., Arlinger, S., Lyxell, B., & Kinnefors, C. (1989). Visual evoked potentials: Relation to adult speechreading and cognitive function. Journal of Speech and Hearing Research, 32, 725-735.

Rosenblum, L.D., Johnson, J. A., & Saldaña, H.M. (1996). Visual kinematic information for embellishing speech in noise. Journal of Speech and Hearing Research, 39(6), 1159-1170.

Rosenblum, L.D. & Saldaña, H.M. (1996). An audiovisual test of kinematic primitives for visual speech perception. Journal of Experimental Psychology: Human Perception and Performance, 22(2), 318-331.

Rosenblum, L.D. & Saldaña, H.M. (1998). Time-varying information for visual speech perception. In R. Campbell, B. Dodd & D. Burnham (Eds.) (1998). Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech. Psychology Press Ltd., East Sussex, UK, 61-81.

Sams, M., Aulanko, R., Hämäläinen, M., Hari, R., Lounasmaa, O. V., Lu, S.-T., & Simola, J. (1991). Seeing speech: visual information from lip movements modifies activity in the human auditory cortex. Neuroscience Letters, 127, 141-145.

Sams, M. & Levänen, S. (1996). Where and when are the heard and seen speech integrated: Magnetoencephalographical (MEG) studies. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 233-238.

Schwartz, J.-L., Robert-Ribes, J., & Escudier, P. (1998). Ten years after Summerfield: a taxonomy of models for audio-visual fusion in speech perception. In R. Campbell, B. Dodd & D. Burnham (Eds.) (1998). Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech. Psychology Press Ltd., East Sussex, UK, 85-108.

Shepherd, D. C. (1982). Visual-neural correlate of speechreading ability in normal-hearing adults. Journal of Speech and Hearing Research, 25, 521-527.

Smeele, P. M. T. (1996). Psychology of Human Speechreading. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 3-15.

Stork, D. G. (Ed.) (1997). HAL's Legacy: 2001's Computer as Dream and Reality. MIT Press, Cambridge, MA.

Stork, D. G. (1997). "I could see your lips move": HAL and Speechreading. In D. G. Stork (Ed.), (1997), HAL's Legacy: 2001's Computer as Dream and Reality. MIT Press, Cambridge, MA, 237-261.

Stork, D. G. & Hennecke, M. E. (Eds.) (1996). Speechreading by Humans and Machines: Models, Systems and Applications. Springer-Verlag, New York.

Stork, D. G., Wolff, G., & Levine, E. (1992). Neural network lipreading system for improved speech recognition. Proceedings of the 1992 International Joint Conference on Neural Networks, Baltimore, MD.

Sumby, W. H. & Pollack, I. (1954). Visual contributions to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212-215.

Summerfield, A. Q. (1979). Use of visual information in phonetic perception. Phonetica, 36, 314-331.

Summerfield, A. Q. (1983). Audio-visual speech perception, lipreading and artificial stimulation. In M. E. Lutman & M. P. Haggard (Eds.), Hearing Science and Hearing Disorders. Academic, London.

Summerfield, A.Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & R. Campbell (Eds.) (1987), Hearing by Eye: The Psychology of Lip-Reading, Lawrence Erlbaum Associates, Hillsdale, NJ.

Summerfield, A.Q. (1991). Visual perception of phonetic gestures. In I. G. Mattingly & M. Studdert-Kennedy (Eds.) (1991), Modularity and the Motor Theory of Speech Perception, Lawrence Erlbaum Associates, Hillsdale, NJ.

Summerfield, A. Q. (1992). Lipreading and audio-visual speech perception. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 335(1273), 71-78.

Summerfield, A. Q., MacLeod, A., McGrath, M., & Brooke, N. M. (1989). Lips, teeth, and the benefits of lipreading. In A. W. Young & H. D. Ellis (Eds.), Handbook of Research on Face Processing, North Holland, Amsterdam.

Utley, J. (1946). A test of lipreading ability. Journal of Speech and Hearing Disorders, 11, 109-116.

Vatikiotis-Bateson, E., Eigsti, I.-M., Yano, S., & Munhall, K. (in press). Eye movement of perceivers during audiovisual speech perception. Perception & Psychophysics.

Vatikiotis-Bateson, E., Munhall, K. G., Hirayama, M., Lee, Y. C., & Terzopoulos, D. (1996). The dynamics of audiovisual behavior in speech. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, 221-232.

Vatikiotis-Bateson, E., Munhall, K. G., Kasahara, Y., Garcia, F., & Yehia, H. (1996). Characterizing audiovisual information during speech. In Proceedings ICSLP 96, 1485-1488. Philadelphia, Penn.

Vroomen, J. H. M. (1992). Hearing voices and seeing lips: Investigations in the psychology of lipreading. PhD thesis, Katholieke Universiteit Brabant, Tilburg, Netherlands.

Walden, B. E., Busacco, D. A., & Montgomery, A. A. (1993). Benefit from visual cues in auditory-visual speech recognition by middle-aged and elderly persons. Journal of Speech and Hearing Research, 36(2), 431-436.

Walden, B. E., Erdman, S. A, Montgomery, A. A., Schwartz, D. M., & Prosek, R. A. (1981). Some effects of training on speech recognition by hearing-impaired adults. Journal of Speech and Hearing Research, 24, 207-216.

Walden, B. E., Montgomery, A. A., Prosek, R. A., & Hawkins, D. B. (1990). Visual biasing of normal and impaired auditory speech perception. Journal of Speech and Hearing Research, 33(1), 163-173.

Walden, B. E., Prosek, R. A., Montgomery, A. A., Scherr, C. K., & Jones, C. J. (1977). Effects of training on the visual recognition of consonants. Journal of Speech and Hearing Research, 20, 130-145.

Waldstein, R. S. & Boothroyd, A. (1995). Speechreading supplemented by single-channel and multichannel tactile displays of voice fundamental frequency. Journal of Speech and Hearing Research, 38, 690-705.

Walther, E. F. (1982). Lipreading. Nelson-Hall Inc., Chicago.

Yehia, H. C., Rubin, P. E., & Vatikiotis-Bateson, E. (in press). Quantitative association of acoustic, facial, and vocal-tract shapes. Speech Communication.

Yuhas, B. P., Goldstein, M. H., & Sejnowski, T. J. (1989). Integration of acoustic and visual speech signals using neural networks. IEEE Communications Magazine, 27, 65-71.

Yuhas, B. P., Goldstein, M. H., Sejnowski, T. J., & Jenkins, R. E. (1990). Neural network models of sensory integration for improved vowel recognition. Proceedings of the IEEE, 78(10), 1658-1668.

Yuhas, B. P., Sejnowski, T. J., Goldstein, M. H., & Jenkins, R. E. (1990). Combining visual and acoustic speech signals with a neural network improves intelligibility. In D. Touretzky (Ed.) Advances in Neural Information Processing Systems, 2. Morgan-Kaufmann, San Mateo, CA, 232-239.

(See, also, the McGurk and Facial animation bibliographies.)

 
