An unsupervised method for learning to track tongue position from an acoustic signal.

Number 1005
Year 1996
Drawer 19
Entry Date 06/30/1998
Authors Hogden, John, Rubin, Philip, and Saltzman, Elliot.
Contact Philip Rubin at Haskins Laboratories
Publication Bulletin de la communication parlée, No. 3, 101-116
url http://www.haskins.yale.edu/Reprints/HL1005.pdf
Abstract A procedure is demonstrated for learning to recover the relative positions of simulated articulators from speech signals generated by articulatory synthesis. The algorithm learns without supervision, that is, it does not require articulatory information in the training set. The procedure consists of vector quantizing short time windows of a speech signal, then using multidimensional scaling to represent quantization codes that were temporally close in the encoded speech signal by nearby points in a continuity map. Since temporally close sounds must have been produced by similar articulator configurations, sounds that were produced by similar articulator positions should be represented close to each other in the continuity map. Continuity maps were made from parameters (the first three formant center frequencies) derived from acoustic signals produced by an articulatory synthesizer that could vary the height and degree of fronting of the tongue body. The procedure was evaluated by comparing estimated articulator positions with those used during synthesis. High rank-order correlations (0.95 to 0.99) were found between the estimated and actual articulator positions. Reasonable estimates of relative articulator positions were made using 32 categories of sound, and the accuracy improved when more sound categories were used.
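The sketch below illustrates the pipeline the abstract describes (vector quantization of short-time acoustic features, temporal adjacency counts, and a multidimensional-scaling "continuity map"); it is not the authors' implementation, and all function and parameter names are illustrative. The dissimilarity formula used here is an assumption chosen for simplicity.

```python
# Minimal sketch, assuming 3-D formant-like feature vectors per window:
# 1) vector-quantize the windows, 2) count how often pairs of codes occur
# in temporally adjacent windows, 3) turn counts into dissimilarities,
# 4) embed the codes in a low-dimensional continuity map via classical MDS.
import numpy as np
from scipy.cluster.vq import kmeans2


def build_continuity_map(features, n_codes=32, n_dims=2, seed=0):
    """features: (n_windows, n_features) array of per-window acoustic features."""
    # 1) Vector quantization: assign each window to one of n_codes codewords.
    _, codes = kmeans2(features, n_codes, minit="++", seed=seed)

    # 2) Temporal adjacency: codes seen in neighboring windows count as "close".
    co = np.zeros((n_codes, n_codes))
    for a, b in zip(codes[:-1], codes[1:]):
        co[a, b] += 1
        co[b, a] += 1

    # 3) Convert adjacency counts to dissimilarities (frequently adjacent
    #    codes get small distances); the -log transform is an assumption.
    prob = co / (co.sum() + 1e-12)
    dissim = -np.log(prob + 1e-6)
    np.fill_diagonal(dissim, 0.0)
    dissim = 0.5 * (dissim + dissim.T)  # enforce symmetry

    # 4) Classical MDS: double-center squared dissimilarities and keep the
    #    top eigenvectors as continuity-map coordinates for each code.
    d2 = dissim ** 2
    j = np.eye(n_codes) - np.ones((n_codes, n_codes)) / n_codes
    b_mat = -0.5 * j @ d2 @ j
    eigvals, eigvecs = np.linalg.eigh(b_mat)
    order = np.argsort(eigvals)[::-1][:n_dims]
    coords = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
    return codes, coords  # per-window code labels, per-code map positions
```

Mapping each window to the continuity-map position of its code gives an estimated articulator trajectory, which the paper evaluates against the known synthesizer settings using rank-order correlation.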
Notes
