This paper investigates the statistical relationship between acoustic and visual speech features for vowels. We extract such features from our stereo-vision audio-visual (AV) speech data corpus of Australian English. A principal component analysis is performed to determine which data points of the parameter curve for each feature are most important for representing the shape of that curve. This is followed by a canonical correlation analysis to determine which principal components, and hence which data points of which features, correlate most strongly across the two modalities. Several strong correlations between acoustic and visual features are reported. In particular, the formant frequencies F1 and F2 were strongly correlated with mouth height. Knowledge of the correlation between acoustic and visual features can be used to predict the presence of acoustic features from visual features, in order to improve the recognition rate of automatic speech recognition systems in environments with acoustic noise.
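The two-stage analysis described in the abstract (principal component analysis per modality, then canonical correlation analysis across modalities) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the function names `pca_scores` and `canonical_correlations` are hypothetical, and the curve length (5 samples), the number of retained components (2), and the shared latent "articulation" variable are assumptions made purely for illustration.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # centre each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores, shape (n, n_components)


def canonical_correlations(A, B):
    """Canonical correlations between two full-column-rank data blocks.

    Uses the QR-based formulation: the canonical correlations are the
    singular values of Qa.T @ Qb, where Qa and Qb are orthonormal bases
    of the centred blocks.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)                  # guard against rounding error


# Synthetic stand-in data (assumption): 200 vowel tokens, each feature curve
# sampled at 5 points, with both modalities driven by one latent variable.
rng = np.random.default_rng(0)
n_tokens = 200
latent = rng.standard_normal(n_tokens)
curve_shape = np.array([0.5, 1.0, 1.2, 1.0, 0.5])

acoustic = np.outer(latent, curve_shape) + 0.1 * rng.standard_normal((n_tokens, 5))
visual = np.outer(latent, curve_shape) + 0.1 * rng.standard_normal((n_tokens, 5))

# Stage 1: reduce each modality's curves to a few principal components.
acoustic_pcs = pca_scores(acoustic, n_components=2)
visual_pcs = pca_scores(visual, n_components=2)

# Stage 2: correlate the principal components across the two modalities.
rho = canonical_correlations(acoustic_pcs, visual_pcs)
print("canonical correlations:", rho)
```

In this toy setup the first canonical correlation is high because both synthetic modalities share the same latent variable. With real data, the principal component loadings would indicate which points along each parameter curve contribute most to the correlated components.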
|Title of host publication||Proceedings of the International Conference on Auditory-Visual Speech Processing AVSP 2001|
|Editors||Dominic W. Massaro, Joanna Light, Kristin Geraci|
|Publisher||Auditory-Visual Speech Association|
|Number of pages||6|
|Publication status||Published - 7 Sep 2001|
|Event||International Conference on Auditory-Visual Speech Processing: 4th ESCA ETRW on Auditory-Visual Speech - Scheelsminde, Aalborg, Denmark; Duration: 7 Sep 2001 → 9 Sep 2001|
|Conference||International Conference on Auditory-Visual Speech Processing|
|Abbreviated title||AVSP 2001|
|Period||7/09/01 → 9/09/01|
Goecke, R., Millar, J. B., Zelinsky, A., & Robert-Ribes, J. (2001). Analysis of Audio-Video Correlation in Vowels in Australian English. In D. W. Massaro, J. Light, & K. Geraci (Eds.), Proceedings of the International Conference on Auditory-Visual Speech Processing AVSP 2001 (pp. 115-120). Auditory-Visual Speech Association.