Analysis of Audio-Video Correlation in Vowels in Australian English

Roland Goecke, J. Bruce Millar, Alexander Zelinsky, Jordi Robert-Ribes

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

Abstract

This paper investigates the statistical relationship between acoustic and visual speech features for vowels. We extract such features from our stereo vision AV speech data corpus of Australian English. A principal component analysis is performed to determine which data points of the parameter curve for each feature are the most important ones to represent the shape of each curve. This is followed by a canonical correlation analysis to determine which principal components, and hence which data points of which features, correlate most across the two modalities. Several strong correlations are reported between acoustic and visual features. In particular, F1 and F2 and mouth height were strongly correlated. Knowledge about the correlation of acoustic and visual features can be used to predict the presence of acoustic features from visual features in order to improve the recognition rate of automatic speech recognition systems in environments with acoustic noise.
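The two-stage analysis described in the abstract (a principal component analysis per modality, followed by a canonical correlation analysis across modalities) can be illustrated roughly as below. This is a minimal sketch, not the authors' implementation: the data are synthetic, and the names `pca_scores`, `canonical_correlations`, `f1_curves` and `mouth_height_curves`, the curve length of 10 points and the choice of 3 components are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # centre each data point (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # scores, shape (n_samples, n_components)


def canonical_correlations(A, B):
    """Canonical correlations between two score matrices (QR + SVD approach).

    Assumes both matrices have full column rank.
    """
    Qa, _ = np.linalg.qr(A - A.mean(axis=0))     # orthonormal basis for A's columns
    Qb, _ = np.linalg.qr(B - B.mean(axis=0))     # orthonormal basis for B's columns
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)                  # guard against rounding just above 1


# Synthetic stand-in data (NOT from the paper's corpus): each row is one vowel
# token, each column one sample point of that token's parameter curve.
rng = np.random.default_rng(0)
n_tokens, n_points = 200, 10
latent = rng.normal(size=(n_tokens, 1))          # shared articulatory factor (assumed)
shape = np.linspace(0.5, 1.5, n_points)

f1_curves = latent * shape + 0.1 * rng.normal(size=(n_tokens, n_points))
mouth_height_curves = latent * shape[::-1] + 0.1 * rng.normal(size=(n_tokens, n_points))

# Stage 1: reduce each feature's curve to a few principal components.
acoustic_pcs = pca_scores(f1_curves, 3)
visual_pcs = pca_scores(mouth_height_curves, 3)

# Stage 2: canonical correlations between the two sets of components.
rho = canonical_correlations(acoustic_pcs, visual_pcs)
print(rho)
```

Because both synthetic curves are driven by the same latent factor, the first canonical correlation comes out high while the remaining ones reflect noise only. With real data, the component loadings would additionally indicate which data points of each curve carry the correlated variation.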
Original language: English
Title of host publication: Proceedings of the International Conference on Auditory-Visual Speech Processing AVSP 2001
Editors: Dominic W. Massaro, Joanna Light, Kristin Geraci
Publisher: Auditory-Visual Speech Association
Pages: 115-120
Number of pages: 6
ISBN (Print): 0971271402
Publication status: Published - 7 Sep 2001
Externally published: Yes
Event: International Conference on Auditory-Visual Speech Processing: 4th ESCA ETRW on Auditory-Visual Speech - Scheelsminde, Aalborg, Denmark
Duration: 7 Sep 2001 to 9 Sep 2001
https://avisa.loria.fr/avsp-archive.html

Conference

Conference: International Conference on Auditory-Visual Speech Processing
Abbreviated title: AVSP 2001
Country: Denmark
City: Aalborg
Period: 7/09/01 to 9/09/01

Fingerprint

Acoustics
Stereo vision
Speech recognition
Acoustic noise
Principal component analysis

Cite this

Goecke, R., Millar, J. B., Zelinsky, A., & Robert-Ribes, J. (2001). Analysis of Audio-Video Correlation in Vowels in Australian English. In D. W. Massaro, J. Light, & K. Geraci (Eds.), Proceedings of the International Conference on Auditory-Visual Speech Processing AVSP 2001 (pp. 115-120). Auditory-Visual Speech Association.
@inproceedings{ffb3ec43a3c945e3b0f9c2a7920d4d2e,
title = "Analysis of Audio-Video Correlation in Vowels in Australian English",
abstract = "This paper investigates the statistical relationship between acoustic and visual speech features for vowels. We extract such features from our stereo vision AV speech data corpus of Australian English. A principal component analysis is performed to determine which data points of the parameter curve for each feature are the most important ones to represent the shape of each curve. This is followed by a canonical correlation analysis to determine which principal components, and hence which data points of which features, correlate most across the two modalities. Several strong correlations are reported between acoustic and visual features. In particular, F1 and F2 and mouth height were strongly correlated. Knowledge about the correlation of acoustic and visual features can be used to predict the presence of acoustic features from visual features in order to improve the recognition rate of automatic speech recognition systems in environments with acoustic noise.",
keywords = "Audio-Video Speech Data Corpus, Australian English, Correlation Analysis",
author = "Roland Goecke and Millar, {J. Bruce} and Alexander Zelinsky and Jordi Robert-Ribes",
year = "2001",
month = "9",
day = "7",
language = "English",
isbn = "0971271402",
pages = "115--120",
editor = "Massaro, {Dominic W.} and Joanna Light and Kristin Geraci",
booktitle = "Proceedings of the International Conference on Auditory-Visual Speech Processing AVSP 2001",
publisher = "Auditory-Visual Speech Association",

}
