A Comparative Study of 2D and 3D Lip Tracking Methods for AV ASR

Roland Goecke, Arkshay Asthana

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

Abstract

Over the past two decades, many algorithms have been proposed to detect and track a human face and its facial features. Of particular interest to the Automatic Speech Recognition (ASR) community are algorithms that can track the shape of the lips, as such visual speech input can then be used in an auditory-visual (AV) ASR system to improve the recognition accuracy of traditional audio-only ASR systems, particularly in the presence of acoustic noise. Despite the large number of face and lip tracking algorithms that have been proposed over the years, no comparative study has evaluated such algorithms in the context of AV ASR performance. In this paper, the performance of various 2D and 3D lip tracking algorithms is compared from the point of view of AV ASR. In particular, the focus of this study is on algorithms that use explicit lip models. A number of variants of the recently popular Active Appearance Models (AAMs) are compared with a 3D lip tracking algorithm that uses stereo vision. All performance evaluations are made using the AVOZES data corpus.

Original language: English
Title of host publication: Proceedings International Conference on Auditory-Visual Speech Processing 2008 (AVSP 2008)
Place of Publication: Adelaide
Publisher: AVISA
Pages: 235-240
Number of pages: 6
Publication status: Published - 2008
Externally published: Yes
Event: AVSP 2008 - Moreton Island, Australia
Duration: 26 Sep 2008 to 29 Sep 2008

Conference

Conference: AVSP 2008
Country: Australia
City: Moreton Island
Period: 26/09/08 to 29/09/08

Fingerprint

Speech recognition
Stereo vision
Acoustic noise

Cite this

Goecke, R., & Asthana, A. (2008). A Comparative Study of 2D and 3D Lip Tracking Methods for AV ASR. In Proceedings International Conference on Auditory-Visual Speech Processing 2008 (AVSP 2008) (pp. 235-240). Adelaide: AVISA.