Stereo Vision Lip-Tracking for Audio-Video Speech Processing

Roland Goecke, J. Bruce Millar, Alexander Zelinsky, Jordi Robert-Ribes

Research output: Conference proceeding or chapter in book › Conference contribution

Abstract

We present the first results from applying a recently proposed algorithm for the robust and reliable automatic extraction of lip feature points to an audio-video speech data corpus. This corpus comprises 10 native speakers uttering sequences that cover the range of phonemes and visemes in Australian English. The lip-tracking algorithm is based on stereo vision, which has the advantage of measurements being in real-world (3D) coordinates instead of image (2D) coordinates. Certain lip feature points on the inner lip contour, such as the lip corners and the mid-points of the upper and lower lip, are automatically tracked. Parameters describing the shape of the mouth are derived from these points. The results obtained so far show that there is a correlation between the width and height of the mouth opening, as well as between the protrusion parameters of the upper and lower lips.
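As a rough illustration of the kind of parameters the abstract describes (mouth-opening width and height, plus upper- and lower-lip protrusion, derived from tracked 3D feature points), here is a minimal sketch. The function name, point layout, coordinate convention, and sample values are hypothetical assumptions for illustration, not the paper's actual definitions.

```python
import numpy as np

def mouth_parameters(p_left, p_right, p_upper, p_lower):
    """Derive mouth-shape parameters from four 3D inner-lip feature points.

    Points are (x, y, z) in real-world coordinates, e.g. millimetres, with
    z pointing towards the camera. All names and conventions here are
    illustrative assumptions, not taken from the paper.
    """
    p_left, p_right, p_upper, p_lower = (
        np.asarray(p, dtype=float) for p in (p_left, p_right, p_upper, p_lower)
    )
    width = np.linalg.norm(p_right - p_left)    # mouth-opening width
    height = np.linalg.norm(p_upper - p_lower)  # mouth-opening height
    # Protrusion: forward (z) offset of each lip mid-point relative to
    # the mid-point between the two lip corners.
    corner_mid = 0.5 * (p_left + p_right)
    protrusion_upper = p_upper[2] - corner_mid[2]
    protrusion_lower = p_lower[2] - corner_mid[2]
    return width, height, protrusion_upper, protrusion_lower

# One synthetic frame: corners 50 mm apart at z = 0, lips slightly protruded.
w, h, pu, pl = mouth_parameters([-25, 0, 0], [25, 0, 0], [0, 8, 3], [0, -8, 3])
```

Over a sequence of frames, the correlations the abstract reports could then be checked by collecting these parameters per frame and computing, for example, `np.corrcoef(widths, heights)`.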
Original language: English
Title of host publication: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2001
Subtitle of host publication: Student Forum
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 4
ISBN (Electronic): 0780370430
Publication status: Published - 7 May 2001
Externally published: Yes
Event: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing - Salt Palace Convention Center, Salt Lake City, United States
Duration: 7 May 2001 - 11 May 2001
http://www.icassp2001.org/

Conference

Conference: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2001
Country: United States
City: Salt Lake City
Period: 7/05/01 - 11/05/01
Internet address: http://www.icassp2001.org/


Cite this

Goecke, R., Millar, J. B., Zelinsky, A., & Robert-Ribes, J. (2001). Stereo Vision Lip-Tracking for Audio-Video Speech Processing. In Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2001: Student Forum. IEEE, Institute of Electrical and Electronics Engineers.
Goecke, Roland ; Millar, J. Bruce ; Zelinsky, Alexander ; Robert-Ribes, Jordi. / Stereo Vision Lip-Tracking for Audio-Video Speech Processing. Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2001: Student Forum. IEEE, Institute of Electrical and Electronics Engineers, 2001.
@inproceedings{eeecaad3fe734c49ab79c90425c828de,
title = "Stereo Vision Lip-Tracking for Audio-Video Speech Processing",
abstract = "We present the first results from applying a recently proposed algorithm for the robust and reliable automatic extraction of lip feature points to an audio-video speech data corpus. This corpus comprises 10 native speakers uttering sequences that cover the range of phonemes and visemes in Australian English. The lip-tracking algorithm is based on stereo vision, which has the advantage of measurements being in real-world (3D) coordinates instead of image (2D) coordinates. Certain lip feature points on the inner lip contour, such as the lip corners and the mid-points of the upper and lower lip, are automatically tracked. Parameters describing the shape of the mouth are derived from these points. The results obtained so far show that there is a correlation between the width and height of the mouth opening, as well as between the protrusion parameters of the upper and lower lips.",
keywords = "Audio-Video Speech Processing, Lip-Tracking, Stereo vision",
author = "Roland Goecke and Millar, {J. Bruce} and Alexander Zelinsky and Jordi Robert-Ribes",
year = "2001",
month = "5",
day = "7",
language = "English",
booktitle = "Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2001",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
address = "United States",

}

Goecke, R, Millar, JB, Zelinsky, A & Robert-Ribes, J 2001, Stereo Vision Lip-Tracking for Audio-Video Speech Processing. in Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2001: Student Forum. IEEE, Institute of Electrical and Electronics Engineers, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, United States, 7/05/01.

Stereo Vision Lip-Tracking for Audio-Video Speech Processing. / Goecke, Roland; Millar, J. Bruce; Zelinsky, Alexander; Robert-Ribes, Jordi.

Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2001: Student Forum. IEEE, Institute of Electrical and Electronics Engineers, 2001.

Research output: Conference proceeding or chapter in book › Conference contribution

TY - GEN

T1 - Stereo Vision Lip-Tracking for Audio-Video Speech Processing

AU - Goecke, Roland

AU - Millar, J. Bruce

AU - Zelinsky, Alexander

AU - Robert-Ribes, Jordi

PY - 2001/5/7

Y1 - 2001/5/7

N2 - We present the first results from applying a recently proposed algorithm for the robust and reliable automatic extraction of lip feature points to an audio-video speech data corpus. This corpus comprises 10 native speakers uttering sequences that cover the range of phonemes and visemes in Australian English. The lip-tracking algorithm is based on stereo vision, which has the advantage of measurements being in real-world (3D) coordinates instead of image (2D) coordinates. Certain lip feature points on the inner lip contour, such as the lip corners and the mid-points of the upper and lower lip, are automatically tracked. Parameters describing the shape of the mouth are derived from these points. The results obtained so far show that there is a correlation between the width and height of the mouth opening, as well as between the protrusion parameters of the upper and lower lips.

AB - We present the first results from applying a recently proposed algorithm for the robust and reliable automatic extraction of lip feature points to an audio-video speech data corpus. This corpus comprises 10 native speakers uttering sequences that cover the range of phonemes and visemes in Australian English. The lip-tracking algorithm is based on stereo vision, which has the advantage of measurements being in real-world (3D) coordinates instead of image (2D) coordinates. Certain lip feature points on the inner lip contour, such as the lip corners and the mid-points of the upper and lower lip, are automatically tracked. Parameters describing the shape of the mouth are derived from these points. The results obtained so far show that there is a correlation between the width and height of the mouth opening, as well as between the protrusion parameters of the upper and lower lips.

KW - Audio-Video Speech Processing

KW - Lip-Tracking

KW - Stereo vision

M3 - Conference contribution

BT - Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2001

PB - IEEE, Institute of Electrical and Electronics Engineers

ER -

Goecke R, Millar JB, Zelinsky A, Robert-Ribes J. Stereo Vision Lip-Tracking for Audio-Video Speech Processing. In: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2001: Student Forum. IEEE, Institute of Electrical and Electronics Engineers; 2001.