Stereo Vision Lip-Tracking for Audio-Video Speech Processing

Roland Goecke, J. Bruce Millar, Alexander Zelinsky, Jordi Robert-Ribes

Research output: A Conference proceeding or a Chapter in Book > Conference contribution

Abstract

We present the first results from applying a recently proposed algorithm for the robust and reliable automatic extraction of lip feature points to an audio-video speech data corpus. This corpus comprises 10 native speakers uttering sequences that cover the range of phonemes and visemes in Australian English. The lip-tracking algorithm is based on stereo vision, which has the advantage that measurements are in real-world (3D) coordinates instead of image (2D) coordinates. Certain lip feature points on the inner lip contour, such as the lip corners and the mid-points of the upper and lower lips, are tracked automatically. Parameters describing the shape of the mouth are derived from these points. The results obtained so far show a correlation between the width and height of the mouth opening, as well as between the protrusion parameters of the upper and lower lips.
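As an illustration of how mouth-shape parameters might be derived from tracked 3D feature points, the following is a minimal sketch, not the authors' implementation. It assumes that mouth width is the Euclidean distance between the two lip corners and that mouth height is the distance between the upper and lower lip mid-points. The function names, the coordinate values, and the use of a Pearson coefficient to quantify the width-height relationship are all assumptions made for this example.

```python
import math

def mouth_parameters(left_corner, right_corner, upper_mid, lower_mid):
    """Return (width, height) of the mouth opening from four 3D points.

    Assumption: width is the distance between the lip corners and height
    is the distance between the upper and lower lip mid-points. The paper
    may define its parameters differently.
    """
    width = math.dist(left_corner, right_corner)
    height = math.dist(upper_mid, lower_mid)
    return width, height

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-frame feature points (millimetres), for illustration only.
frames = [
    ((0, 0, 0), (40, 0, 0), (20, 5, 0), (20, -5, 0)),
    ((0, 0, 0), (44, 0, 0), (22, 7, 0), (22, -7, 0)),
    ((0, 0, 0), (48, 0, 0), (24, 9, 0), (24, -9, 0)),
]
widths, heights = zip(*(mouth_parameters(*f) for f in frames))
correlation = pearson(widths, heights)
```

Because the points are in real-world coordinates, the resulting widths and heights are physical lengths and do not depend on the camera's distance from the speaker, which is the advantage of the stereo approach noted in the abstract.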
Original language: English
Title of host publication: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2001
Subtitle of host publication: Student Forum
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 1-4
Number of pages: 4
ISBN (Electronic): 0780370430
Publication status: Published - 7 May 2001
Externally published: Yes
Event: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing - Salt Palace Convention Center, Salt Lake City, United States
Duration: 7 May 2001 to 11 May 2001
http://www.icassp2001.org/

Conference

Conference: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2001
Country/Territory: United States
City: Salt Lake City
Period: 7/05/01 to 11/05/01
Internet address: http://www.icassp2001.org/
