In this paper, we propose the fusion of audio and explicit lip motion features for speaker identity verification applications. Experimental results using GMM-based speaker models indicate that audiovisual fusion with explicit lip motion information provides a significant performance improvement for verifying both speaker identity and liveness, because it tracks the closely coupled acoustic-labial dynamics. Experiments on gender-specific subsets of the VidTIMIT and UCBN databases, under clean and noisy conditions, show a best performance of 7%–11% equal error rate (EER) for the speaker verification task and 4%–8% EER for the liveness verification scenario.
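The EER figures quoted in the abstract refer to the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected). The following is a minimal sketch of how such a figure can be estimated from two lists of verification scores, assuming that a higher score means "more likely genuine". The function name `equal_error_rate` and the toy scores are hypothetical illustrations, not taken from the paper, and the paper's own scoring and evaluation protocol may differ.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from genuine and impostor scores.

    Assumption (not from the paper): a trial is accepted when its
    score is >= the decision threshold. Returns (eer, threshold).
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    # Every observed score is a candidate decision threshold.
    for t in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: impostor trials scoring at or above t.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: genuine trials scoring below t.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finitely many scores FAR and FRR may never be exactly
            # equal, so report their mean at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores for illustration only (hypothetical, not experimental data).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With these toy scores, one of the four genuine trials falls below the chosen threshold and one of the four impostor trials reaches it, so both error rates are 25%. On real data, the same calculation would be run separately for each condition (for example clean and noisy audio) to obtain a range of EER values.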
|Title of host publication||Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007)|
|Editors||Ronald Bock, Francesca Bonin, Nick Campbell, Ronald Poppe|
|Place of Publication||Germany|
|Publisher||International Speech Communication Association|
|Number of pages||4|
|Publication status||Published - 2007|
|Event||Interspeech 2007 - 8th Annual Conference of the International Speech Communication Association - Antwerp, Belgium (27 Aug 2007 → 31 Aug 2007)|
Chetty, G., & Wagner, M. (2007). Audiovisual Speaker Identity Verification Based on Lip Motion Features. In R. Bock, F. Bonin, N. Campbell, & R. Poppe (Eds.), Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007) (pp. 2045–2048). Germany: International Speech Communication Association.