In this paper, we propose the fusion of audio and explicit correlation features for speaker identity verification applications. Experiments with GMM-based speaker models using a hybrid fusion technique, involving late fusion of explicit cross-modal fusion features with eigen-lip and audio MFCC features, show a considerable improvement in EER performance. An evaluation of system performance on gender-specific datasets from the controlled VidTIMIT database and the opportunistic UCBN database shows that it is possible to achieve an EER of less than 2% with correlated-component hybrid fusion, an improvement of around 22% over uncorrelated-component fusion.
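As a rough illustration of the late-fusion and EER evaluation described above, the sketch below combines per-utterance audio and visual GMM scores with a weighted sum and computes the equal error rate from genuine and impostor score lists. This is a minimal sketch, not the paper's implementation: the fusion weight `w`, the score values, and the threshold sweep are illustrative assumptions.

```python
def fuse_scores(audio_scores, visual_scores, w=0.6):
    """Score-level (late) fusion: weighted sum of modality scores.
    The weight w is a hypothetical tuning parameter."""
    return [w * a + (1.0 - w) * v for a, v in zip(audio_scores, visual_scores)]

def equal_error_rate(genuine, impostor):
    """EER: sweep candidate thresholds and return the operating point
    where false-accept rate (FAR) and false-reject rate (FRR) are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuines rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer

# Illustrative (synthetic) log-likelihood-ratio style scores.
genuine_audio, genuine_visual = [0.9, 0.8, 0.7], [0.85, 0.75, 0.65]
impostor_audio, impostor_visual = [0.3, 0.2, 0.1], [0.35, 0.25, 0.15]

genuine_fused = fuse_scores(genuine_audio, genuine_visual)
impostor_fused = fuse_scores(impostor_audio, impostor_visual)
print(equal_error_rate(genuine_fused, impostor_fused))  # well-separated scores give 0.0
```

In a full system, each score would come from a client GMM log-likelihood (normalised against a background model) rather than the synthetic values used here.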
Published: 2007
International Conference on Audio-Visual Speech Processing - Hilvarenbeek, Netherlands
Duration: 31 Aug 2007 → 3 Sept 2007