In this paper, we propose the fusion of audio and explicit correlation features for speaker identity verification applications. Experiments with GMM-based speaker models using a hybrid fusion technique, involving late fusion of explicit cross-modal fusion features with eigen-lip and audio MFCC features, show a considerable improvement in EER performance. An evaluation of system performance on gender-specific datasets from the controlled VidTIMIT database and the opportunistic UCBN database shows that it is possible to achieve an EER of less than 2% with correlated-component hybrid fusion, an improvement of around 22% over uncorrelated-component fusion.
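As a rough illustration of the late-fusion and EER evaluation described above, the sketch below combines per-utterance audio and visual GMM scores with a weighted sum and computes the equal error rate from genuine and impostor score lists. This is a minimal sketch, not the paper's implementation: the fusion weight `w`, the score values, and the threshold sweep are illustrative assumptions.

```python
def fuse_scores(audio_scores, visual_scores, w=0.6):
    """Score-level (late) fusion: weighted sum of modality scores.
    The weight w is a hypothetical tuning parameter."""
    return [w * a + (1.0 - w) * v for a, v in zip(audio_scores, visual_scores)]

def equal_error_rate(genuine, impostor):
    """EER: sweep candidate thresholds and return the operating point
    where false-accept rate (FAR) and false-reject rate (FRR) are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuines rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer

# Illustrative (synthetic) log-likelihood-ratio style scores.
genuine_audio, genuine_visual = [0.9, 0.8, 0.7], [0.85, 0.75, 0.65]
impostor_audio, impostor_visual = [0.3, 0.2, 0.1], [0.35, 0.25, 0.15]

genuine_fused = fuse_scores(genuine_audio, genuine_visual)
impostor_fused = fuse_scores(impostor_audio, impostor_visual)
print(equal_error_rate(genuine_fused, impostor_fused))  # well-separated scores give 0.0
```

In a full system, each score would come from a client GMM log-likelihood (normalised against a background model) rather than the synthetic values used here.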
Published: 2007
International Conference on Audio-Visual Speech Processing - Hilvarenbeek, Netherlands
Duration: 31 Aug 2007 → 3 Sept 2007