Audiovisual Speaker Identity Verification Based on Cross Modal Fusion

Girija Chetty, Michael Wagner

Research output: Contribution to conference (non-published works), Paper, peer-review


In this paper, we propose the fusion of audio and explicit correlation features for speaker identity verification applications. Experiments with GMM-based speaker models and a hybrid fusion technique, involving late fusion of explicit cross-modal fusion features with eigenlip and audio MFCC features, yield a considerable improvement in EER performance. An evaluation of system performance on different gender-specific datasets from the controlled VidTIMIT database and the opportunistic UCBN database shows that it is possible to achieve an EER of less than 2% with correlated component hybrid fusion, an improvement of around 22% over uncorrelated component fusion.
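The two quantities the abstract relies on, late (score-level) fusion and the equal error rate (EER), can be illustrated with a minimal sketch. It does not reproduce the paper's system: the feature extraction (eigenlip, MFCC, cross-modal correlation features) and GMM scoring are omitted, the weighted-sum fusion rule is an assumption, the function names `late_fusion` and `equal_error_rate` are hypothetical, and the scores in the demo are synthetic.

```python
import numpy as np


def late_fusion(score_streams, weights):
    """Weighted-sum late fusion (an assumed rule, not confirmed by the paper).

    score_streams: array-like of shape (n_streams, n_trials), one row of
    verification scores per modality or feature stream.
    weights: one non-negative weight per stream; normalised to sum to 1.
    """
    scores = np.asarray(score_streams, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ scores  # shape (n_trials,)


def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine (client) and impostor scores.

    Sweeps every observed score as a threshold (accept if score >= threshold)
    and returns the mean of FAR and FRR where the two are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0


if __name__ == "__main__":
    # Synthetic scores for two hypothetical streams (e.g. audio and visual).
    rng = np.random.default_rng(0)
    genuine = rng.normal(1.0, 1.0, size=(2, 500))
    impostor = rng.normal(-1.0, 1.0, size=(2, 500))
    fused_gen = late_fusion(genuine, [0.5, 0.5])
    fused_imp = late_fusion(impostor, [0.5, 0.5])
    print("single-stream EER:", equal_error_rate(genuine[0], impostor[0]))
    print("fused EER:", equal_error_rate(fused_gen, fused_imp))
```

A relative improvement such as the one quoted in the abstract would then be computed as (EER_baseline - EER_fused) / EER_baseline, assuming the figure is a relative rather than absolute reduction.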

Original language: English
Number of pages: 5
Publication status: Published - 2007
Event: International Conference on Audio-Visual Speech Processing - Hilvarenbeek, Netherlands
Duration: 31 Aug 2007 to 3 Sept 2007


Conference: International Conference on Audio-Visual Speech Processing

