Audio-Visual Multilevel Fusion for Speech and Speaker Recognition

Girija Chetty, Michael Wagner

    Research output: A Conference proceeding or a Chapter in Book › Conference contribution

    Abstract

    In this paper we propose a robust audio-visual speech-and-speaker recognition system with liveness checks based on audio-visual fusion of audio-lip motion and depth features. The liveness verification feature added here guards the system against advanced spoofing attempts such as manufactured or replayed videos. For visual features, a new tensor-based representation of lip motion features, extracted from an intensity and depth subspace of 3D video sequences, is fused with the audio features. A multilevel fusion paradigm, involving first a Support Vector Machine for speech (digit) recognition and then a Gaussian Mixture Model for speaker verification with liveness checks, allowed a significant performance improvement over single-mode features. Experimental evaluation for different scenarios with AVOZES, a 3D stereovision speaking-face database, shows favourable results, with recognition accuracies of 70-90% for the digit recognition task and EERs of 5% and 3% for the speaker verification and liveness check tasks respectively.
    Original language: English
    Title of host publication: Proceedings of Interspeech 2008 Conference Incorporating SST 2008
    Editors: Janet Fletcher, Deborah Loakes, Roland Goecke, Denis Burnham, Michael Wagner
    Place of Publication: Australia
    Publisher: International Speech Communication Association
    Pages: 379-382
    Number of pages: 4
    ISBN (Print): 9781615673780
    Publication status: Published - 2008
    Event: Interspeech 2008 - Brisbane, Australia
    Duration: 22 Sep 2008 to 26 Sep 2008

    Conference

    Conference: Interspeech 2008
    Country: Australia
    City: Brisbane
    Period: 22/09/08 to 26/09/08

    Cite this

    Chetty, G., & Wagner, M. (2008). Audio-Visual Multilevel Fusion for Speech and Speaker Recognition. In J. Fletcher, D. Loakes, R. Goecke, D. Burnham, & M. Wagner (Eds.), Proceedings of Interspeech 2008 Conference Incorporating SST 2008 (pp. 379-382). Australia: International Speech Communication Association.