Audiovisual Speaker Identity Verification Based on Lip Motion Features

Girija Chetty, Michael Wagner

    Research output: A Conference Proceeding or a Chapter in a Book › Conference contribution

    Abstract

    In this paper, we propose the fusion of audio and explicit lip motion features for speaker identity verification applications. Experimental results using GMM-based speaker models indicate that audiovisual fusion with explicit lip motion information provides significant performance improvement for verifying both speaker identity and liveness, due to tracking of the closely coupled acoustic-labial dynamics. Experiments performed on different gender-specific subsets of data from the VidTIMIT and UCBN databases under clean and noisy conditions show that the best performance of 7%–11% EER is achieved for the speaker verification task and 4%–8% EER for the liveness verification scenario.
    Original language: English
    Title of host publication: Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007)
    Editors: Ronald Bock, Francesca Bonin, Nick Campbell, Ronald Poppe
    Place of Publication: Germany
    Publisher: International Speech Communication Association
    Pages: 2045-2048
    Number of pages: 4
    ISBN (Print): 9781605603162
    Publication status: Published - 2007
    Event: Interspeech 2007 - 8th Annual Conference of the International Speech Communication Association - Antwerp, Belgium
    Duration: 27 Aug 2007 - 31 Aug 2007

    Conference

    Conference: Interspeech 2007 - 8th Annual Conference of the International Speech Communication Association
    Country: Belgium
    City: Antwerp
    Period: 27/08/07 - 31/08/07


    Cite this

    Chetty, G., & Wagner, M. (2007). Audiovisual Speaker Identity Verification Based on Lip Motion Features. In R. Bock, F. Bonin, N. Campbell, & R. Poppe (Eds.), Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007) (pp. 2045-2048). Germany: International Speech Communication Association.
    @inproceedings{ae2ca21143ba4a3798095e80dda0125e,
    title = "Audiovisual Speaker Identity Verification Based on Lip Motion Features",
    abstract = "In this paper, we propose the fusion of audio and explicit lip motion features for speaker identity verification applications. Experimental results using GMM-based speaker models indicate that audiovisual fusion with explicit lip motion information provides significant performance improvement for verifying both speaker identity and liveness, due to tracking of the closely coupled acoustic-labial dynamics. Experiments performed on different gender-specific subsets of data from the VidTIMIT and UCBN databases under clean and noisy conditions show that the best performance of 7\%--11\% EER is achieved for the speaker verification task and 4\%--8\% EER for the liveness verification scenario.",
    author = "Girija Chetty and Michael Wagner",
    year = "2007",
    language = "English",
    isbn = "9781605603162",
    pages = "2045--2048",
    editor = "Ronald Bock and Francesca Bonin and Nick Campbell and Ronald Poppe",
    booktitle = "Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007)",
    publisher = "International Speech Communication Association",
    }



