Speaking Faces for Face-Voice Speaker Identity Verification

Girija Chetty, Michael Wagner

    Research output: Conference contribution (Conference proceeding or Chapter in Book)

    3 Citations (Scopus)

    Abstract

    In this paper, we describe an approach to animated speaking-face synthesis and its application in modeling impostor/replay-attack scenarios for face-voice based speaker verification systems. The speaking face reported here learns the spatiotemporal relationship between speech acoustics and MPEG-4 compliant facial animation points. The influence of articulatory, perceptual and prosodic acoustic features, along with auditory context, on prediction accuracy was examined. The results indicate the vulnerability of audiovisual identity verification systems to impostor/replay attacks using synthetic faces. The level of vulnerability depends on several factors, such as the type of audiovisual features, the fusion techniques used for the audio and video features, and their relative robustness. The success of the synthetic impostor also depends on the type of coarticulation model and the acoustic features used for the audiovisual mapping of the speaking-face synthesis.
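
    The record carries no implementation details beyond the abstract. As a rough, hypothetical sketch of the kind of audio-to-FAP mapping the abstract describes (a regression from context-stacked acoustic features to MPEG-4 facial animation parameters), the Python fragment below uses synthetic data and closed-form ridge regression; every dimension, name and modeling choice here is an assumption for illustration, not the authors' method.

    import numpy as np

    # Hypothetical sizes: 13 acoustic features (MFCC-like) per frame and
    # 68 MPEG-4 facial animation parameters (FAPs) per frame.
    N_FRAMES, N_ACOUSTIC, N_FAP, CONTEXT = 500, 13, 68, 5

    def stack_context(feats, context):
        """Concatenate each frame with its +/- `context` neighbours, giving
        the regressor the auditory context the abstract says matters."""
        padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
        return np.asarray([padded[i:i + 2 * context + 1].ravel()
                           for i in range(len(feats))])

    def fit_ridge(X, Y, lam=1e-2):
        """Closed-form ridge regression: W = (X'X + lam*I)^(-1) X'Y."""
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

    rng = np.random.default_rng(0)
    acoustic = rng.standard_normal((N_FRAMES, N_ACOUSTIC))  # stand-in for real features
    X = stack_context(acoustic, CONTEXT)
    true_map = rng.standard_normal((X.shape[1], N_FAP))     # synthetic ground-truth mapping
    faps = X @ true_map + 0.1 * rng.standard_normal((N_FRAMES, N_FAP))

    W = fit_ridge(X, faps)
    rmse = np.sqrt(np.mean((X @ W - faps) ** 2))
    print(f"frame-wise FAP prediction RMSE: {rmse:.3f}")

    In the paper, the mapping would be trained on real audiovisual recordings and used to drive an animated face; ridge regression here merely stands in for whatever coarticulation model the authors evaluated.
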
    Original language: English
    Title of host publication: Proceedings of the 9th International Conference on Spoken Language Processing Interspeech 2006 - ICSLP
    Editors: Carnegie Mellon
    Place of Publication: Germany
    Publisher: International Speech Communication Association
    Pages: 513-516
    Number of pages: 4
    ISBN (Print): 9781604234497
    Publication status: Published - 2006
    Event: 9th International Conference on Spoken Language Processing - Pittsburgh, United States
    Duration: 17 Sep 2006 - 21 Sep 2006

    Conference

    Conference: 9th International Conference on Spoken Language Processing
    Country: United States
    City: Pittsburgh
    Period: 17/09/06 - 21/09/06

    Cite this

    Chetty, G., & Wagner, M. (2006). Speaking Faces for Face-Voice Speaker Identity Verification. In C. Mellon (Ed.), Proceedings of the 9th International Conference on Spoken Language Processing Interspeech 2006 - ICSLP (pp. 513-516). Germany: International Speech Communication Association.
    @inproceedings{2e31108912e94eafb9c0eccbadf96dbf,
    title = "Speaking Faces for Face-Voice Speaker Identity Verification",
    abstract = "In this paper, we describe an approach to animated speaking-face synthesis and its application in modeling impostor/replay-attack scenarios for face-voice based speaker verification systems. The speaking face reported here learns the spatiotemporal relationship between speech acoustics and MPEG-4 compliant facial animation points. The influence of articulatory, perceptual and prosodic acoustic features, along with auditory context, on prediction accuracy was examined. The results indicate the vulnerability of audiovisual identity verification systems to impostor/replay attacks using synthetic faces. The level of vulnerability depends on several factors, such as the type of audiovisual features, the fusion techniques used for the audio and video features, and their relative robustness. The success of the synthetic impostor also depends on the type of coarticulation model and the acoustic features used for the audiovisual mapping of the speaking-face synthesis.",
    author = "Girija Chetty and Michael Wagner",
    year = "2006",
    language = "English",
    isbn = "9781604234497",
    pages = "513--516",
    editor = "Carnegie Mellon",
    booktitle = "Proceedings of the 9th International Conference on Spoken Language Processing Interspeech 2006 - ICSLP",
    publisher = "International Speech Communication Association",

    }
