A Robust Speaking Face Modelling Approach Based on Multilevel Fusion

Girija Chetty, Michael Wagner

    Research output: A Conference proceeding or a Chapter in Book › Conference contribution

    Abstract

    In this paper, we propose a robust face modelling approach based on multilevel fusion of 3D face biometric information with audio and visual speech information for biometric identity verification applications. The proposed approach combines information from three audio-video based modules, namely audio, visual speech and 3D face, and performs tri-module fusion in an automatic, unsupervised and adaptive manner by adapting to the local performance of each module. This is done by taking the output-score-based reliability estimates (confidence measures) of each of the modules into account. The module weightings are determined automatically such that the reliability measure of the combined scores is maximised. To test the robustness of the proposed approach, the audio and visual speech (mouth) modalities are degraded to emulate various levels of train/test mismatch, employing additive white Gaussian noise for the audio and JPEG compression for the video signals. The results show improved fusion performance for a range of tested levels of audio and video degradation, compared to the individual module performances. Experiments on the 3D stereovision database AVOZES show that, at severe levels of audio and video mismatch, the audio, mouth, 3D face and tri-module (audio+mouth+3D face) fusion EERs were 42.9%, 32%, 15% and 7.3% respectively for the biometric speaker identity verification application.
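
    For illustration, the reliability-weighted score fusion and EER evaluation described in the abstract can be sketched as below. This is a simplified reconstruction under assumed inputs (per-module match scores and confidence estimates), not the implementation reported in the paper.

    ```python
    import numpy as np

    def fuse_scores(scores, reliabilities):
        """Weighted-sum score fusion: each module's verification score is
        weighted in proportion to its reliability (confidence) estimate."""
        w = np.asarray(reliabilities, dtype=float)
        w = w / w.sum()  # normalise weights so they sum to 1
        return float(np.dot(w, np.asarray(scores, dtype=float)))

    def equal_error_rate(genuine, impostor):
        """EER: error rate at the decision threshold where the false-accept
        rate (impostors accepted) meets the false-reject rate (clients
        rejected), scanned over the observed scores as candidate thresholds."""
        genuine = np.asarray(genuine, dtype=float)
        impostor = np.asarray(impostor, dtype=float)
        best_gap, best_eer = np.inf, 1.0
        for t in np.sort(np.concatenate([genuine, impostor])):
            far = np.mean(impostor >= t)  # false-accept rate at threshold t
            frr = np.mean(genuine < t)    # false-reject rate at threshold t
            if abs(far - frr) < best_gap:
                best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
        return float(best_eer)

    # Example: fuse audio, mouth and 3D-face scores with confidence weights.
    fused = fuse_scores([0.9, 0.4, 0.7], [0.5, 0.2, 0.3])
    ```

    In the paper the weights are additionally adapted to local module performance; here they are taken as given, which is the simplest form of confidence-weighted score-level fusion.
    
    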
    Original language: English
    Title of host publication: Proceedings Digital Image Computing Techniques and Applications - 9th Biennial Conference of the Australian Pattern Recognition Society
    Editors: M. Bottema, A. Maeder, N. Redding, A. Van Den Hengel
    Place of publication: United States
    Publisher: IEEE, Institute of Electrical and Electronics Engineers
    Pages: 408-415
    Number of pages: 8
    ISBN (Print): 9780769530673
    DOI: 10.1109/DICTA.2007.4426826
    Publication status: Published - 2007
    Event: Digital Image Computing Techniques and Applications DICTA 2007 - Glenelg, Adelaide, Australia
    Duration: 3 Dec 2007 - 5 Dec 2007

    Conference

    Conference: Digital Image Computing Techniques and Applications DICTA 2007
    Abbreviated title: DICTA 2007
    Country: Australia
    City: Adelaide
    Period: 3/12/07 - 5/12/07


    Cite this

    Chetty, G., & Wagner, M. (2007). A Robust Speaking Face Modelling Approach Based on Multilevel Fusion. In M. Bottema, A. Maeder, N. Redding, & A. V. D. Hengel (Eds.), Proceedings Digital Image Computing Techniques and Applications - 9th Biennial Conference of the Australian Pattern Recognition Society (pp. 408-415). United States: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/DICTA.2007.4426826
    @inproceedings{c0dd6bb5bbc246f38a05a9a612530f2a,
    title = "A Robust Speaking Face Modelling Approach Based on Multilevel Fusion",
    abstract = "In this paper, we propose a robust face modelling approach based on multilevel fusion of 3D face biometric information with audio and visual speech information for biometric identity verification applications. The proposed approach combines information from three audio-video based modules, namely audio, visual speech and 3D face, and performs tri-module fusion in an automatic, unsupervised and adaptive manner by adapting to the local performance of each module. This is done by taking the output-score-based reliability estimates (confidence measures) of each of the modules into account. The module weightings are determined automatically such that the reliability measure of the combined scores is maximised. To test the robustness of the proposed approach, the audio and visual speech (mouth) modalities are degraded to emulate various levels of train/test mismatch, employing additive white Gaussian noise for the audio and JPEG compression for the video signals. The results show improved fusion performance for a range of tested levels of audio and video degradation, compared to the individual module performances. Experiments on the 3D stereovision database AVOZES show that, at severe levels of audio and video mismatch, the audio, mouth, 3D face and tri-module (audio+mouth+3D face) fusion EERs were 42.9{\%}, 32{\%}, 15{\%} and 7.3{\%} respectively for the biometric speaker identity verification application.",
    author = "Girija Chetty and Michael Wagner",
    year = "2007",
    doi = "10.1109/DICTA.2007.4426826",
    language = "English",
    isbn = "9780769530673",
    pages = "408--415",
    editor = "M. Bottema and A. Maeder and N. Redding and {Van Den Hengel}, A.",
    booktitle = "Proceedings Digital Image Computing Techniques and Applications - 9th Biennial Conference of the Australian Pattern Recognition Society",
    publisher = "IEEE, Institute of Electrical and Electronics Engineers",
    address = "United States",
    }



