R-Norm: Improving Inter-Speaker Variability Modelling at the Score Level via Regression Score Normalisation

David Vandyke, Michael Wagner, Roland Goecke

    Research output: A Conference proceeding or a Chapter in Book › Conference contribution

    2 Citations (Scopus)

    Abstract

    This paper presents a new method of score post-processing which utilises previously hidden relationships among client models and test probes that are found within the scores produced by an automatic speaker recognition system. We suggest the name r-Norm (for Regression Normalisation) for the method, which can be viewed as both a score normalisation process and as a novel and improved modelling technique of inter-speaker variability. The key component of the method lies in learning a regression model between development data scores and an ‘ideal’ score matrix, which can either be derived from clean data or created synthetically. To generate scores for experimental validation of the proposed idea we perform a classic GMM-UBM experiment employing mel-cepstral features on the 1sp-female task of the NIST 2003 SRE corpus. Comparisons of the r-Norm results are made with standard score post-processing/normalisation methods t-Norm and z-Norm. The r-Norm method is shown to perform very strongly, improving the EER from 18.5% to 7.01%, significantly outperforming both z-Norm and t-Norm in this case. The baseline system performance was deemed acceptable for the aims of this experiment, which were focused on evaluating and comparing the performance of the proposed r-Norm idea.
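
    The abstract names z-Norm and t-Norm as baselines without defining them. The sketch below shows their usual textbook form, under stated assumptions: both standardise a raw score by a mean and standard deviation, and differ only in where those statistics come from. The function names, the use of the population standard deviation, and the toy scores are illustrative choices, not taken from the paper. r-Norm itself is not sketched here, since the abstract does not specify the regression model.

```python
from statistics import mean, pstdev


def _standardise(raw_score, reference_scores):
    """Shift and scale raw_score by the mean and std of reference_scores."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)  # population std; an assumption of this sketch
    if sigma == 0:
        raise ValueError("reference scores have zero variance")
    return (raw_score - mu) / sigma


def z_norm(raw_score, impostor_scores_for_model):
    # z-Norm: statistics come from scoring impostor utterances
    # against the claimed client model (computed offline, per model).
    return _standardise(raw_score, impostor_scores_for_model)


def t_norm(raw_score, cohort_scores_for_probe):
    # t-Norm: statistics come from scoring the test probe
    # against a cohort of impostor models (computed at test time, per probe).
    return _standardise(raw_score, cohort_scores_for_probe)


# Toy scores, made up for illustration only.
raw = 4.0
z_score = z_norm(raw, [1.0, 3.0, 1.0, 3.0])  # reference mean 2.0, std 1.0
t_score = t_norm(raw, [0.0, 4.0, 0.0, 4.0])  # reference mean 2.0, std 2.0
```

    In both cases the normalised score measures how far the raw score lies from typical impostor behaviour, which makes a single decision threshold more comparable across models (z-Norm) or across test probes (t-Norm).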
    Original language: English
    Title of host publication: 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013): Speech in Life Sciences and Human Society
    Editors: Frederic Bimbot, Cecile Fougeron, Francois Pellegrino
    Place of Publication: Lyon, France
    Publisher: International Speech Communication Association
    Pages: 3117-3121
    Number of pages: 5
    Volume: 5
    ISBN (Print): 9781629934433
    Publication status: Published - 2013
    Event: 14th Annual Conference of the International Speech Communication Association Interspeech 2013 - Lyon, France
    Duration: 25 Aug 2013 to 29 Aug 2013

    Conference

    Conference: 14th Annual Conference of the International Speech Communication Association Interspeech 2013
    Abbreviated title: INTERSPEECH 2013
    Country: France
    City: Lyon
    Period: 25/08/13 to 29/08/13

    Cite this

    Vandyke, D., Wagner, M., & Goecke, R. (2013). R-Norm: Improving Inter-Speaker Variability Modelling at the Score Level via Regression Score Normalisation. In F. Bimbot, C. Fougeron, & F. Pellegrino (Eds.), 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013): Speech in Life Sciences and Human Society (Vol. 5, pp. 3117-3121). Lyon, France: International Speech Communication Association.
    @inproceedings{7d95c86349824e18b4a5c2f45bf50ae5,
    title = "R-Norm: Improving Inter-Speaker Variability Modelling at the Score Level via Regression Score Normalisation",
    abstract = "This paper presents a new method of score post-processing which utilises previously hidden relationships among client models and test probes that are found within the scores produced by an automatic speaker recognition system. We suggest the name r-Norm (for Regression Normalisation) for the method, which can be viewed as both a score normalisation process and as a novel and improved modelling technique of inter-speaker variability. The key component of the method lies in learning a regression model between development data scores and an ‘ideal’ score matrix, which can either be derived from clean data or created synthetically. To generate scores for experimental validation of the proposed idea we perform a classic GMM-UBM experiment employing mel-cepstral features on the 1sp-female task of the NIST 2003 SRE corpus. Comparisons of the r-Norm results are made with standard score post-processing/normalisation methods t-Norm and z-Norm. The r-Norm method is shown to perform very strongly, improving the EER from 18.5{\%} to 7.01{\%}, significantly outperforming both z-Norm and t-Norm in this case. The baseline system performance was deemed acceptable for the aims of this experiment, which were focused on evaluating and comparing the performance of the proposed r-Norm idea.",
    keywords = "Score Post-Processing, Score Normalisation, Speaker Recognition",
    author = "David Vandyke and Michael Wagner and Roland Goecke",
    year = "2013",
    language = "English",
    isbn = "9781629934433",
    volume = "5",
    pages = "3117--3121",
    editor = "Frederic Bimbot and Cecile Fougeron and Francois Pellegrino",
    booktitle = "14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013): Speech in Life Sciences and Human Society",
    publisher = "International Speech Communication Association",

    }
