Glottal waveforms for speaker inference & a regression score post-processing method applicable to general classification problems

  • David James Vandyke

    Student thesis: Doctoral Thesis


    Contributions are made along two main lines. Firstly, a method is proposed for using a regression model to learn relationships within the scores of a machine learning classifier; the learned model can then be applied to future classifier output to improve recognition accuracy. The method is termed r-norm, and strong empirical results are obtained from its application to several text-independent automatic speaker recognition tasks. Secondly, the glottal waveform, which describes the flow of air through the glottis during voiced phonation, is modelled for the task of inferring speaker identity. A prosody-normalised glottal flow derivative feature, termed a source-frame, is proposed, with empirical evidence presented for its utility in differentiating speakers. Inferences are also made from the glottal flow signal regarding detection of the mood disorder depression. Comprehensive literature reviews of the fields of automatic speaker recognition, forensic voice comparison and the estimation of the glottal waveform are also presented.
    Date of Award: 2014
    Original language: English
    Supervisors: Max Wagner, Girija Chetty & Roland Goecke
