Automatic speaker classification based on voice characteristics

  • Phuoc Thanh Nguyen

    Student thesis: Master's Thesis

    Abstract

    Gender, age, accent and emotion are some of the speaker characteristics investigated in voice-based speaker classification systems. Classifying speaker characteristics is an important task in the fields of Dialog, Speech Synthesis, Forensics, Language Learning, Assessment, and Speaker Recognition. Reducing the classification error rate remains a challenge in these fields. This thesis investigates new methods for speech feature extraction and classification to meet this challenge. The extracted speech features range from traditional speech recognition features, such as mel-frequency cepstral coefficients (MFCCs), to recently developed prosodic and voice quality features used in speaker classification, such as pitch, shimmer and jitter. Feature selection is then performed to find a more suitable feature set for building speaker models. For classification, feature weighting vector quantisation, Gaussian mixture models (GMMs), Support Vector Machine (SVM) and Fuzzy Support Vector Machine (FSVM) are investigated. These feature extraction and classification methods are then applied to gender, age, accent and emotion classification. Four well-known data sets, the Australian National Database of Spoken Language (ANDOSL), aGender, EBO-DB and FAU AIBO, are used to evaluate the methods.

    The contributions of this thesis to the classification of speaker characteristics include:

    1. The use of different speech features. Up to 1582 features and transliteration have been investigated.
    2. Application of a new feature selection method. Correlation-based feature subset selection with SFFS is employed to eliminate redundant features, which is needed because of the large databases.
    3. The use of fuzzy SVM (FSVM) as a new speaker classification method. FSVM assigns a fuzzy membership value as a weight to each training data point, so that the decision boundary can move into overlapping regions and reduce empirical error.
    4. A detailed comparison of speaker classification performance for GMMs, SVM and FSVM.
    5. An in-depth investigation of the relevance of feature type to the classification of age and gender. Extensive experiments are performed to determine which features in the speech signal are suited to representing age and gender in human speech.
    6. Classification of age, gender, accent and emotion characteristics on four well-known data sets: ANDOSL, aGender, EBO-DB and FAU AIBO.
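
    The per-point weighting that FSVM relies on can be illustrated with a small sketch. The abstract does not say how the fuzzy membership values are computed, so the code below assumes one common heuristic: membership decreases with a point's distance from its own class centre. The function name `fuzzy_memberships` and the parameter `delta` are hypothetical and are not taken from the thesis.

```python
import math


def fuzzy_memberships(points, labels, delta=1e-6):
    """Assumed heuristic: s_i = 1 - d_i / (r_c + delta), where d_i is the
    distance of point i to its class centre and r_c is the largest such
    distance within class c. This is an illustration, not the thesis method."""
    # Group the training points by class label.
    by_class = {}
    for x, y in zip(points, labels):
        by_class.setdefault(y, []).append(x)

    # Class centre = coordinate-wise mean of the points in that class.
    centres = {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in by_class.items()
    }

    # Distance of every point to the centre of its own class.
    dists = [math.dist(x, centres[y]) for x, y in zip(points, labels)]

    # Class radius = largest distance observed within the class.
    radius = {}
    for d, y in zip(dists, labels):
        radius[y] = max(radius.get(y, 0.0), d)

    # Points far from their class centre receive a small membership (weight).
    return [1.0 - d / (radius[y] + delta) for d, y in zip(dists, labels)]


if __name__ == "__main__":
    # Toy 2-D feature vectors; the third point is an outlier in class 0.
    pts = [(0.0, 0.0), (0.0, 2.0), (0.0, 10.0), (5.0, 5.0), (5.0, 6.0), (5.0, 9.0)]
    lbl = [0, 0, 0, 1, 1, 1]
    for p, y, s in zip(pts, lbl, fuzzy_memberships(pts, lbl)):
        print(p, y, round(s, 3))
```

    In a weighted SVM, each membership would scale the misclassification penalty of its training point, so low-membership points in overlapping regions constrain the decision boundary less. With scikit-learn, for example, such weights could be passed through the `sample_weight` argument of `SVC.fit`; whether the thesis used that route is not stated.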
    Date of Award: 2010
    Original language: English
    Supervisor: Dat TRAN (Supervisor) & Xu HUANG (Supervisor)
