This study proposes a method to determine the gender and age group of a speaker using an automatic speech recognition system trained on six different sets of phone models: one for each combination of the two gender classes and three age-group classes. The study uses the Australian National Database of Spoken Language (ANDOSL), with 18 speakers in each class reading a set of 200 phonetically rich sentences. The system trains 44 context-independent phone models for each of the six classes and determines the gender and age group of an unknown utterance by finding the best-matching phone sequence against the combined set of 264 phone models. Two methods of using the resulting phone sequences for gender and age-group recognition are evaluated. In the first, a simple count of the number of recognised phones that belong to each class is the basis for the six-way class decision. In the second, the recognised phone sequence is converted to a 264-dimensional vector whose components contain the counts in the phone sequence for each of the 6 × 44 phones in the combined set, and an artificial neural network is trained to make the final gender and age-group decision using these count vectors as input. The artificial neural network outperforms the simple counting method, with an average correct recall of 97.7% for gender, 60.5% for age group and 58.9% for combined gender and age group.
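The two ways of using the recognised phone sequence can be illustrated with a minimal sketch. It is not the authors' implementation: the age-group labels, the `(class_index, phone_index)` representation of a recognised phone, the class ordering and the tie-breaking rule are all assumptions made for illustration, and the neural network that consumes the count vector is omitted.

```python
# Hypothetical labels: the abstract only states two genders and three age groups.
GENDERS = ("female", "male")
AGE_GROUPS = ("young", "middle", "old")
CLASSES = [(g, a) for g in GENDERS for a in AGE_GROUPS]  # 6 classes
N_PHONES = 44                                            # phone models per class

def class_counts(sequence):
    """Count recognised phones per class.

    `sequence` is a list of (class_index, phone_index) pairs, an assumed
    encoding of the recogniser output over the 264 combined models.
    """
    counts = [0] * len(CLASSES)
    for class_index, _phone_index in sequence:
        counts[class_index] += 1
    return counts

def decide_by_counting(sequence):
    """Method 1: pick the class whose models occur most often.

    Ties go to the lowest class index (an assumption, not from the source).
    """
    counts = class_counts(sequence)
    return max(range(len(CLASSES)), key=counts.__getitem__)

def count_vector(sequence):
    """Method 2 input: 264-dimensional vector of per-model counts."""
    vector = [0] * (len(CLASSES) * N_PHONES)
    for class_index, phone_index in sequence:
        vector[class_index * N_PHONES + phone_index] += 1
    return vector

# Toy recognised sequence: three phones from class 4, one from class 1.
example = [(4, 10), (4, 3), (1, 10), (4, 10)]
decision = CLASSES[decide_by_counting(example)]
features = count_vector(example)
```

In this sketch the counting method discards which phone was recognised and keeps only its class, whereas the count vector preserves both, which is the extra information a classifier trained on the vectors can exploit.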
|Title of host publication||13th Australasian International Conference on Speech Science and Technology|
|Place of Publication||Online|
|Publisher||Australasian Speech Science and Technology Association (ASSTA)|
|Number of pages||4|
|Publication status||Published - 2010|
|Event||13th Australasian International Conference on Speech Science and Technology - Melbourne, Australia, 14 Dec 2010 → 16 Dec 2010|
Norris, M., & Wagner, M. (2010). Age-group and gender classification through class-dependent phone recognition. In 13th Australasian International Conference on Speech Science and Technology (pp. 38-41). Online: Australasian Speech Science and Technology Association (ASSTA).