Age-group and gender classification through class-dependent phone recognition

Michael Norris, Michael Wagner

    Research output: A Conference proceeding or a Chapter in Book > Conference contribution

    Abstract

    This study proposes a method to determine the gender and age group of a speaker by means of an automatic speech recognition system that is trained on six different sets of phones: one for each intersection of the two gender and three age-group classes. The study uses the Australian National Database of Spoken Language (ANDOSL) with 18 speakers in each class reading a set of 200 phonetically rich sentences. The system trains 44 context-independent phone models for each of the six classes and determines the gender and age group of an unknown utterance by finding the best matching phone sequence against the combined set of 264 phone models. Two methods of utilising the resulting phone sequences for gender and age-group recognition are evaluated: firstly, simple counting of the number of phones that belong to each class is used as the basis for the six-way class decision; secondly, the recognised phone sequence is converted to a 264-dimensional vector, whose components contain the phone counts in the phone sequence for each of the 6 x 44 phones in the combined set. An artificial neural network is trained to make the final gender and age-group decision using the count vectors as input. The artificial neural network outperforms the simple counting method with an average correct recall for gender of 97.7%, an average correct recall for age group of 60.5% and an average correct recall for combined gender and age group of 58.9%.
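The two ways of using the recognised phone sequence described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the placeholder phone symbols (`p00` to `p43`), the representation of a recognised phone as a `(class, phone)` pair, and the tie-breaking rule are all assumptions made for illustration. The neural network stage is omitted; only its 264-dimensional input is shown.

```python
from collections import Counter

# Assumed labels for the 2 genders x 3 age groups (names are hypothetical).
CLASSES = ["F_young", "F_middle", "F_old", "M_young", "M_middle", "M_old"]
# Placeholder inventory of 44 phones; the abstract does not list the real set.
PHONES = [f"p{i:02d}" for i in range(44)]

# Position of each of the 6 x 44 = 264 class-dependent phone models.
MODEL_INDEX = {
    (cls, phone): i * len(PHONES) + j
    for i, cls in enumerate(CLASSES)
    for j, phone in enumerate(PHONES)
}


def classify_by_count(sequence):
    """Simple counting: choose the class whose phone models occur most often."""
    counts = Counter(cls for cls, _phone in sequence)
    # Ties fall to the earliest entry in CLASSES; this rule is an assumption.
    return max(CLASSES, key=lambda cls: counts[cls])


def count_vector(sequence):
    """Build the 264-dimensional vector of per-model phone counts."""
    vec = [0] * len(MODEL_INDEX)
    for cls, phone in sequence:
        vec[MODEL_INDEX[(cls, phone)]] += 1
    return vec


# Made-up recognised sequence of (class, phone) pairs, for illustration only.
example = [
    ("F_young", "p03"),
    ("F_young", "p10"),
    ("M_old", "p03"),
    ("F_young", "p03"),
]

decision = classify_by_count(example)
features = count_vector(example)
```

In this sketch `decision` is the six-way class chosen by counting, while `features` is the kind of count vector that a trained classifier would take as input in the second method.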
    Original language: English
    Title of host publication: 13th Australasian International Conference on Speech Science and Technology
    Place of Publication: Online
    Publisher: Australasian Speech Science and Technology Association (ASSTA)
    Pages: 38-41
    Number of pages: 4
    ISBN (Print): 9780958194631
    Publication status: Published - 2010
    Event: 13th Australasian International Conference on Speech Science and Technology - Melbourne, Australia
    Duration: 14 Dec 2010 to 16 Dec 2010

    Conference

    Conference: 13th Australasian International Conference on Speech Science and Technology
    Country: Australia
    City: Melbourne
    Period: 14/12/10 to 16/12/10

    Fingerprint

    Neural networks
    Speech recognition

    Cite this

    Norris, M., & Wagner, M. (2010). Age-group and gender classification through class-dependent phone recognition. In 13th Australasian International Conference on Speech Science and Technology (pp. 38-41). Online: Australasian Speech Science and Technology Association (ASSTA).