Emotion Classification in Children's speech using fusion of acoustic and linguistic features

Tim Polzehl, Shiva Sundaram, Hamed Ketabdar, Michael Wagner, Florian Metze

    Research output: A Conference proceeding or a Chapter in Book › Conference contribution

    27 Citations (Scopus)

    Abstract

    This paper describes a system to detect angry vs. non-angry utterances of children who are engaged in dialog with an Aibo robot dog. The system was submitted to the Interspeech 2009 Emotion Challenge evaluation. The speech data consist of short utterances of the children's speech, and the proposed system is designed to detect anger in each given chunk. Frame-based cepstral features, prosodic and acoustic features as well as glottal excitation features are extracted automatically, reduced in dimensionality and classified by means of an artificial neural network and a support vector machine. An automatic speech recognizer transcribes the words in an utterance and yields a separate classification based on the degree of emotional salience of the words. Late fusion is applied to make a final decision on anger vs. non-anger of the utterance. Preliminary results show 75.9% unweighted average recall on the training data and 67.6% on the test set.
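
    As a rough illustration of the pipeline the abstract describes, the sketch below fuses an acoustic classifier score with a linguistic score by late fusion and evaluates the result with unweighted average recall (UAR), the challenge metric. This is a minimal sketch, not the authors' implementation: the synthetic features, SVM settings, placeholder linguistic scores, and the 0.5 fusion weight are all illustrative assumptions.

    # Illustrative sketch (not the authors' code): late fusion of an acoustic
    # SVM posterior with a linguistic (word-salience) score, evaluated with
    # unweighted average recall (UAR) as used in the Emotion Challenge.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics import recall_score

    rng = np.random.default_rng(0)

    # Stand-in features: in the paper these would be frame-based cepstral,
    # prosodic/acoustic and glottal-excitation features, reduced in dimension.
    X_train = rng.normal(size=(200, 20))
    y_train = rng.integers(0, 2, size=200)   # 1 = angry, 0 = non-angry
    X_test = rng.normal(size=(50, 20))
    y_test = rng.integers(0, 2, size=50)

    # Acoustic classifier: an SVM with probability outputs.
    acoustic = SVC(probability=True).fit(X_train, y_train)
    p_acoustic = acoustic.predict_proba(X_test)[:, 1]

    # Hypothetical per-utterance linguistic score, e.g. derived from the
    # emotional salience of ASR-recognized words (placeholder values here).
    p_linguistic = rng.uniform(size=50)

    # Late fusion: combine the two posteriors; the 0.5 weight is an assumption.
    w = 0.5
    p_fused = w * p_acoustic + (1 - w) * p_linguistic
    y_pred = (p_fused >= 0.5).astype(int)

    # Unweighted average recall = mean of the per-class recalls (macro recall).
    uar = recall_score(y_test, y_pred, average="macro")
    print(f"UAR: {uar:.3f}")

    On real data, the fusion weight would typically be tuned on a development set rather than fixed at 0.5.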
    Original language: English
    Title of host publication: 10th Annual Conference of the International Speech Communication Association (Interspeech 2009)
    Editors: M. Uther
    Place of publication: Brighton, UK
    Publisher: International Speech Communication Association
    Pages: 340-343
    Number of pages: 4
    ISBN (Print): 9781615676927
    Publication status: Published - 2009
    Event: Interspeech-2009 - Brighton, United Kingdom
    Duration: 6 Sep 2009 - 9 Sep 2009

    Conference

    Conference: Interspeech-2009
    Country: United Kingdom
    City: Brighton
    Period: 6/09/09 - 9/09/09

    Cite this

    Polzehl, T., Sundaram, S., Ketabdar, H., Wagner, M., & Metze, F. (2009). Emotion Classification in Children's speech using fusion of acoustic and linguistic features. In M. Uther (Ed.), 10th Annual Conference of the International Speech Communication Association (Interspeech 2009) (pp. 340-343). Brighton, UK: International Speech Communication Association.
    Polzehl, Tim ; Sundaram, Shiva ; Ketabdar, Hamed ; Wagner, Michael ; Metze, Florian. / Emotion Classification in Children's speech using fusion of acoustic and linguistic features. 10th Annual Conference of the International Speech Communication Association (Interspeech 2009). editor / M Uther. Brighton, UK : International Speech Communication Association, 2009. pp. 340-343
    @inproceedings{f9a498e0915544a5a48c4c69f2b50f17,
    title = "Emotion Classification in Children's speech using fusion of acoustic and linguistic features",
    abstract = "This paper describes a system to detect angry vs. non-angry utterances of children who are engaged in dialog with an Aibo robot dog. The system was submitted to the Interspeech 2009 Emotion Challenge evaluation. The speech data consist of short utterances of the children's speech, and the proposed system is designed to detect anger in each given chunk. Frame-based cepstral features, prosodic and acoustic features as well as glottal excitation features are extracted automatically, reduced in dimensionality and classified by means of an artificial neural network and a support vector machine. An automatic speech recognizer transcribes the words in an utterance and yields a separate classification based on the degree of emotional salience of the words. Late fusion is applied to make a final decision on anger vs. non-anger of the utterance. Preliminary results show 75.9{\%} unweighted average recall on the training data and 67.6{\%} on the test set.",
    author = "Tim Polzehl and Shiva Sundaram and Hamed Ketabdar and Michael Wagner and Florian Metze",
    year = "2009",
    language = "English",
    isbn = "9781615676927",
    pages = "340--343",
    editor = "M Uther",
    booktitle = "10th Annual Conference of the International Speech Communication Association (Interspeech 2009)",
    publisher = "International Speech Communication Association",

    }

    Polzehl, T, Sundaram, S, Ketabdar, H, Wagner, M & Metze, F 2009, Emotion Classification in Children's speech using fusion of acoustic and linguistic features. in M Uther (ed.), 10th Annual Conference of the International Speech Communication Association (Interspeech 2009). International Speech Communication Association, Brighton, UK, pp. 340-343, Interspeech-2009, Brighton, United Kingdom, 6/09/09.

    Emotion Classification in Children's speech using fusion of acoustic and linguistic features. / Polzehl, Tim; Sundaram, Shiva; Ketabdar, Hamed; Wagner, Michael; Metze, Florian.

    10th Annual Conference of the International Speech Communication Association (Interspeech 2009). ed. / M Uther. Brighton, UK : International Speech Communication Association, 2009. p. 340-343.

    Research output: A Conference proceeding or a Chapter in Book › Conference contribution

    TY - GEN

    T1 - Emotion Classification in Children's speech using fusion of acoustic and linguistic features

    AU - Polzehl, Tim

    AU - Sundaram, Shiva

    AU - Ketabdar, Hamed

    AU - Wagner, Michael

    AU - Metze, Florian

    PY - 2009

    Y1 - 2009

    N2 - This paper describes a system to detect angry vs. non-angry utterances of children who are engaged in dialog with an Aibo robot dog. The system was submitted to the Interspeech 2009 Emotion Challenge evaluation. The speech data consist of short utterances of the children's speech, and the proposed system is designed to detect anger in each given chunk. Frame-based cepstral features, prosodic and acoustic features as well as glottal excitation features are extracted automatically, reduced in dimensionality and classified by means of an artificial neural network and a support vector machine. An automatic speech recognizer transcribes the words in an utterance and yields a separate classification based on the degree of emotional salience of the words. Late fusion is applied to make a final decision on anger vs. non-anger of the utterance. Preliminary results show 75.9% unweighted average recall on the training data and 67.6% on the test set.

    AB - This paper describes a system to detect angry vs. non-angry utterances of children who are engaged in dialog with an Aibo robot dog. The system was submitted to the Interspeech 2009 Emotion Challenge evaluation. The speech data consist of short utterances of the children's speech, and the proposed system is designed to detect anger in each given chunk. Frame-based cepstral features, prosodic and acoustic features as well as glottal excitation features are extracted automatically, reduced in dimensionality and classified by means of an artificial neural network and a support vector machine. An automatic speech recognizer transcribes the words in an utterance and yields a separate classification based on the degree of emotional salience of the words. Late fusion is applied to make a final decision on anger vs. non-anger of the utterance. Preliminary results show 75.9% unweighted average recall on the training data and 67.6% on the test set.

    M3 - Conference contribution

    SN - 9781615676927

    SP - 340

    EP - 343

    BT - 10th Annual Conference of the International Speech Communication Association (Interspeech 2009)

    A2 - Uther, M

    PB - International Speech Communication Association

    CY - Brighton, UK

    ER -

    Polzehl T, Sundaram S, Ketabdar H, Wagner M, Metze F. Emotion Classification in Children's speech using fusion of acoustic and linguistic features. In Uther M, editor, 10th Annual Conference of the International Speech Communication Association (Interspeech 2009). Brighton, UK: International Speech Communication Association. 2009. p. 340-343