Abstract
Human-computer interaction will be more natural if
computers can perceive and respond to nonverbal human communication such as emotions. Although several
approaches have been proposed to recognize human emotions
based on facial expressions or speech, relatively little work
has been done to fuse these two modalities to improve the accuracy and
robustness of emotion recognition systems. This paper
analyzes the strengths and the limitations of systems based
only on facial expressions or acoustic information. It also
analyzes two approaches used to fuse these two modalities,
decision-level and feature-level integration, and proposes a
new multilevel fusion approach for enhancing person-dependent and person-independent classification performance
for different emotions. Two different audiovisual emotion
corpora, DaFEx [1,2] and ENTERFACE [3], were used to evaluate the proposed fusion
approach; they comprise audiovisual emotion data from several actors eliciting six
emotions: anger, disgust, fear, happiness, sadness
and surprise. The results of the experimental study reveal that
the system based on fusing facial expressions with acoustic
information outperforms systems based
on acoustic information or facial expressions alone, for the
emotions considered. The results also show an improvement in the
classification performance for the different emotions with the
multilevel fusion approach compared to either feature-level
or score-level fusion.
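For readers unfamiliar with the fusion strategies the abstract contrasts, the sketch below illustrates feature-level fusion, score-level (decision-level) fusion, and one plausible multilevel combination on synthetic data. The feature dimensions, the logistic-regression classifiers, and the equal score weights are illustrative assumptions only; they are not the models or the fusion rule used in the paper.

```python
# Illustrative sketch only: feature sizes, classifier choice, and the equal
# weighting below are assumptions, not the paper's actual method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d_audio, d_video, n_emotions = 200, 12, 20, 6    # hypothetical sizes
X_audio = rng.normal(size=(n, d_audio))             # stand-in acoustic features
X_video = rng.normal(size=(n, d_video))             # stand-in facial features
y = rng.integers(0, n_emotions, size=n)             # emotion class labels

# Feature-level fusion: concatenate both modalities, train one classifier.
X_fused = np.hstack([X_audio, X_video])
clf_feat = LogisticRegression(max_iter=1000).fit(X_fused, y)

# Score-level (decision-level) fusion: train one classifier per modality,
# then combine their posterior scores (here, a simple unweighted average).
clf_a = LogisticRegression(max_iter=1000).fit(X_audio, y)
clf_v = LogisticRegression(max_iter=1000).fit(X_video, y)
scores_decision = 0.5 * clf_a.predict_proba(X_audio) \
                + 0.5 * clf_v.predict_proba(X_video)

# Multilevel fusion (one plausible reading): also fold in the scores of the
# feature-level classifier before taking the final decision.
scores_multi = (scores_decision + clf_feat.predict_proba(X_fused)) / 2.0
predictions = scores_multi.argmax(axis=1)           # final emotion per sample
```

In practice the unimodal scores would be produced on held-out data and the combination weights tuned rather than fixed at 0.5; the sketch is only meant to show the structural difference between the three schemes.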
Original language | English
---|---
Title of host publication | Proceedings of Audiovisual Speech Processing 2008
Editors | Amit Konar, Aruna Chakraborty
Place of Publication | Adelaide
Publisher | AVISA
Pages | 115-120
Number of pages | 6
Volume | 2008
Publication status | Published - 2008
Event | Audiovisual Speech Processing 2008, Moreton Island, Australia. Duration: 26 Sept 2008 → 29 Sept 2008
Conference
Conference | Audiovisual Speech Processing 2008
---|---
Country/Territory | Australia
City | Moreton Island
Period | 26/09/08 → 29/09/08