Multi-modal Framework for Analyzing the Affect of a Group of People

Xiaohua Huang, Abhinav Dhall, Roland Goecke, Matti Pietikainen, Guoying Zhao

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

With the advances in multimedia and the World Wide Web, users upload millions of images and videos every day to social networking platforms on the Internet. From the perspective of automatic human behavior understanding, it is of interest to analyze and model the affect exhibited by groups of people participating in social events in these images. However, analyzing the affect expressed by multiple people is challenging due to the varied indoor and outdoor settings. Recently, a few interesting works have investigated face-based Group-level Emotion Recognition (GER). In this paper, we propose a multi-modal framework for enhancing the affective analysis ability of GER in challenging environments. Specifically, to encode a person's information in a group-level image, we first propose an information aggregation method that generates feature descriptions of the face, upper body and scene. We then revisit localized multiple kernel learning to fuse the face, upper body and scene information for GER in challenging environments. Extensive experiments are performed on two challenging group-level emotion databases (HAPPEI and GAFF) to investigate the roles of face, upper body and scene information and of the multi-modal framework. Experimental results demonstrate that the multi-modal framework achieves promising performance for GER.
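
To make the fusion idea concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the authors' implementation. The per-modality descriptors are random stand-ins for the paper's aggregated face, upper body and scene features, and the sample-dependent kernel gating of localized multiple kernel learning is simplified to a fixed convex combination of per-modality RBF kernels fed to a precomputed-kernel SVM.

# Hypothetical sketch of multi-modal kernel fusion for group-level emotion
# recognition (GER). Feature extraction is stubbed with random vectors, and
# localized multiple kernel learning is simplified to a fixed convex
# combination of per-modality RBF kernels.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test = 80, 20

# Stand-ins for the aggregated per-image descriptors of each modality;
# in the paper these would come from the proposed information aggregation step.
modal_dims = {"face": 128, "upper_body": 64, "scene": 256}
X_train = {m: rng.normal(size=(n_train, d)) for m, d in modal_dims.items()}
X_test = {m: rng.normal(size=(n_test, d)) for m, d in modal_dims.items()}
y_train = rng.integers(0, 2, size=n_train)  # toy binary group-affect labels

# Fixed kernel weights; LMKL would instead learn sample-dependent gates.
weights = {"face": 0.5, "upper_body": 0.2, "scene": 0.3}

def combined_kernel(Xa, Xb):
    """Weighted sum of per-modality RBF kernels between two sample sets."""
    return sum(w * rbf_kernel(Xa[m], Xb[m]) for m, w in weights.items())

clf = SVC(kernel="precomputed")
clf.fit(combined_kernel(X_train, X_train), y_train)
predictions = clf.predict(combined_kernel(X_test, X_train))
print(predictions)

In the paper's actual fusion step, the combination weights are not fixed: localized multiple kernel learning learns gating functions jointly with the classifier, so the relative influence of face, upper body and scene information can vary from image to image.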

Original language: English
Pages (from-to): 2706-2721
Number of pages: 16
Journal: IEEE Transactions on Multimedia
Volume: 20
Issue number: 10
Early online date: 23 Mar 2018
DOI: 10.1109/TMM.2018.2818015
Publication status: Published - Oct 2018


Cite this

Huang, Xiaohua; Dhall, Abhinav; Goecke, Roland; Pietikainen, Matti; Zhao, Guoying. / Multi-modal Framework for Analyzing the Affect of a Group of People. In: IEEE Transactions on Multimedia. 2018; Vol. 20, No. 10, pp. 2706-2721.
@article{4ee6786a4f50420a950b7b26af39f67e,
title = "Multi-modal Framework for Analyzing the Affect of a Group of People",
abstract = "With the advances in multimedia and the World Wide Web, users upload millions of images and videos every day to social networking platforms on the Internet. From the perspective of automatic human behavior understanding, it is of interest to analyze and model the affect exhibited by groups of people participating in social events in these images. However, analyzing the affect expressed by multiple people is challenging due to the varied indoor and outdoor settings. Recently, a few interesting works have investigated face-based Group-level Emotion Recognition (GER). In this paper, we propose a multi-modal framework for enhancing the affective analysis ability of GER in challenging environments. Specifically, to encode a person's information in a group-level image, we first propose an information aggregation method that generates feature descriptions of the face, upper body and scene. We then revisit localized multiple kernel learning to fuse the face, upper body and scene information for GER in challenging environments. Extensive experiments are performed on two challenging group-level emotion databases (HAPPEI and GAFF) to investigate the roles of face, upper body and scene information and of the multi-modal framework. Experimental results demonstrate that the multi-modal framework achieves promising performance for GER.",
keywords = "Emotion recognition, Face, Facial expression recognition, Facial features, Group-level emotion recognition, Multi-modality, Group affect",
author = "Xiaohua Huang and Abhinav Dhall and Roland Goecke and Matti Pietikainen and Guoying Zhao",
year = "2018",
month = "10",
doi = "10.1109/TMM.2018.2818015",
language = "English",
volume = "20",
pages = "2706--2721",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
number = "10",

}
