TY - JOUR
T1 - Joint Estimation of Human Pose and Conversational Groups from Social Scenes
AU - Varadarajan, Jagannadan
AU - Subramanian, Ramanathan
AU - Bulò, Samuel Rota
AU - Ahuja, Narendra
AU - Lanz, Oswald
AU - Ricci, Elisa
N1 - Funding Information:
This work is supported by the research grant for the Human-Centered Cyber-physical Systems Programme at the Advanced Digital Sciences Center from Singapore’s Agency for Science, Technology and Research (A*STAR). We thank NVIDIA for GPU donation.
Publisher Copyright:
© 2017, Springer Science+Business Media, LLC.
PY - 2018
Y1 - 2018/4/1
AB - Despite many attempts in the last few years, automatic analysis of social scenes captured by wide-angle camera networks remains a very challenging task due to the low resolution of targets, background clutter, and frequent and persistent occlusions. In this paper, we present a novel framework for jointly estimating (i) the head and body orientations of targets and (ii) conversational groups, called F-formations, from social scenes. In contrast to prior works that have (a) exploited the limited range of head and body orientations to jointly learn both, or (b) employed the mutual head (but not body) pose of interactors for deducing F-formations, we propose a weakly-supervised learning algorithm for joint inference. Our algorithm employs body pose as the primary cue for F-formation estimation, and an alternating optimization strategy is proposed to iteratively refine F-formation and pose estimates. We demonstrate the increased efficacy of joint inference over the state-of-the-art via extensive experiments on three social datasets.
KW - Conversational groups
KW - Convex optimization
KW - F-formation estimation
KW - Head and body pose estimation
KW - Semi-supervised learning
KW - Video surveillance
UR - http://www.scopus.com/inward/record.url?scp=85023748423&partnerID=8YFLogxK
U2 - 10.1007/s11263-017-1026-6
DO - 10.1007/s11263-017-1026-6
M3 - Article
AN - SCOPUS:85023748423
SN - 0920-5691
VL - 126
SP - 410
EP - 429
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 2-4
ER -