Considerable research progress in computer vision and multimodal analysis has now made the examination of complex phenomena such as social interactions possible. An important cue for determining social interactions is the head pose of the interacting members. While most automated social interaction analysis methods have focused on round-table meetings, where head pose estimation (HPE) is easier given the high resolution of captured faces and the static (seated) targets, recent works have examined unstructured meeting scenes such as cocktail parties. Unstructured scenes, where targets are free to move, provide additional cues such as proxemics for behavior analysis, but they are also challenging to analyze owing to (i) the need for distant, large field-of-view cameras, which can only capture low-resolution faces of targets, and (ii) variations in the targets' facial appearance as they move, caused by changing camera perspective and scale. This chapter reviews recent works addressing HPE under target motion. In particular, we examine the use of transfer learning and multitask learning for HPE. Transfer learning is particularly useful when the training and test data have different attributes (e.g., the training data contain pose annotations for static targets, but the test data involve moving targets), while multitask learning can be explicitly designed to address facial appearance variations under motion. Exhaustive experiments performed with both methodologies are presented.
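To make the multitask idea concrete, the following is a minimal sketch (not the chapter's actual model) of how a multitask head-pose estimator might be structured: a shared representation feeds two task heads, one predicting a coarse pose class and one predicting an auxiliary label such as an appearance/scale cluster induced by target motion. All dimensions, names, and the auxiliary task itself are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: flattened 20x20 low-resolution face crops,
# 8 coarse head-pose classes, and 5 appearance clusters (a stand-in
# auxiliary task capturing camera-perspective/scale variation).
D_IN, D_SHARED, N_POSES, N_CLUSTERS = 400, 64, 8, 5

# Shared layer: both tasks reuse the same low-level facial representation,
# which is the core of the multitask-learning setup.
W_shared = rng.normal(scale=0.05, size=(D_IN, D_SHARED))
# Task-specific heads on top of the shared representation.
W_pose = rng.normal(scale=0.05, size=(D_SHARED, N_POSES))
W_cluster = rng.normal(scale=0.05, size=(D_SHARED, N_CLUSTERS))

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    """Return (pose-class probabilities, appearance-cluster probabilities)."""
    h = np.tanh(x @ W_shared)                 # shared features
    return softmax(h @ W_pose), softmax(h @ W_cluster)

# Stand-in batch of 16 flattened face crops.
batch = rng.normal(size=(16, D_IN))
p_pose, p_cluster = forward(batch)
```

In training, the two heads' losses would be summed (possibly weighted) and backpropagated through the shared layer, so the representation is regularized by the auxiliary appearance task rather than fitting pose alone.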
|Title of host publication||Group and Crowd Behavior for Computer Vision|
|Editors||Vittorio Murino, Marco Cristani, Shishir Shah, Silvio Savarese|
|Place of Publication||United States|
|Number of pages||21|
|Publication status||Published - 1 Jan 2017|