The automated study of free interactions in unstructured social gatherings (e.g., cocktail parties) has attracted much attention from the computer science community, and is also of critical importance to other fields such as psychology and sociology. In this context, the study of free-standing conversational groups (FCGs) has recently become one of the flagship tasks, and many associated research questions remain open. Despite the wealth of information available at the group (social networks) and individual (native behavior and personality traits) levels, the difficulty of examining cues such as target locations, speaking activity, and head/body pose has hampered research in this respect. To address these shortcomings, we propose SALSA, a dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 min, under the poster presentation and cocktail party contexts. Many challenges arise due to the low resolution of bodies and faces, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems, we record the social interplay using four static surveillance cameras and sociometric badges worn by each participant, each badge comprising a microphone, an accelerometer, and Bluetooth and infrared sensors. In addition to the raw data, we also provide annotations of the individuals' personality, position, head and body orientation, and F-formation membership. Through extensive experiments with state-of-the-art approaches, we (a) highlight the limitations of existing methods and (b) show how the multiple recorded cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.
|Title of host publication|Group and Crowd Behavior for Computer Vision|
|Number of pages|20|
|Publication status|Published - 1 Jan 2017|