Abstract
Affect, the subjective experience of emotion, feeling, or mood, plays a fundamental role in human cognition, social interactions, and overall well-being. Two main approaches exist for modelling affect: the categorical approach conceptualises emotions as discrete categories, such as happiness, sadness, anger, fear, disgust, surprise, and contempt, while the dimensional approach represents emotions along continuous dimensions, namely valence (degree of pleasantness or unpleasantness) and arousal (degree of excitation or calmness). Categorical models oversimplify the complex nature of emotions and fail to capture the variations within each category. Further, emotions are often experienced on a spectrum, which categorical models may not adequately represent. Since dimensional models view emotions as existing on a continuum rather than in discrete categories, they provide greater flexibility in capturing the complexity of emotions. The integration of various psychological and neuroscientific perspectives, coupled with advancements in machine learning and deep learning, has enriched the understanding of emotions. Automatic emotion inference aims to develop computational methods that employ affective data capturing facial expressions, speech signals, and physiological responses. Since emotions evolve dynamically over time, time-continuous modelling allows the fluctuations, trajectories, and transitions of emotions to be captured, providing a more accurate representation.
This thesis focuses on estimating time-continuous dimensional human affect computationally. Specifically, the aim is to infer emotions from facial images/videos by employing computer vision algorithms. However, these algorithms require massive amounts of data for training. Collecting affective data is a serious challenge, owing to the subjective and dynamic nature of emotions, which makes it difficult to obtain consistent and reliable self-reported emotional responses. Additionally, privacy concerns and the need to capture time-continuous emotional states further complicate the process of collecting accurate affective data. Considering these challenges, a preliminary study is performed to examine the influence of limited labelled data on affect inference. The results reveal that the learnt facial features corresponding to valence and arousal are not generalisable across subjects. Therefore, towards building a generalised affect inference model, a robust method employing multi-task contrastive learning is proposed. To capture affect (dis)similarity, this framework uses the valence and arousal differentials between a pair of facial images to learn effective affect representations. Further, an integration of Action Units and facial landmarks is proposed to obtain a focused input in which affect is prominent.
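As a purely illustrative sketch of how such a pairwise, multi-task contrastive objective on valence and arousal differentials might look (the encoder architecture, margin, and task weights below are assumptions for exposition, not the implementation described in the thesis):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AffectEncoder(nn.Module):
    """Small convolutional encoder producing an affect embedding per face crop (illustrative)."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.head(self.features(x)), dim=-1)


def multitask_contrastive_loss(z1, z2, dv, da, margin=0.5, w_val=1.0, w_aro=1.0):
    """Multi-task contrastive objective on the affect differentials of an image pair.

    z1, z2 : (B, D) embeddings of the two face images in each pair
    dv, da : (B,) absolute valence and arousal differentials, scaled to [0, 1]
    """
    dist = F.pairwise_distance(z1, z2)
    # Small differential -> pull embeddings together; large differential -> push
    # them at least `margin` apart. One such term per affect dimension (multi-task).
    loss_v = (1 - dv) * dist.pow(2) + dv * F.relu(margin - dist).pow(2)
    loss_a = (1 - da) * dist.pow(2) + da * F.relu(margin - dist).pow(2)
    return (w_val * loss_v + w_aro * loss_a).mean()


# Usage on a toy batch of 8 face-image pairs with their affect differentials.
encoder = AffectEncoder()
x1, x2 = torch.randn(8, 3, 112, 112), torch.randn(8, 3, 112, 112)
dv, da = torch.rand(8), torch.rand(8)  # |Δvalence|, |Δarousal| per pair
loss = multitask_contrastive_loss(encoder(x1), encoder(x2), dv, da)
loss.backward()
```

In this reading, pairs with similar valence/arousal are drawn together in the embedding space while dissimilar pairs are separated, with one contrastive term per affect dimension.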
While the collection of time-continuous affective data is resource-intensive, affect annotation poses a further challenge. Affect annotation is a time-consuming, costly, and laborious process, as it requires skilled annotators to carefully examine each sample. Emotions can be perceived differently by different individuals, making consistent annotation difficult and resulting in low consensus among annotators. Hence, a few-shot learning-based approach is proposed for dynamic valence and arousal labelling. Few-shot learning reduces the annotation burden and increases adaptability to new target domains. The experimental results demonstrate that, using the proposed approach, efficient labelling can be performed with few labelled samples (fewer than 6% of the dataset).
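A minimal sketch of one way such few-shot labelling could operate, assuming a frozen embedding model and similarity-weighted propagation of labels from a small support set (the function and its parameters are hypothetical and stand in for, rather than reproduce, the thesis method):

```python
import torch
import torch.nn.functional as F


def few_shot_affect_labels(support_emb, support_va, query_emb, temperature=0.1):
    """Propagate valence/arousal labels from a few annotated frames to unlabelled ones.

    support_emb : (S, D) embeddings of the few labelled frames (e.g. <6% of a dataset)
    support_va  : (S, 2) their valence/arousal annotations
    query_emb   : (Q, D) embeddings of unlabelled frames
    Returns (Q, 2) soft valence/arousal labels via similarity-weighted averaging.
    """
    support_emb = F.normalize(support_emb, dim=-1)
    query_emb = F.normalize(query_emb, dim=-1)
    sim = query_emb @ support_emb.t() / temperature  # (Q, S) cosine similarities
    weights = sim.softmax(dim=-1)                    # attention over the support set
    return weights @ support_va                      # weighted combination of labels


# Usage: label 100 query frames from 10 annotated support frames (128-d embeddings).
labels = few_shot_affect_labels(torch.randn(10, 128), torch.rand(10, 2), torch.randn(100, 128))
```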
Additionally, the performance of the proposed few-shot learning-based approach is further enhanced by incorporating a non-local neural network, which captures the temporal variations of affect in a video. A cross-dataset evaluation demonstrates that the adopted affect inference methodology is transferable and generalisable. Enhanced human-computer interaction through reduced annotation cost and time can be viewed as a broader impact of this thesis.
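For illustration only, a non-local (self-attention) block over per-frame affect features might be sketched as follows; the layer sizes and the embedded-Gaussian formulation are assumptions rather than the exact network used in the thesis:

```python
import torch
import torch.nn as nn


class NonLocalTemporalBlock(nn.Module):
    """Non-local block letting every frame attend to every other frame in a clip (illustrative)."""

    def __init__(self, dim: int, inner_dim: int = 64):
        super().__init__()
        self.theta = nn.Linear(dim, inner_dim)  # query projection
        self.phi = nn.Linear(dim, inner_dim)    # key projection
        self.g = nn.Linear(dim, inner_dim)      # value projection
        self.out = nn.Linear(inner_dim, dim)    # project back; residual keeps the identity path

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, dim) per-frame features from a face encoder
        q, k, v = self.theta(frames), self.phi(frames), self.g(frames)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)  # (B, T, T)
        return frames + self.out(attn @ v)  # each frame aggregates context from all frames


# Usage: refine 64 frames of 128-d affect features before regressing valence/arousal.
block = NonLocalTemporalBlock(dim=128)
refined = block(torch.randn(2, 64, 128))  # (2, 64, 128)
```

The long-range temporal dependencies captured this way are what allow fluctuations and transitions of affect across a video to inform each frame's valence/arousal estimate.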
| Date of Award | 2024 |
| --- | --- |
| Original language | English |
| Supervisor | Ibrahim RADWAN (Supervisor), Roland GOECKE (Supervisor) & Ramanathan Subramanian (Supervisor) |