Abstract
Micro-expressions are brief, involuntary facial movements that reveal genuine emotions, occurring when an individual either subconsciously reacts to an emotion or consciously attempts to suppress it. Their detection and recognition offer a transparent lens through which to discern concealed sentiments. This PhD project explores Micro-Expression Recognition (MER) and Micro-Expression Detection (MED) through spatio-temporal modelling and carefully designed hand-crafted methods for emotional understanding, which in turn forms an essential foundation for analysing mental health issues. Recognition and detection differ both conceptually and practically: MER focuses on identifying the distinct facial movements characteristic of each emotion class, whereas MED aims to distinguish micro-expression clips within long videos containing movements of varying time scales and motion intensities. This research poses the following questions and, by addressing them and validating the associated hypotheses, delves deeper into micro-expression studies: How do the challenges of studying micro-expressions contrast with those of other automated facial analysis tasks? Given the scarcity of micro-expression data, can intricate deep learning architectures still prove effective? Is there a point at which deep learning and hand-crafted methods intersect optimally? Moreover, how do recognition and detection differ conceptually and practically, and how does this disparity influence their respective efficacies? A series of experiments was conducted to explore the intricacies of micro-expression recognition and detection. The proposed frameworks were evaluated on the SMIC, CASME II, and SAMM datasets for recognition, and on CAS(ME)2 and the SAMM Long Videos dataset for detection.
Initially, a pre-trained spatial model was used to build an end-to-end framework for micro-expression recognition, which was subsequently refined by incorporating video magnification methods and optical flow. The results indicate that conventional deep learning algorithms, when adapted with strategies such as transfer learning and fine-tuning, can produce superior results compared to purely hand-crafted methods. In the second phase, an end-to-end spatio-temporal model was introduced to investigate the micro-expression feature extraction capability of temporal networks such as LSTM and GRU. This phase also evaluated these networks' capacity to learn dynamic information compared to hand-crafted techniques such as optical flow. The findings suggest that deep learning holds significant promise but cannot fully replace hand-crafted methods given current approaches and data; however, when carefully designed, a blend of the two can yield an optimal approach. The final experiment built on the results of the previous studies to detect expression clips of varying time scales and motion intensities (both macro- and micro-expressions) within long, compound videos. The proposed method combines the algorithms that performed best in the earlier experiments, integrating optical flow, facial landmarks, and the end-to-end framework, and achieves results that outperform many state-of-the-art techniques.
From these extensive and rigorous experiments, conclusions to the posed questions emerged. Although micro-expression studies present unique challenges due to low intensity, short duration, and data scarcity, deep learning holds promise for micro-expression studies when paired with well-chosen design strategies, and blending end-to-end frameworks with hand-crafted methods offers optimal outcomes.
Date of Award: 2024
Original language: English
Supervisors: Roland GOECKE (Supervisor) & Damith HERATH (Supervisor)