TY - GEN
T1 - Multimodal Automatic Acute Pain Recognition Using Facial Expressions and Physiological Signals
AU - Farmani, Jaleh
AU - Giuseppi, Alessandro
AU - Bargshady, Ghazal
AU - Fernandez Rojas, Raul
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Accurate and objective pain assessment is crucial for effective pain management. This paper proposes a novel multimodal deep learning framework for automatic pain detection using a hybrid architecture with feature-level fusion. The framework leverages multimodal data comprising facial expressions and physiological signals (EDA and ECG) from the BioVid Heat Pain database (Part A). The hybrid architecture consists of two streams: stream 1 employs an attention-based CNN-LSTM to extract features from facial expression videos, capture temporal dependencies, and focus on relevant aspects of the video data, while stream 2 uses an LSTM to capture temporal patterns in the physiological signals. The performance of the proposed model was examined in both unimodal and multimodal settings. In a binary classification task distinguishing No Pain from Severe Pain, electrodermal activity (EDA) outperformed all other single data sources, achieving high average accuracy (83.05% for 67 subjects and 82.69% for 87 subjects) and F1-scores (81.66% and 80.18%, respectively) under k-fold cross-validation. Furthermore, the multimodal setting (Video + EDA) achieved higher accuracy (84.15% for 67 subjects and 83.35% for 87 subjects) and F1-scores (82.86% and 82.36%, respectively).
AB - Accurate and objective pain assessment is crucial for effective pain management. This paper proposes a novel multimodal deep learning framework for automatic pain detection using a hybrid architecture with feature-level fusion. The framework leverages multimodal data comprising facial expressions and physiological signals (EDA and ECG) from the BioVid Heat Pain database (Part A). The hybrid architecture consists of two streams: stream 1 employs an attention-based CNN-LSTM to extract features from facial expression videos, capture temporal dependencies, and focus on relevant aspects of the video data, while stream 2 uses an LSTM to capture temporal patterns in the physiological signals. The performance of the proposed model was examined in both unimodal and multimodal settings. In a binary classification task distinguishing No Pain from Severe Pain, electrodermal activity (EDA) outperformed all other single data sources, achieving high average accuracy (83.05% for 67 subjects and 82.69% for 87 subjects) and F1-scores (81.66% and 80.18%, respectively) under k-fold cross-validation. Furthermore, the multimodal setting (Video + EDA) achieved higher accuracy (84.15% for 67 subjects and 83.35% for 87 subjects) and F1-scores (82.86% and 82.36%, respectively).
KW - Facial expression
KW - Hybrid deep learning
KW - Multimodal analysis
KW - Pain recognition
KW - Physiological signals
UR - http://www.scopus.com/inward/record.url?scp=105009248265&partnerID=8YFLogxK
UR - https://ojs.aut.ac.nz/iconip24/2/index
U2 - 10.1007/978-981-96-6960-8_4
DO - 10.1007/978-981-96-6960-8_4
M3 - Conference contribution
AN - SCOPUS:105009248265
SN - 9789819669592
T3 - Communications in Computer and Information Science
SP - 49
EP - 62
BT - Neural Information Processing
A2 - Mahmud, Mufti
A2 - Doborjeh, Maryam
A2 - Doborjeh, Zohreh
A2 - Wong, Kevin
A2 - Leung, Andrew Chi Sing
A2 - Tanveer, M.
PB - Springer
T2 - 31st International Conference on Neural Information Processing, ICONIP 2024
Y2 - 2 December 2024 through 6 December 2024
ER -