Cognitive workload estimation under extended reality

  • Isuri Wijechandra Manawadu

    Student thesis: Master's Thesis

    Abstract

    Cognitive workload (CWL) estimation has emerged as a critical research focus in domains where task performance and safety are closely tied to human attentional and executive capacities. In contemporary high-stakes environments, such as aviation, autonomous driving, healthcare, and virtual training, accurately identifying an individual’s cognitive state in real time is vital for optimising system responsiveness, reducing errors, and preventing mental fatigue or overload. Recent advances in neuroimaging and artificial intelligence have enabled non-invasive, sensor-based approaches to estimate mental workload by capturing neurophysiological patterns associated with effortful cognitive processing. Among these, electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) have gained prominence due to their complementary strengths: EEG offers excellent temporal resolution, while fNIRS provides spatially localised haemodynamic measurements of cortical activity. However, the full potential of combining these modalities in real-time workload estimation, particularly within immersive, extended reality (XR) environments, remains underexplored in both technical and applied settings.
    This thesis begins with a comprehensive review of existing literature to identify prevailing research gaps in the assessment of cognitive workload within extended reality environments. In particular, it highlights the scarcity of studies that employ multimodal neural sensing and multi-domain feature analysis for workload modelling. To address these limitations, this thesis proposes a multimodal, multi-domain machine learning framework for cognitive workload classification using EEG and fNIRS data collected under Stroop-based experimental conditions. The Stroop paradigm, characterised by the need to resolve conflicting versus congruent information, serves as an ecologically valid model of everyday cognitive control demands, reflecting the type of information processing challenges encountered in real-world scenarios. Leveraging a publicly available, synchronised EEG–fNIRS dataset, this study extracts and evaluates neural features across time, frequency, and wavelet domains to estimate workload levels induced by incongruent (high workload) and neutral (low workload) Stroop task conditions. The methodological pipeline includes modality-specific preprocessing techniques (e.g., independent component analysis for EEG, optical density transformation for fNIRS), domain-specific feature extraction, feature selection using the Gini importance index, and extensive cross-validation-based classification using seven machine learning algorithms (support vector machine with linear/radial basis function kernels, k-nearest neighbour with k = 1, 3, 5, linear discriminant analysis, decision tree, and random forest classifiers).
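The selection-and-benchmarking stage of such a pipeline can be sketched with scikit-learn. The sketch below uses a synthetic stand-in feature matrix, and the feature count, number of retained features, and classifier settings are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the extracted EEG/fNIRS feature matrix:
# 200 trials x 40 features, binary labels (0 = neutral, 1 = incongruent).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 2, size=200)
# Make a few features weakly informative so selection has something to find.
X[:, :5] += y[:, None] * 0.8

# Step 1: rank features by Gini importance from a random forest,
# then keep the ten highest-ranked features.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:10]
X_sel = X[:, top]

# Step 2: benchmark several of the named classifiers with cross-validation.
models = {
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (RBF)": SVC(kernel="rbf"),
    "kNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X_sel, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

The same loop extends naturally to the remaining classifiers (linear discriminant analysis, decision tree, and additional kNN settings) by adding entries to the `models` dictionary.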
    Unimodal and multimodal feature sets are rigorously benchmarked for their discriminative power across analytical domains. The results reveal that EEG-derived frequency-domain features yield the highest unimodal accuracy of 88.7% (kNN classifier), particularly highlighting the role of frontal theta and alpha oscillations in prefrontal regions (notably FP1 and FP2) during cognitively demanding tasks. In contrast, fNIRS-derived features, particularly from the deoxygenated haemoglobin signal, demonstrate superior classification performance in the wavelet domain, reaching 82.9% accuracy using random forest classifiers. These findings underscore the spatial specificity of vascular responses under workload-induced stress, particularly around fronto-central and sensorimotor regions such as FC3, C3, and C4.
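To illustrate the kind of frequency-domain features discussed above, theta- and alpha-band power can be estimated from an EEG channel with Welch's method. The sampling rate and the synthetic signal below are assumptions chosen for demonstration, not parameters of the dataset used in the thesis.

```python
import numpy as np
from scipy.signal import welch

fs = 256                      # assumed sampling rate in Hz (illustrative)
t = np.arange(fs * 10) / fs   # 10 s of samples
rng = np.random.default_rng(1)
# Synthetic single-channel "EEG": a 6 Hz theta oscillation plus noise.
sig = np.sin(2 * np.pi * 6 * t) + 0.5 * rng.normal(size=t.size)

def band_power(x, fs, lo, hi):
    """Mean Welch power spectral density within the [lo, hi] Hz band."""
    f, pxx = welch(x, fs=fs, nperseg=fs * 2)
    mask = (f >= lo) & (f <= hi)
    return pxx[mask].mean()

theta = band_power(sig, fs, 4, 8)   # theta band (4-8 Hz)
alpha = band_power(sig, fs, 8, 13)  # alpha band (8-13 Hz)
print(theta > alpha)                # the 6 Hz component dominates
```

In practice such band powers would be computed per channel (e.g. FP1, FP2) and per trial, yielding one row of the feature matrix per trial.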
    Importantly, the multimodal EEG–fNIRS fusion approach significantly outperforms all unimodal models, achieving a peak classification accuracy of 97.2% and an F1 score of 97.3% using decision tree classifiers with cross-domain feature fusion. This result strongly supports the hypothesis that fusing temporally precise EEG signals with spatially informative fNIRS signals provides a richer representation of cognitive states than either modality alone. Further analysis of feature importance across domains and modalities confirms the critical role of prefrontal and centro-parietal brain regions, particularly electrodes and channels around FP1, FP2, FC3, and C1, as robust biomarkers for cognitive workload classification. These findings are consistent with known neurophysiological correlates of executive function, attentional allocation, and task difficulty processing.
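Feature-level fusion of the two modalities amounts to concatenating per-trial EEG and fNIRS feature vectors before classification. The sketch below shows this with synthetic placeholder blocks and an sklearn decision tree; the block sizes are assumptions, not the thesis's actual feature counts.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 200  # number of trials (illustrative)
# Hypothetical unimodal feature blocks (values are synthetic placeholders).
eeg_feats = rng.normal(size=(n, 30))    # e.g. time/frequency/wavelet EEG features
fnirs_feats = rng.normal(size=(n, 20))  # e.g. HbO/HbR wavelet features
y = rng.integers(0, 2, size=n)          # 0 = neutral, 1 = incongruent
eeg_feats[:, 0] += y * 1.0              # make one feature per block informative
fnirs_feats[:, 0] += y * 1.0

# Early (feature-level) fusion: concatenate modality features per trial.
fused = np.hstack([eeg_feats, fnirs_feats])
tree = DecisionTreeClassifier(random_state=0)
acc = cross_val_score(tree, fused, y, cv=5).mean()
print(f"fused decision-tree accuracy: {acc:.3f}")
```

A design consequence of this early-fusion scheme is that tree-based feature importances computed on the fused matrix can be traced back to individual electrodes or optode channels, which is how region-specific biomarkers such as FP1 or FC3 can be identified.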
    In addition to performance benchmarking, this thesis also presents a critical discussion on methodological limitations, including the restricted spatial coverage of fNIRS, inter-subject variability in workload responses, the binary nature of the classification labels, and the lack of ecological realism due to the laboratory setting. To address these, future work is proposed in several directions: extending fNIRS channel coverage to parietal and temporal lobes, adopting continuous workload estimation models, incorporating deep learning architectures for spatio-temporal modelling, and validating the developed framework in immersive, real-world XR environments that present dynamic, context-dependent cognitive demands.
    Overall, this thesis contributes to the growing body of literature on neuroadaptive system design and cognitive state monitoring by (i) identifying key research gaps through an extensive review of current methodologies in unimodal and multimodal cognitive workload estimation, highlighting limitations in modality-specific signal interpretation, underexplored multi-domain feature extraction, and the absence of robust multimodal fusion strategies, (ii) demonstrating the feasibility and efficacy of tri-domain feature extraction across EEG and fNIRS modalities, (iii) establishing a multimodal fusion pipeline with superior classification capabilities, and (iv) identifying domain- and region-specific machine learning biomarkers for cognitive workload under controlled task settings. By offering a comprehensive empirical evaluation and a generalisable methodological framework, this study lays the groundwork for real-time, non-invasive, and scalable CWL estimation systems with applications in human–computer interaction, safety-critical monitoring, adaptive learning, and XR-based simulations. The convergence of neuroimaging, machine learning, and multimodal signal fusion explored in this thesis represents a significant advancement toward context-aware, cognitively adaptive technologies capable of supporting human operators in complex and demanding environments.
    Date of Award: 2025
    Original language: English
    Supervisors: Ramanathan Subramanian & Raul Fernandez Rojas
