Advances in computer vision research on understanding human actions and activities naturally lead to the next stage: analysing more subtle behaviours. An important application area of computational behaviour analysis is characterising the behaviour of, and developmental change in, children diagnosed with autism spectrum disorder (ASD). One key early behavioural sign of ASD is a deficit in joint attention behaviours. This term is used broadly to refer to the tendency to share one's attention to, and interest in, objects and events in the environment with others. Another category of atypical behavioural signs that are early markers of autism is self-stimulatory behaviours, or stereotyped motor movements. These refer to stereotyped, repetitive movements of body parts or objects, such as arm flapping, body rocking, finger flicking and spinning. Reliably estimating features such as a child's head pose and gaze direction from video is challenging because child behaviour is unstructured. An alternative hypothesis is to use easily obtainable features, such as motion flow and appearance information, to develop the models. This study investigates this hypothesis by modelling the estimation of a child's engagement level in adult-child interactions. The publicly available Multimodal Dyadic Behavior Dataset (MMDB) from the Georgia Institute of Technology, USA, is used in the experiments. A computational model is developed using motion flow information around the upper body regions of a child, and the empirical findings are compared with the ground truth. Due to the child's dominance in the interaction, the motion flow dynamics characterise the engagement behaviour well. The engagement prediction accuracy with the proposed model is 74.4%, validating the applicability of the hypothesis. To test this hypothesis for its generalisability, a similar approach is investigated for modelling self-stimulatory behaviours.
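The motion-flow-based approach can be illustrated with a minimal sketch. The function below is a hypothetical, simplified stand-in for the thesis's feature extraction: it computes per-frame motion energy (mean absolute frame difference) inside an upper-body region of interest, producing a time series of motion features from a sequence of grayscale frames. The ROI coordinates and the use of frame differencing rather than true optical flow are assumptions for illustration only.

```python
import numpy as np

def motion_energy_features(frames, roi):
    """Per-frame motion energy inside a region of interest.

    frames: (T, H, W) float array of grayscale video frames
    roi:    (top, bottom, left, right) bounding the upper body
    Returns a (T-1,) feature vector: one mean absolute
    frame-to-frame difference per transition.
    """
    top, bottom, left, right = roi
    region = frames[:, top:bottom, left:right]
    diffs = np.abs(np.diff(region, axis=0))   # change between consecutive frames
    return diffs.mean(axis=(1, 2))            # average over the ROI pixels

# Toy example: 5 synthetic 32x32 frames whose brightness increases by 1 each frame
frames = np.stack([np.full((32, 32), float(i)) for i in range(5)])
feats = motion_energy_features(frames, roi=(0, 16, 8, 24))
print(feats)  # every transition changes brightness by 1 -> [1. 1. 1. 1.]
```

In practice, such per-region motion statistics would be pooled over time windows and fed to a classifier to predict the engagement level.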
Due to a lack of publicly available self-stimulatory datasets, a rich dataset of child behaviour videos is collected and annotated for self-stimulatory behaviours. This dataset is publicly available for academic purposes. In these videos, a similar set of challenges related to tracking a child's head and body postures exists and, therefore, motion and appearance features are adopted to develop the computational model successfully. The self-stimulatory behaviour recognition accuracy with the proposed model is 76.3%, validating the generalisability of the hypothesis. Child behaviours are expressed through multimodal signals such as audio, video and text. In addition, multiple views from a single modality, such as motion flow, appearance and geometry features, can be combined for better representation in sequence learning problems. Long Short-Term Memory (LSTM) networks have been successfully applied to a number of sequence learning problems, but they lack the design flexibility to exploit multi-view relationships. A novel Multi-View LSTM (MV-LSTM) is proposed to model view-specific and cross-view interactions. A computational model for estimating a child's engagement level is developed using the MV-LSTM. The recognition performance of the MV-LSTM model improves over the unimodal approach, indicating the strength of the MV-LSTM for multi-view learning. Finally, to integrate context into a model, a new context integration framework is proposed. The framework provides the flexibility to add the context directly to the LSTM or to modulate it via a new context gate. The experimental results validate the generalisation of the framework.
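The cross-view coupling idea behind the MV-LSTM can be sketched as follows. This is a minimal illustrative implementation, not the thesis's exact cell: it assumes two views, each with its own LSTM cell, whose gates are computed from the view's own input plus the concatenated previous hidden states of all views (the cross-view interaction). All weight shapes and the single-matrix gate layout are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mv_lstm_step(xs, hs, cs, params):
    """One step of a toy multi-view LSTM.

    xs, hs, cs: per-view input, hidden and cell state vectors.
    params:     per-view (W, U, b); U maps the concatenation of ALL
                views' previous hidden states, giving each view access
                to cross-view context.
    """
    h_all = np.concatenate(hs)                 # shared cross-view context
    new_hs, new_cs = [], []
    for (x, c, (W, U, b)) in zip(xs, cs, params):
        H = c.shape[0]
        z = W @ x + U @ h_all + b              # stacked gate pre-activations (4H,)
        i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
        g = np.tanh(z[3 * H:4 * H])            # candidate cell update
        c_new = f * c + i * g                  # standard LSTM cell update
        new_cs.append(c_new)
        new_hs.append(o * np.tanh(c_new))
    return new_hs, new_cs

# Toy usage: two views, input dim 3, hidden dim 4, random weights
rng = np.random.default_rng(0)
D, H, V = 3, 4, 2
params = [(rng.normal(size=(4 * H, D)),
           rng.normal(size=(4 * H, V * H)),
           np.zeros(4 * H)) for _ in range(V)]
xs = [rng.normal(size=D) for _ in range(V)]
hs = [np.zeros(H) for _ in range(V)]
cs = [np.zeros(H) for _ in range(V)]
hs, cs = mv_lstm_step(xs, hs, cs, params)
print([h.shape for h in hs])
```

The design choice illustrated here is that each view keeps its own parameters (view-specific interactions) while the recurrent term sees every view's history (cross-view interactions); the thesis's MV-LSTM and its context gate define more structured variants of this coupling.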
Date of Award: 2017
Supervisors: Roland Goecke (Supervisor), Elisa Martinez-Marroquin (Supervisor), Michael Wagner (Supervisor) & Michael Breakspear (Supervisor)