Human pose estimation is an active topic in computer vision with many applications. In recent years, part-based approaches have been used effectively to improve the accuracy of pose estimation. However, these approaches have difficulty localising body parts in the presence of inter- and intra-occlusion. In this thesis, methods to deal with the intra-occlusion (or self-occlusion) problem are proposed. In addition, the impact of the proposed pose estimation approach on making action recognition automatic and more accurate is presented. To start with, prior approaches to pose estimation and action recognition are extensively reviewed and analysed. This analysis highlights the importance of a robust pose estimation algorithm that can capture the interactions between the different body parts, both spatially and temporally, for robust action recognition. Next, two methods are proposed to handle the self-occlusion problem and to enhance the accuracy and efficiency of pictorial-structure-based approaches. These two methods are evaluated on different datasets and show promising results. Furthermore, a general framework is proposed to estimate human poses from single images and to model the (physical and non-physical) interactions among the body parts. This framework is also extended to model occluded regions in the inference phase by weighting the scores of overlapping parts differently from those of non-overlapping regions. The framework is evaluated both on individual datasets and across datasets, showing excellent performance and generalisability. Moreover, a fully automatic approach for reconstructing a 3D pose from a monocular image is proposed. This is achieved by enforcing both kinematic and orientation constraints on the 3D predictions for the input image.
The proposed model is evaluated on lab-constrained and unconstrained 'in the wild' images and yields efficient and improved results. Finally, the effect of improved human pose estimation on achieving more accurate and automatic action recognition is shown. The proposed pose estimation approaches are used to encode discriminative features among the body parts. Then, temporal information is collected around the joints of the estimated body parts to form the feature vectors for each class. This yields an automatic and robust action recognition framework, which is tested on a dataset with challenging instances and shows improved accuracy.
Date of Award: 2014
Supervisors: Roland Goecke, Girija Chetty & Michael Wagner