This thesis focuses on analysing, and drawing inferences from, observational data, which can at times be an awkward and complex task. In the context of this thesis, the observational data include, for example, financial, environmental, medical and pharmaceutical information, and have either a single-level or a multilevel structure. Data can also be univariate, bivariate, multivariate or time series. These data types are investigated and modelled using statistical difference and association methodologies, namely multilevel modelling, propensity score, copula, elliptical and time series modelling methods. Although these are all common statistical methods, this thesis identifies methodological gaps in them. The multilevel modelling literature can be confused, can reflect myths, and tends to treat random effects modelling as the suitable modelling methodology. This thesis develops a novel, generalised approach to multilevel modelling in which methods and strategies are explored to determine the suitable modelling methodology. Multilevel modelling should be treated as either a fixed effects or a random effects methodology, and not all hierarchical datasets require multilevel modelling. Propensity score models can be estimated with both single-level and multilevel propensity score methods, using the strategies developed in this thesis and the indicator variables created in the multilevel modelling chapter. Additionally, propensity score methods can be used as an exploratory data analysis tool to assess the effectiveness of a survey randomisation process and to highlight sampling issues. Further, a novel approach has been developed that applies multilevel modelling to propensity score methods when determining the need for multilevel modelling.
Bivariate parametric copula modelling using cumulative distribution functions and empirical cumulative distribution functions is explored, because copula modelling is sometimes regarded in the literature as a ‘black-box’ methodology, and it can be difficult to establish that a copula model is valid. This research establishes the importance of dependence modelling and develops it as an aid to bivariate copula modelling, enabling a suitable copula model to be identified (where a copula model is valid). The thesis also develops (financial) time series copula modelling, which relies on the analysis of the stabilised time series residuals. Further, copula modelling is identified as a novel means of determining when skew or skew-elliptical modelling techniques are required in elliptical regression modelling. In the literature, financial time series modelling focuses predominantly on prediction. This thesis demonstrates that financial time series modelling requires the distribution of the time series residuals to be modelled first, followed by the usual prediction strategies, and that there are advantages in identifying and modelling these separately. It investigates financial time series distributions and develops novel methods for determining suitable time series distributions and modelling strategies, using regression methods that allow the volatility within the data to be identified and explored. This thesis presents novel methods and strategies, not found in the existing literature, that can serve as important aids for researchers analysing a range of observational data. These methods have also identified avenues for future work that will aid the analysis of financial data. The findings of this thesis allow non-randomised observational data to be analysed and modelled consistently with current scientific method.
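Dependence measures can feed bivariate copula selection. A minimal sketch follows. It is not the thesis's own procedure, and every function name in it is hypothetical. It converts each margin to pseudo-observations with a scaled empirical CDF (rank / (n + 1)) and estimates Kendall's tau. It then inverts the standard tau relations for two Archimedean families, Clayton (tau = theta / (theta + 2)) and Gumbel (tau = 1 - 1/theta). The sketch assumes continuous, untied data and uses only the Python standard library.

```python
from itertools import combinations


def pseudo_observations(xs):
    """Scaled empirical CDF values rank / (n + 1), so every value lies in (0, 1).
    Ties are broken by order of appearance (a simplifying assumption)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])  # stable sort
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [r / (n + 1) for r in ranks]


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def clayton_theta(tau):
    """Clayton copula: tau = theta / (theta + 2), so theta = 2 tau / (1 - tau)."""
    return 2 * tau / (1 - tau)


def gumbel_theta(tau):
    """Gumbel copula: tau = 1 - 1/theta, so theta = 1 / (1 - tau); needs tau >= 0."""
    if tau < 0:
        raise ValueError("Gumbel copula requires non-negative Kendall's tau")
    return 1 / (1 - tau)


if __name__ == "__main__":
    x = [1.2, 0.5, 3.1, 2.2, 4.0]
    y = [0.9, 0.4, 2.5, 2.8, 3.9]
    u, v = pseudo_observations(x), pseudo_observations(y)
    tau = kendall_tau(u, v)
    print("tau:", tau, "Clayton theta:", clayton_theta(tau), "Gumbel theta:", gumbel_theta(tau))
```

Kendall's tau depends only on ranks, so it is identical on the raw data and on the pseudo-observations. A negative tau rules out the Gumbel family under this parameterisation. This is one simple way a dependence measure can narrow the set of candidate copulas before any formal fit.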
Inference and analysis using non-randomised observational data
Dewick, P. (Author). 2024
Student thesis: Doctoral Thesis