Decision support framework for cardiovascular disease prediction using machine learning

  • Nitten Rajliwall

    Student thesis: Professional Doctorate


    Clinical decision making is an important and frequent task in physicians' daily practice. Conventionally, physicians rely on a cognitive predictive modelling process (i.e., knowledge drawn from their experience, research, the related literature, patient cases, etc.) to anticipate or ascertain health problems from the clinical risk factors they deem most salient. However, with the inundation of health data from EHR systems, wearable devices, and other systems for monitoring vital parameters, it has become difficult for physicians to make sense of this massive volume of data, particularly given the confounding and complex characteristics of chronic diseases. More effective clinical prediction approaches are needed to address these challenges.

    Given the paramount importance of predictive models for managing chronic diseases, and cardiovascular diseases in particular, this thesis proposes a novel computational predictive modelling framework, based on innovative machine learning and data science approaches, that can aid clinical decision support. The framework focuses on interpretable machine learning and combines two families of approaches: inherently interpretable models built on shallow machine learning techniques, such as linear regression, decision trees and their variants; and model-agnostic approaches built on neural networks and deep learning methods, enhanced with appropriate feature engineering and post-hoc explainability. These approaches allow disease prediction models to be deployed in complex clinical settings, including remote, extreme, and low-resource environments, where the data may be small, big, or massive and may suffer from poor quality, noise, or missing values. Interpretable models, and model-agnostic approaches enhanced with explainability, are important for physicians and other medical professionals because they increase transparency, trust and confidence in the decision support provided by computer-based algorithmic models.

    This thesis aims to address a research gap in current ML/AI-based disease detection models: the lack of robust, objective, explainable, interpretable and trustworthy inference from computer-based decision support tools. The majority of performance metrics reported for such tools have been limited to quantitative measures such as accuracy, precision, recall, F-measure, AUC, and ROC, without any detailed qualitative metrics that provide insight into how the computer arrived at a decision or that explain the decision-making logic in a way that elicits trust from the stakeholders using the system. This may be because most current ML/AI tools were built using mathematically rigorous constructs designed around black-box approaches, which are hard to interpret and explain. As a result, the decisions they provide appear to come from a black box and offer little explanation of how they were reached.
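
    As a minimal illustration of the quantitative measures named above, the sketch below computes accuracy, precision, recall and F-measure (F1) from the four cells of a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not results from the thesis.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives and true negatives, respectively.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the cases flagged positive, how many truly were positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): of the truly positive cases, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 200 patients (illustrative only, not thesis data).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    Each of these numbers summarises how often the model is right, but none of them says why a given patient was flagged, which is the gap described above.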

    The research proposed in this thesis is aimed at developing an explainable predictive modelling framework, based on innovative ML/AI algorithms, for building CVD detection models. The proposed computational framework provides an intelligent and interpretable holistic analytics platform with improved prediction accuracy, interpretability and explainability. The proposed innovation can help drive the healthcare system towards one that is more patient-centred and trustworthy, with the potential to be tailored to several diseases such as cancer, cardiovascular disease, asthma, traumatic brain injury, dementia, and diabetes. The outcomes of this research can serve as an example of how better computer-based decision support tools, built on novel computational strategies that address a patient’s unique clinical/genetic characteristics, can lead to better characterization of diseases and, at the same time, redefine therapeutic strategies. Some of the key contributions from this research include:

    • Novel disease detection models based on traditional shallow machine learning algorithms, particularly those based on decision trees and their variants. These algorithms have been shown to be inherently interpretable and accurate white-box models and can serve as the baseline for comparison with models previously proposed in the literature.
    • Innovative disease detection models based on model-agnostic algorithms, such as deep learning networks, augmented with appropriate pre-processing and post-processing stages to provide better interpretability and explainability, so that they eventually behave as efficient white-box models.
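
    To make the notion of an inherently interpretable white-box model concrete, the sketch below fits a one-level decision tree (a decision stump) and prints the rule it learned. It is an assumption-laden illustration, not the thesis's method: the function names, feature names and data values are hypothetical toy values, not drawn from the clinical datasets, and the resulting threshold has no clinical meaning.

```python
def fit_stump(rows, labels, feature_names):
    """Fit a one-level decision tree by exhaustive search.

    Tries every feature and every midpoint between consecutive distinct
    values, in both directions, and keeps the split with the fewest
    misclassified training examples.
    """
    best = None
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for positive_above in (True, False):
                preds = [int((row[j] > threshold) == positive_above) for row in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best["errors"]:
                    best = {
                        "feature": name,
                        "threshold": threshold,
                        "positive_above": positive_above,
                        "errors": errors,
                    }
    return best


def explain(stump):
    """Render the fitted stump as a human-readable rule."""
    op = ">" if stump["positive_above"] else "<="
    return f"predict positive if {stump['feature']} {op} {stump['threshold']}"


# Hypothetical toy data: (age, systolic_bp) per row; label 1 = positive class.
feature_names = ["age", "systolic_bp"]
rows = [(45, 120), (50, 130), (61, 150), (58, 160), (39, 155), (66, 125)]
labels = [0, 0, 1, 1, 1, 0]

stump = fit_stump(rows, labels, feature_names)
print(explain(stump))
print("training errors:", stump["errors"])
```

    Unlike a black-box model, the entire decision logic here is a single readable rule; deeper trees extend the same idea to a short chain of such conditions.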

    For an objective comparison of the methods proposed in each of the above stages, several publicly available benchmark clinical datasets, including the Cleveland dataset, the NHANES dataset and the Framingham Heart Study/CHS dataset, were used for model building and experimental validation.

    Cardiovascular disease was selected as the use case and disease under investigation because it has led to an alarming increase in the burden of disease, approaching epidemic levels, and is a major health concern in today’s world. The findings from this research can nevertheless have a meaningful impact on improved self-management of chronic non-communicable diseases more broadly and make a significant contribution towards better public health management.
    Date of Award: 2022
    Original language: English
