Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk

Alexandros C. Dimopoulos, Mara Nikolaidou, Francisco Félix Caballero, Worrawat Engchuan, Albert Sanchez-Niubo, Holger Arndt, José Luis Ayuso-Mateos, Josep Maria Haro, Somnath Chatterji, Ekavi N. Georgousopoulou, Christos Pitsavos, Demosthenes B. Panagiotakos

Research output: Contribution to journal (Article)

Abstract

Background: The use of Cardiovascular Disease (CVD) risk estimation scores in primary prevention has long been established. However, their performance remains a matter of concern. The aim of this study was to explore the potential of using machine learning (ML) methodologies for CVD prediction, especially compared with an established risk tool, the HellenicSCORE.

Methods: Data from the ATTICA prospective study (n = 2020 adults), enrolled during 2001-02 and followed up in 2011-12, were used. Three different machine learning classifiers (k-NN, random forest, and decision tree) were trained and evaluated against 10-year CVD incidence, in comparison with the HellenicSCORE tool (a calibration of the ESC SCORE). Training datasets, ranging from 16 variables down to only 5 variables, were chosen, with or without bootstrapping, in an attempt to achieve the best overall performance for the machine learning classifiers.

Results: Depending on the classifier and the training dataset, the outcome varied in efficiency but was comparable between the two methodological approaches. In particular, the HellenicSCORE showed accuracy 85%, specificity 20%, sensitivity 97%, positive predictive value 87%, and negative predictive value 58%, whereas for the machine learning methodologies, accuracy ranged from 65 to 84%, specificity from 46 to 56%, sensitivity from 67 to 89%, positive predictive value from 89 to 91%, and negative predictive value from 24 to 45%; random forest gave the best results, while k-NN gave the poorest results.

Conclusions: The alternative approach of machine learning classification produced results comparable to those of risk prediction scores and thus can be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer.
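
The five performance measures reported in the abstract (accuracy, specificity, sensitivity, positive predictive value, negative predictive value) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions only; the `ConfusionCounts` type, the `classification_metrics` function, and the example counts are hypothetical illustrations and are not taken from the ATTICA data or the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the five standard measures as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. each class and each
    predicted label occurs at least once.
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


# Made-up counts, chosen only to keep the arithmetic easy to follow.
example = ConfusionCounts(tp=80, fp=10, tn=40, fn=20)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because sensitivity and specificity are computed over different rows of the matrix, a tool can pair very high sensitivity with low specificity, which is the pattern the abstract reports for the HellenicSCORE.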

Original language: English
Article number: 179
Pages (from-to): 1-11
Number of pages: 11
Journal: BMC Medical Research Methodology
Volume: 18
Issue number: 1
DOIs: https://doi.org/10.1186/s12874-018-0644-1
Publication status: Published - 29 Dec 2018

Fingerprint

Cardiovascular Diseases
Decision Trees
Primary Prevention
Calibration
Machine Learning
Prospective Studies
Sensitivity and Specificity
Incidence
Datasets
Forests

Cite this

Dimopoulos, A. C., Nikolaidou, M., Caballero, F. F., Engchuan, W., Sanchez-Niubo, A., Arndt, H., ... Panagiotakos, D. B. (2018). Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk. BMC Medical Research Methodology, 18(1), 1-11. [179]. https://doi.org/10.1186/s12874-018-0644-1
Dimopoulos, Alexandros C. ; Nikolaidou, Mara ; Caballero, Francisco Félix ; Engchuan, Worrawat ; Sanchez-Niubo, Albert ; Arndt, Holger ; Ayuso-Mateos, José Luis ; Haro, Josep Maria ; Chatterji, Somnath ; Georgousopoulou, Ekavi N. ; Pitsavos, Christos ; Panagiotakos, Demosthenes B. / Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk. In: BMC Medical Research Methodology. 2018 ; Vol. 18, No. 1. pp. 1-11.
@article{41bb087ab6264b44b57b48b1e16b8f12,
title = "Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk",
abstract = "Background: The use of Cardiovascular Disease (CVD) risk estimation scores in primary prevention has long been established. However, their performance still remains a matter of concern. The aim of this study was to explore the potential of using ML methodologies on CVD prediction, especially compared to established risk tool, the HellenicSCORE. Methods: Data from the ATTICA prospective study (n = 2020 adults), enrolled during 2001-02 and followed-up in 2011-12 were used. Three different machine-learning classifiers (k-NN, random forest, and decision tree) were trained and evaluated against 10-year CVD incidence, in comparison with the HellenicSCORE tool (a calibration of the ESC SCORE). Training datasets, consisting from 16 variables to only 5 variables, were chosen, with or without bootstrapping, in an attempt to achieve the best overall performance for the machine learning classifiers. Results: Depending on the classifier and the training dataset the outcome varied in efficiency but was comparable between the two methodological approaches. In particular, the HellenicSCORE showed accuracy 85{\%}, specificity 20{\%}, sensitivity 97{\%}, positive predictive value 87{\%}, and negative predictive value 58{\%}, whereas for the machine learning methodologies, accuracy ranged from 65 to 84{\%}, specificity from 46 to 56{\%}, sensitivity from 67 to 89{\%}, positive predictive value from 89 to 91{\%}, and negative predictive value from 24 to 45{\%}; random forest gave the best results, while the k-NN gave the poorest results. Conclusions: The alternative approach of machine learning classification produced results comparable to that of risk prediction scores and, thus, it can be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer.",
keywords = "Cardiovascular disease, Machine learning, Model performance, Risk prediction",
author = "Dimopoulos, {Alexandros C.} and Mara Nikolaidou and Caballero, {Francisco F{\'e}lix} and Worrawat Engchuan and Albert Sanchez-Niubo and Holger Arndt and Ayuso-Mateos, {Jos{\'e} Luis} and Haro, {Josep Maria} and Somnath Chatterji and Georgousopoulou, {Ekavi N.} and Christos Pitsavos and Panagiotakos, {Demosthenes B.}",
year = "2018",
month = "12",
day = "29",
doi = "10.1186/s12874-018-0644-1",
language = "English",
volume = "18",
pages = "1--11",
journal = "BMC Medical Research Methodology",
issn = "1471-2288",
publisher = "BioMed Central",
number = "1",
}

Dimopoulos, AC, Nikolaidou, M, Caballero, FF, Engchuan, W, Sanchez-Niubo, A, Arndt, H, Ayuso-Mateos, JL, Haro, JM, Chatterji, S, Georgousopoulou, EN, Pitsavos, C & Panagiotakos, DB 2018, 'Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk', BMC Medical Research Methodology, vol. 18, no. 1, 179, pp. 1-11. https://doi.org/10.1186/s12874-018-0644-1

Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk. / Dimopoulos, Alexandros C.; Nikolaidou, Mara; Caballero, Francisco Félix; Engchuan, Worrawat; Sanchez-Niubo, Albert; Arndt, Holger; Ayuso-Mateos, José Luis; Haro, Josep Maria; Chatterji, Somnath; Georgousopoulou, Ekavi N.; Pitsavos, Christos; Panagiotakos, Demosthenes B.

In: BMC Medical Research Methodology, Vol. 18, No. 1, 179, 29.12.2018, p. 1-11.

TY - JOUR

T1 - Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk

AU - Dimopoulos, Alexandros C.

AU - Nikolaidou, Mara

AU - Caballero, Francisco Félix

AU - Engchuan, Worrawat

AU - Sanchez-Niubo, Albert

AU - Arndt, Holger

AU - Ayuso-Mateos, José Luis

AU - Haro, Josep Maria

AU - Chatterji, Somnath

AU - Georgousopoulou, Ekavi N.

AU - Pitsavos, Christos

AU - Panagiotakos, Demosthenes B.

PY - 2018/12/29

Y1 - 2018/12/29

AB - Background: The use of Cardiovascular Disease (CVD) risk estimation scores in primary prevention has long been established. However, their performance still remains a matter of concern. The aim of this study was to explore the potential of using ML methodologies on CVD prediction, especially compared to established risk tool, the HellenicSCORE. Methods: Data from the ATTICA prospective study (n = 2020 adults), enrolled during 2001-02 and followed-up in 2011-12 were used. Three different machine-learning classifiers (k-NN, random forest, and decision tree) were trained and evaluated against 10-year CVD incidence, in comparison with the HellenicSCORE tool (a calibration of the ESC SCORE). Training datasets, consisting from 16 variables to only 5 variables, were chosen, with or without bootstrapping, in an attempt to achieve the best overall performance for the machine learning classifiers. Results: Depending on the classifier and the training dataset the outcome varied in efficiency but was comparable between the two methodological approaches. In particular, the HellenicSCORE showed accuracy 85%, specificity 20%, sensitivity 97%, positive predictive value 87%, and negative predictive value 58%, whereas for the machine learning methodologies, accuracy ranged from 65 to 84%, specificity from 46 to 56%, sensitivity from 67 to 89%, positive predictive value from 89 to 91%, and negative predictive value from 24 to 45%; random forest gave the best results, while the k-NN gave the poorest results. Conclusions: The alternative approach of machine learning classification produced results comparable to that of risk prediction scores and, thus, it can be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer.

KW - Cardiovascular disease

KW - Machine learning

KW - Model performance

KW - Risk prediction

UR - http://www.scopus.com/inward/record.url?scp=85059285977&partnerID=8YFLogxK

U2 - 10.1186/s12874-018-0644-1

DO - 10.1186/s12874-018-0644-1

M3 - Article

VL - 18

SP - 1

EP - 11

JO - BMC Medical Research Methodology

JF - BMC Medical Research Methodology

SN - 1471-2288

IS - 1

M1 - 179

ER -

Dimopoulos AC, Nikolaidou M, Caballero FF, Engchuan W, Sanchez-Niubo A, Arndt H et al. Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk. BMC Medical Research Methodology. 2018 Dec 29;18(1):1-11. 179. https://doi.org/10.1186/s12874-018-0644-1