TY - JOUR
T1 - Machine learning methodologies versus cardiovascular risk scores, in predicting disease risk
AU - Dimopoulos, Alexandros C.
AU - Nikolaidou, Mara
AU - Caballero, Francisco Félix
AU - Engchuan, Worrawat
AU - Sanchez-Niubo, Albert
AU - Arndt, Holger
AU - Ayuso-Mateos, José Luis
AU - Haro, Josep Maria
AU - Chatterji, Somnath
AU - Georgousopoulou, Ekavi N.
AU - Pitsavos, Christos
AU - Panagiotakos, Demosthenes B.
N1 - Publisher Copyright: © 2018 The Author(s).
PY - 2018/12/29
Y1 - 2018/12/29
AB - Background: The use of cardiovascular disease (CVD) risk estimation scores in primary prevention has long been established. However, their performance remains a matter of concern. The aim of this study was to explore the potential of machine learning (ML) methodologies for CVD prediction, especially compared to an established risk tool, the HellenicSCORE. Methods: Data from the ATTICA prospective study (n = 2020 adults), enrolled during 2001-02 and followed up in 2011-12, were used. Three different machine learning classifiers (k-NN, random forest, and decision tree) were trained and evaluated against 10-year CVD incidence, in comparison with the HellenicSCORE tool (a calibration of the ESC SCORE). Training datasets, ranging from 16 variables down to only 5 variables, were chosen, with or without bootstrapping, in an attempt to achieve the best overall performance for the machine learning classifiers. Results: Depending on the classifier and the training dataset, the outcome varied in efficiency but was comparable between the two methodological approaches. In particular, the HellenicSCORE showed accuracy 85%, specificity 20%, sensitivity 97%, positive predictive value 87%, and negative predictive value 58%, whereas for the machine learning methodologies, accuracy ranged from 65 to 84%, specificity from 46 to 56%, sensitivity from 67 to 89%, positive predictive value from 89 to 91%, and negative predictive value from 24 to 45%; random forest gave the best results, while k-NN gave the poorest. Conclusions: The alternative approach of machine learning classification produced results comparable to those of risk prediction scores and, thus, can be used as a method of CVD prediction, taking into consideration the advantages that machine learning methodologies may offer.
KW - Cardiovascular disease
KW - Machine learning
KW - Model performance
KW - Risk prediction
KW - Blood Pressure/physiology
KW - Reproducibility of Results
KW - Prospective Studies
KW - Models, Cardiovascular
KW - Humans
KW - Middle Aged
KW - Risk Factors
KW - Male
KW - Algorithms
KW - Sensitivity and Specificity
KW - Adult
KW - Female
KW - Risk Assessment/methods
KW - Cardiovascular Diseases/diagnosis
UR - http://www.scopus.com/inward/record.url?scp=85059285977&partnerID=8YFLogxK
U2 - 10.1186/s12874-018-0644-1
DO - 10.1186/s12874-018-0644-1
M3 - Article
C2 - 30594138
AN - SCOPUS:85059285977
SN - 1471-2288
VL - 18
SP - 1
EP - 11
JO - BMC Medical Research Methodology
JF - BMC Medical Research Methodology
IS - 1
M1 - 179
ER -