Sociodemographic indicators of health status using a machine learning approach and data from the English longitudinal study of aging (ELSA)

Worrawat Engchuan, Alexandros C. Dimopoulos, Stefanos Tyrovolas, Francisco Félix Caballero, Albert Sanchez-Niubo, Holger Arndt, Jose Luis Ayuso-Mateos, Josep Maria Haro, Somnath Chatterji, Demosthenes B. Panagiotakos

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background: Studies on the effects of sociodemographic factors on health in aging now include the use of statistical models and machine learning. The aim of this study was to evaluate the determinants of health in aging using machine learning methods and to compare the accuracy with traditional methods. Material/Methods: The health status of 6,209 adults, age <65 years (n=1,585), 65–79 years (n=3,267), and >80 years (n=1,357), was measured using an established health metric (0–100) that incorporated physical function and activities of daily living (ADL). Data from the English Longitudinal Study of Ageing (ELSA) included socio-economic and sociodemographic characteristics and history of falls. Health-trend and personal-fitted variables were generated as predictors of health metrics using three machine learning methods: random forest (RF), deep learning (DL), and the linear model (LM). The percentage increase in mean square error (%IncMSE) when a variable was removed from the model was calculated as a measure of that variable's importance. Results: Health-trend, physical activity, and personal-fitted variables were the main predictors of health, with %IncMSE values of 85.76%, 63.40%, and 46.71%, respectively. Age, employment status, alcohol consumption, and household income had %IncMSE values of 20.40%, 20.10%, 16.94%, and 13.61%, respectively. Performance of the RF method was similar to the traditional LM (p=0.7), but RF significantly outperformed DL (p=0.006). Conclusions: Machine learning methods can be used to evaluate multidimensional longitudinal health data and may provide accurate results with fewer requirements when compared with traditional statistical modeling.
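The %IncMSE measure named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (not ELSA), the `predict` function is a hypothetical already-fitted model, and the helper names `mse` and `pct_inc_mse` are invented. The sketch shuffles a predictor's values instead of refitting the model without it, which is a common stand-in for "removing" a variable; the abstract does not spell out the exact procedure the authors used.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pct_inc_mse(predict, rows, y, col, rng):
    """Percent increase in MSE after shuffling column `col` across rows.

    Shuffling breaks the link between that predictor and the outcome
    while leaving the model itself unchanged.
    """
    base = mse(y, [predict(r) for r in rows])
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    perm = mse(y, [predict(r) for r in permuted])
    return 100.0 * (perm - base) / base


# Synthetic data (assumption): x0 matters a lot, x1 a little, x2 not at all.
rng = random.Random(0)
rows = [[rng.random(), rng.random(), rng.random()] for _ in range(500)]
y = [3.0 * r[0] + 0.5 * r[1] + rng.gauss(0.0, 0.1) for r in rows]


def predict(r):
    """Hypothetical fitted model; it ignores x2 entirely."""
    return 3.0 * r[0] + 0.5 * r[1]


scores = [pct_inc_mse(predict, rows, y, c, rng) for c in range(3)]
for name, s in zip(["x0", "x1", "x2"], scores):
    print(f"{name}: {s:.1f}% increase in MSE")
```

A larger percentage means the model's error grows more when that predictor's information is destroyed, which is the sense in which the abstract ranks health-trend above age or household income. The numbers printed by this sketch say nothing about the study's own results.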

Original language: English
Pages (from-to): 1994-2001
Number of pages: 8
Journal: Medical Science Monitor
Volume: 25
DOIs: https://doi.org/10.12659/MSM.913283
Publication status: Published - 17 Mar 2019

Fingerprint

Health Status Indicators
Longitudinal Studies
Health
Linear Models
Learning
Machine Learning
Statistical Models
Activities of Daily Living
Alcohol Drinking
Health Status
Economics

Cite this

Engchuan, W., Dimopoulos, A. C., Tyrovolas, S., Caballero, F. F., Sanchez-Niubo, A., Arndt, H., ... Panagiotakos, D. B. (2019). Sociodemographic indicators of health status using a machine learning approach and data from the English longitudinal study of aging (ELSA). Medical Science Monitor, 25, 1994-2001. https://doi.org/10.12659/MSM.913283
Engchuan, Worrawat ; Dimopoulos, Alexandros C. ; Tyrovolas, Stefanos ; Caballero, Francisco Félix ; Sanchez-Niubo, Albert ; Arndt, Holger ; Ayuso-Mateos, Jose Luis ; Haro, Josep Maria ; Chatterji, Somnath ; Panagiotakos, Demosthenes B. / Sociodemographic indicators of health status using a machine learning approach and data from the English longitudinal study of aging (ELSA). In: Medical Science Monitor. 2019 ; Vol. 25. pp. 1994-2001.
@article{9cb39cab486e4fd7b03cb1c9e1c3ada9,
title = "Sociodemographic indicators of health status using a machine learning approach and data from the {E}nglish longitudinal study of aging (ELSA)",
abstract = "Background: Studies on the effects of sociodemographic factors on health in aging now include the use of statistical models and machine learning. The aim of this study was to evaluate the determinants of health in aging using machine learning methods and to compare the accuracy with traditional methods. Material/Methods: The health status of 6,209 adults, age <65 years (n=1,585), 65–79 years (n=3,267), and >80 years (n=1,357), was measured using an established health metric (0–100) that incorporated physical function and activities of daily living (ADL). Data from the English Longitudinal Study of Ageing (ELSA) included socio-economic and sociodemographic characteristics and history of falls. Health-trend and personal-fitted variables were generated as predictors of health metrics using three machine learning methods: random forest (RF), deep learning (DL), and the linear model (LM). The percentage increase in mean square error ({\%}IncMSE) when a variable was removed from the model was calculated as a measure of that variable's importance. Results: Health-trend, physical activity, and personal-fitted variables were the main predictors of health, with {\%}IncMSE values of 85.76{\%}, 63.40{\%}, and 46.71{\%}, respectively. Age, employment status, alcohol consumption, and household income had {\%}IncMSE values of 20.40{\%}, 20.10{\%}, 16.94{\%}, and 13.61{\%}, respectively. Performance of the RF method was similar to the traditional LM (p=0.7), but RF significantly outperformed DL (p=0.006). Conclusions: Machine learning methods can be used to evaluate multidimensional longitudinal health data and may provide accurate results with fewer requirements when compared with traditional statistical modeling.",
keywords = "Artificial intelligence, Data interpretation, statistical, Decision support techniques, Socioeconomic factors, Aging/genetics, Humans, Middle Aged, Male, Models, Statistical, Machine Learning, Socioeconomic Factors, Aged, 80 and over, Adult, Female, Aged, Health Status, Forecasting/methods, Longitudinal Studies",
author = "Worrawat Engchuan and Dimopoulos, {Alexandros C.} and Stefanos Tyrovolas and Caballero, {Francisco F{\'e}lix} and Albert Sanchez-Niubo and Holger Arndt and Ayuso-Mateos, {Jose Luis} and Haro, {Josep Maria} and Somnath Chatterji and Panagiotakos, {Demosthenes B.}",
year = "2019",
month = "3",
day = "17",
doi = "10.12659/MSM.913283",
language = "English",
volume = "25",
pages = "1994--2001",
journal = "Medical Science Monitor",
issn = "1234-1010",
publisher = "International Scientific Literature Inc.",

}

Engchuan, W, Dimopoulos, AC, Tyrovolas, S, Caballero, FF, Sanchez-Niubo, A, Arndt, H, Ayuso-Mateos, JL, Haro, JM, Chatterji, S & Panagiotakos, DB 2019, 'Sociodemographic indicators of health status using a machine learning approach and data from the English longitudinal study of aging (ELSA)', Medical Science Monitor, vol. 25, pp. 1994-2001. https://doi.org/10.12659/MSM.913283

Sociodemographic indicators of health status using a machine learning approach and data from the English longitudinal study of aging (ELSA). / Engchuan, Worrawat; Dimopoulos, Alexandros C.; Tyrovolas, Stefanos; Caballero, Francisco Félix; Sanchez-Niubo, Albert; Arndt, Holger; Ayuso-Mateos, Jose Luis; Haro, Josep Maria; Chatterji, Somnath; Panagiotakos, Demosthenes B.

In: Medical Science Monitor, Vol. 25, 17.03.2019, p. 1994-2001.


TY - JOUR

T1 - Sociodemographic indicators of health status using a machine learning approach and data from the English longitudinal study of aging (ELSA)

AU - Engchuan, Worrawat

AU - Dimopoulos, Alexandros C.

AU - Tyrovolas, Stefanos

AU - Caballero, Francisco Félix

AU - Sanchez-Niubo, Albert

AU - Arndt, Holger

AU - Ayuso-Mateos, Jose Luis

AU - Haro, Josep Maria

AU - Chatterji, Somnath

AU - Panagiotakos, Demosthenes B.

PY - 2019/3/17

Y1 - 2019/3/17

AB - Background: Studies on the effects of sociodemographic factors on health in aging now include the use of statistical models and machine learning. The aim of this study was to evaluate the determinants of health in aging using machine learning methods and to compare the accuracy with traditional methods. Material/Methods: The health status of 6,209 adults, age <65 years (n=1,585), 65–79 years (n=3,267), and >80 years (n=1,357), was measured using an established health metric (0–100) that incorporated physical function and activities of daily living (ADL). Data from the English Longitudinal Study of Ageing (ELSA) included socio-economic and sociodemographic characteristics and history of falls. Health-trend and personal-fitted variables were generated as predictors of health metrics using three machine learning methods: random forest (RF), deep learning (DL), and the linear model (LM). The percentage increase in mean square error (%IncMSE) when a variable was removed from the model was calculated as a measure of that variable's importance. Results: Health-trend, physical activity, and personal-fitted variables were the main predictors of health, with %IncMSE values of 85.76%, 63.40%, and 46.71%, respectively. Age, employment status, alcohol consumption, and household income had %IncMSE values of 20.40%, 20.10%, 16.94%, and 13.61%, respectively. Performance of the RF method was similar to the traditional LM (p=0.7), but RF significantly outperformed DL (p=0.006). Conclusions: Machine learning methods can be used to evaluate multidimensional longitudinal health data and may provide accurate results with fewer requirements when compared with traditional statistical modeling.

KW - Artificial intelligence

KW - Data interpretation, statistical

KW - Decision support techniques

KW - Socioeconomic factors

KW - Aging/genetics

KW - Humans

KW - Middle Aged

KW - Male

KW - Models, Statistical

KW - Machine Learning

KW - Socioeconomic Factors

KW - Aged, 80 and over

KW - Adult

KW - Female

KW - Aged

KW - Health Status

KW - Forecasting/methods

KW - Longitudinal Studies

UR - http://www.scopus.com/inward/record.url?scp=85063258613&partnerID=8YFLogxK

U2 - 10.12659/MSM.913283

DO - 10.12659/MSM.913283

M3 - Article

VL - 25

SP - 1994

EP - 2001

JO - Medical Science Monitor

JF - Medical Science Monitor

SN - 1234-1010

ER -