TY - JOUR
T1 - Coronavirus research topics, tracking twenty years of research
AU - Aryani, Amir
AU - Wang, Jingbo
AU - Salvador-Carulla, Luis
AU - Woo, Jihoon
AU - Cheung, Cathy P.W.
AU - Wu, Zhuochen
AU - Yin, Hui
AU - Xiao, Junhua
AU - Lambert, Elisabeth A.
AU - Howitt, Jason
AU - Davidson, Jean M.
AU - Yoong, Serene
AU - Dixon, John B.
AU - Climie, Rachel E.
AU - Salinas-Perez, Jose A.
AU - Bagheri, Nasser
AU - Santiago, Celine
AU - Williams, Joanne
AU - Wickramasinghe, Nilmini
AU - Ng, Leo
AU - Zwack, Clara C.
AU - Lambert, Gavin W.
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025
Y1 - 2025
N2 - Research publications aimed at understanding the various aspects of Coronaviruses, particularly COVID-19, have significantly shaped our knowledge base. While the urgency to monitor COVID-19 in real time has decreased, the continual monthly influx of new research articles underscores the importance of systematic review and analysis to deepen our understanding of the pandemic’s broad impact. To explore research trends and innovations in this space, we developed a pipeline using natural language processing techniques. This pipeline systematically catalogues and synthesises the vast array of research articles, leading to the creation of a dataset with more than eight hundred thousand articles from July 2002 to May 2024. This paper describes the content of this dataset and provides the necessary information to make this dataset accessible and reusable for future research. Our approach aggregates and organises global research related to Coronaviruses into thematic clusters such as vaccine development, public health strategies, infection mechanisms, mental health issues, and economic consequences. Health experts also contributed by reviewing and revising the dataset.
AB - Research publications aimed at understanding the various aspects of Coronaviruses, particularly COVID-19, have significantly shaped our knowledge base. While the urgency to monitor COVID-19 in real time has decreased, the continual monthly influx of new research articles underscores the importance of systematic review and analysis to deepen our understanding of the pandemic’s broad impact. To explore research trends and innovations in this space, we developed a pipeline using natural language processing techniques. This pipeline systematically catalogues and synthesises the vast array of research articles, leading to the creation of a dataset with more than eight hundred thousand articles from July 2002 to May 2024. This paper describes the content of this dataset and provides the necessary information to make this dataset accessible and reusable for future research. Our approach aggregates and organises global research related to Coronaviruses into thematic clusters such as vaccine development, public health strategies, infection mechanisms, mental health issues, and economic consequences. Health experts also contributed by reviewing and revising the dataset.
UR - http://www.scopus.com/inward/record.url?scp=105007880538&partnerID=8YFLogxK
U2 - 10.1038/s41597-025-04992-z
DO - 10.1038/s41597-025-04992-z
M3 - Article
C2 - 40494890
AN - SCOPUS:105007880538
SN - 2052-4463
VL - 12
SP - 1
EP - 17
JO - Scientific Data
JF - Scientific Data
IS - 1
M1 - 978
ER -