Automatic detection of patients with invasive fungal disease from free-text computed tomography (CT) scans

David Martinez, Michelle Ananda-Rajah, Monica Slavin, Karin Thursky, Lawrence Cavedon

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Background: Invasive fungal diseases (IFDs) are associated with considerable health and economic costs. Surveillance of the more diagnostically challenging invasive fungal diseases, specifically of the sino-pulmonary system, is not feasible for many hospitals because case finding is a costly and labour-intensive exercise. We developed text classifiers for detecting such IFDs from free-text radiology (CT) reports, using machine-learning techniques.

Method: We obtained free-text reports of CT scans performed over a specific hospitalisation period (2003-2011), for 264 IFD and 289 control patients from three tertiary hospitals. We analysed IFD evidence at patient, report, and sentence levels. Three infectious disease experts annotated the reports of 73 IFD-positive patients for language suggestive of IFD at sentence level, and graded the sentences as to whether they suggested or excluded the presence of IFD. Reliable agreement between annotators was obtained and this was used as training data for our classifiers. We tested a variety of machine learning (ML), rule-based, and hybrid systems, with feature types including bags of words, bags of phrases, and bags of concepts, as well as report-level structured features. Evaluation was carried out over a robust framework with separate Development and Held-Out datasets.

Results: The best systems (using Support Vector Machines) achieved very high recall at report and patient levels over unseen data: 95% and 100% respectively. Precision at report level over held-out data was 71%; however, most of the associated false-positive reports (53%) belonged to patients who had a previous positive report appropriately flagged by the classifier, reducing negative impact in practice.

Conclusions: Our machine learning application holds the potential for developing systematic IFD surveillance systems for hospital populations.
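The abstract reports precision and recall at both report level and patient level. The sketch below illustrates how such figures could be computed from report-level classifier output. It is not the authors' evaluation code: the toy data is invented, the field names (`patient`, `gold`, `pred`) are hypothetical, and the aggregation rule (a patient counts as positive if any of their reports is positive) is an assumption, since the abstract does not state how report-level decisions were combined.

```python
from collections import defaultdict


def precision_recall(gold, predicted):
    """Return (precision, recall) for two parallel lists of booleans."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def aggregate_by_patient(reports, key):
    """Assumed rule: a patient is positive if ANY of their reports is positive."""
    flags = defaultdict(bool)
    for report in reports:
        flags[report["patient"]] = flags[report["patient"]] or report[key]
    return dict(flags)


# Invented toy data: one dict per CT report (not taken from the study).
reports = [
    {"patient": "A", "gold": True,  "pred": True},
    {"patient": "A", "gold": False, "pred": True},   # false positive after a flagged positive
    {"patient": "B", "gold": True,  "pred": False},  # missed report
    {"patient": "B", "gold": True,  "pred": True},
    {"patient": "C", "gold": False, "pred": False},
    {"patient": "D", "gold": False, "pred": True},   # false positive, control patient
]

# Report-level metrics: every report is scored on its own.
report_p, report_r = precision_recall(
    [r["gold"] for r in reports], [r["pred"] for r in reports]
)

# Patient-level metrics: collapse reports per patient first, then score.
patient_gold = aggregate_by_patient(reports, "gold")
patient_pred = aggregate_by_patient(reports, "pred")
patients = sorted(patient_gold)
patient_p, patient_r = precision_recall(
    [patient_gold[p] for p in patients], [patient_pred[p] for p in patients]
)

print(f"report level:  precision={report_p:.2f} recall={report_r:.2f}")
print(f"patient level: precision={patient_p:.2f} recall={patient_r:.2f}")
```

In this toy data, patient B has one missed report but is still detected through another report, which shows how patient-level recall can exceed report-level recall under an any-positive rule. Patient A's second report is a false positive of the kind the abstract describes: it follows a correctly flagged positive report for the same patient.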
Original language: English
Pages (from-to): 251-260
Number of pages: 10
Journal: Journal of Biomedical Informatics
Volume: 53
DOI: 10.1016/j.jbi.2014.11.009
Publication status: Published - Feb 2015

Fingerprint

Mycoses
Tomography
Learning systems
Classifiers
Radiology
Knowledge based systems
Tertiary Care Centers
Health Care Costs
Hybrid systems
Communicable Diseases
Hospitalization
Support vector machines
Language
Economics
Exercise
Health
Personnel
Lung

Cite this

Martinez, David ; Ananda-Rajah, Michelle ; Slavin, Monica ; Thursky, Karin ; Cavedon, Lawrence. / Automatic detection of patients with invasive fungal disease from free-text computed tomography (CT) scans. In: Journal of Biomedical Informatics. 2015 ; Vol. 53. pp. 251-260.
@article{c74366fb72ee405e91799d3f307174df,
title = "Automatic detection of patients with invasive fungal disease from free-text computed tomography (CT) scans",
abstract = "Background: Invasive fungal diseases (IFDs) are associated with considerable health and economic costs. Surveillance of the more diagnostically challenging invasive fungal diseases, specifically of the sino-pulmonary system, is not feasible for many hospitals because case finding is a costly and labour intensive exercise. We developed text classifiers for detecting such IFDs from free-text radiology (CT) reports, using machine-learning techniques. Method: We obtained free-text reports of CT scans performed over a specific hospitalisation period (2003-2011), for 264 IFD and 289 control patients from three tertiary hospitals. We analysed IFD evidence at patient, report, and sentence levels. Three infectious disease experts annotated the reports of 73 IFD-positive patients for language suggestive of IFD at sentence level, and graded the sentences as to whether they suggested or excluded the presence of IFD. Reliable agreement between annotators was obtained and this was used as training data for our classifiers. We tested a variety of Machine Learning (ML), rule based, and hybrid systems, with feature types including bags of words, bags of phrases, and bags of concepts, as well as report-level structured features. Evaluation was carried out over a robust framework with separate Development and Held-Out datasets. Results: The best systems (using Support Vector Machines) achieved very high recall at report- and patient-levels over unseen data: 95{\%} and 100{\%} respectively. Precision at report-level over held-out data was 71{\%}; however, most of the associated false-positive reports (53{\%}) belonged to patients who had a previous positive report appropriately flagged by the classifier, reducing negative impact in practice. Conclusions: Our machine learning application holds the potential for developing systematic IFD surveillance systems for hospital populations.",
keywords = "Aspergillosis, Data mining, Invasive fungal disease, Natural language processing, Surveillance",
author = "David Martinez and Michelle Ananda-Rajah and Monica Slavin and Karin Thursky and Lawrence Cavedon",
year = "2015",
month = "2",
doi = "10.1016/j.jbi.2014.11.009",
language = "English",
volume = "53",
pages = "251--260",
journal = "Journal of Biomedical Informatics",
issn = "0010-4809",
publisher = "Academic Press Inc.",

}

Automatic detection of patients with invasive fungal disease from free-text computed tomography (CT) scans. / Martinez, David; Ananda-Rajah, Michelle; Slavin, Monica; Thursky, Karin; Cavedon, Lawrence.

In: Journal of Biomedical Informatics, Vol. 53, 02.2015, p. 251-260.

TY - JOUR

T1 - Automatic detection of patients with invasive fungal disease from free-text computed tomography (CT) scans

AU - Martinez, David

AU - Ananda-Rajah, Michelle

AU - Slavin, Monica

AU - Thursky, Karin

AU - Cavedon, Lawrence

PY - 2015/2

Y1 - 2015/2

AB - Background: Invasive fungal diseases (IFDs) are associated with considerable health and economic costs. Surveillance of the more diagnostically challenging invasive fungal diseases, specifically of the sino-pulmonary system, is not feasible for many hospitals because case finding is a costly and labour intensive exercise. We developed text classifiers for detecting such IFDs from free-text radiology (CT) reports, using machine-learning techniques. Method: We obtained free-text reports of CT scans performed over a specific hospitalisation period (2003-2011), for 264 IFD and 289 control patients from three tertiary hospitals. We analysed IFD evidence at patient, report, and sentence levels. Three infectious disease experts annotated the reports of 73 IFD-positive patients for language suggestive of IFD at sentence level, and graded the sentences as to whether they suggested or excluded the presence of IFD. Reliable agreement between annotators was obtained and this was used as training data for our classifiers. We tested a variety of Machine Learning (ML), rule based, and hybrid systems, with feature types including bags of words, bags of phrases, and bags of concepts, as well as report-level structured features. Evaluation was carried out over a robust framework with separate Development and Held-Out datasets. Results: The best systems (using Support Vector Machines) achieved very high recall at report- and patient-levels over unseen data: 95% and 100% respectively. Precision at report-level over held-out data was 71%; however, most of the associated false-positive reports (53%) belonged to patients who had a previous positive report appropriately flagged by the classifier, reducing negative impact in practice. Conclusions: Our machine learning application holds the potential for developing systematic IFD surveillance systems for hospital populations.

KW - Aspergillosis

KW - Data mining

KW - Invasive fungal disease

KW - Natural language processing

KW - Surveillance

UR - http://www.scopus.com/inward/record.url?scp=84924444634&partnerID=8YFLogxK

UR - http://www.mendeley.com/research/automatic-detection-patients-invasive-fungal-disease-freetext-computed-tomography-ct-scans-1

U2 - 10.1016/j.jbi.2014.11.009

DO - 10.1016/j.jbi.2014.11.009

M3 - Article

VL - 53

SP - 251

EP - 260

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 0010-4809

ER -