Predicting the presence of hepatitis B virus surface antigen in Chinese patients by pathology data mining

Guifang Shang, Alice Richardson, Michelle Gahan, Simon Easteal, Stephen Ohms, Brett A. Lidbury

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Hepatitis B virus (HBV) is a pathogen of worldwide health significance, associated with liver disease. A vaccine is available, yet HBV prevalence remains a concern, particularly in developing countries. Pathology laboratories have a primary role in the diagnosis and monitoring of HBV infection, through hepatitis B surface antigen (HBsAg) immunoassay and associated tests. Analysis of HBsAg immunoassay and associated pathology data from 821 Chinese patients applied 10-fold cross-validation to establish classification decision trees (CDTs), with CDT results used subsequently to develop a logistic regression model. The robustness of the logistic regression model was confirmed by the Hosmer-Lemeshow test, pseudo-R2 and an area under the receiver operating characteristic curve (AUROC) result that showed the logistic regression model was capable of accurately discriminating HBsAg-positive from HBsAg-negative patients at 95% accuracy. Overall CDT sensitivity and specificity were 94.7% (+/- 5.0%) and 89.5% (+/- 5.7%), respectively, close to the sensitivity and specificity of the immunoassay, providing an alternative means of predicting HBsAg status. Both the CDT and logistic regression modeling demonstrated the importance of the routine pathology variables alanine aminotransferase (ALT), serum albumin (ALB), and alkaline phosphatase (ALP) for accurately predicting HBsAg status in a Chinese patient cohort. The study demonstrates that CDTs and a linked logistic regression model applied to routine pathology data were an effective supplement to HBsAg immunoassay, and a possible replacement method where immunoassays are not requested or not easily available for the laboratory diagnosis of HBV infection.
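
The abstract reports sensitivity, specificity and AUROC. As a reading aid, the sketch below shows the standard definitions of these three measures in plain Python. It is not the authors' code: the function names, the 0.5 decision threshold and the eight toy patients are all assumptions made up for illustration, and none of the numbers come from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 = HBsAg positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true HBsAg status and a model's predicted probability.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 (not taken from the paper).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auroc(y_true, scores)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} AUROC={area:.4f}")
# prints: sensitivity=0.7500 specificity=0.7500 AUROC=0.8125
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarises discrimination across all thresholds, which is why the two kinds of figure are reported separately.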
Original language: English
Pages (from-to): 1334-1339
Number of pages: 6
Journal: Journal of Medical Virology
Volume: 85
Issue number: 8
DOI: https://doi.org/10.1002/jmv.23609
Publication status: Published - 2013

Fingerprint

Data Mining
Hepatitis B Surface Antigens
Hepatitis B virus
Logistic Models
Pathology
Decision Trees
Immunoassay
Virus Diseases
Sensitivity and Specificity
Clinical Laboratory Techniques
Alanine Transaminase
Serum Albumin
ROC Curve
Developing Countries
Alkaline Phosphatase
Liver Diseases
Vaccines

Cite this

Shang, Guifang; Richardson, Alice; Gahan, Michelle; Easteal, Simon; Ohms, Stephen; Lidbury, Brett A. / Predicting the presence of hepatitis B virus surface antigen in Chinese patients by pathology data mining. In: Journal of Medical Virology. 2013; Vol. 85, No. 8. pp. 1334-1339.
@article{b8cbf612ead74848bf8d16ef226e44ec,
title = "Predicting the presence of hepatitis B virus surface antigen in Chinese patients by pathology data mining",
abstract = "Hepatitis B virus (HBV) is a pathogen of worldwide health significance, associated with liver disease. A vaccine is available, yet HBV prevalence remains a concern, particularly in developing countries. Pathology laboratories have a primary role in the diagnosis and monitoring of HBV infection, through hepatitis B surface antigen (HBsAg) immunoassay and associated tests. Analysis of HBsAg immunoassay and associated pathology data from 821 Chinese patients applied 10-fold cross-validation to establish classification decision trees (CDTs), with CDT results used subsequently to develop a logistic regression model. The robustness of the logistic regression model was confirmed by the Hosmer-Lemeshow test, pseudo-R2 and an area under the receiver operating characteristic curve (AUROC) result that showed the logistic regression model was capable of accurately discriminating HBsAg-positive from HBsAg-negative patients at 95{\%} accuracy. Overall CDT sensitivity and specificity were 94.7{\%} (+/- 5.0{\%}) and 89.5{\%} (+/- 5.7{\%}), respectively, close to the sensitivity and specificity of the immunoassay, providing an alternative means of predicting HBsAg status. Both the CDT and logistic regression modeling demonstrated the importance of the routine pathology variables alanine aminotransferase (ALT), serum albumin (ALB), and alkaline phosphatase (ALP) for accurately predicting HBsAg status in a Chinese patient cohort. The study demonstrates that CDTs and a linked logistic regression model applied to routine pathology data were an effective supplement to HBsAg immunoassay, and a possible replacement method where immunoassays are not requested or not easily available for the laboratory diagnosis of HBV infection.",
keywords = "Decision tree, Hepatitis B virus, Logistic regression, Machine learning",
author = "Guifang Shang and Alice Richardson and Michelle Gahan and Simon Easteal and Stephen Ohms and Lidbury, {Brett A.}",
year = "2013",
doi = "10.1002/jmv.23609",
language = "English",
volume = "85",
pages = "1334--1339",
journal = "Journal of Medical Virology",
issn = "0146-6615",
publisher = "Wiley-Liss Inc.",
number = "8",

}

Shang, G, Richardson, A, Gahan, M, Easteal, S, Ohms, S & Lidbury, BA 2013, 'Predicting the presence of hepatitis B virus surface antigen in Chinese patients by pathology data mining', Journal of Medical Virology, vol. 85, no. 8, pp. 1334-1339. https://doi.org/10.1002/jmv.23609

TY - JOUR
T1 - Predicting the presence of hepatitis B virus surface antigen in Chinese patients by pathology data mining
AU - Shang, Guifang
AU - Richardson, Alice
AU - Gahan, Michelle
AU - Easteal, Simon
AU - Ohms, Stephen
AU - Lidbury, Brett A.
PY - 2013
Y1 - 2013
AB - Hepatitis B virus (HBV) is a pathogen of worldwide health significance, associated with liver disease. A vaccine is available, yet HBV prevalence remains a concern, particularly in developing countries. Pathology laboratories have a primary role in the diagnosis and monitoring of HBV infection, through hepatitis B surface antigen (HBsAg) immunoassay and associated tests. Analysis of HBsAg immunoassay and associated pathology data from 821 Chinese patients applied 10-fold cross-validation to establish classification decision trees (CDTs), with CDT results used subsequently to develop a logistic regression model. The robustness of the logistic regression model was confirmed by the Hosmer-Lemeshow test, pseudo-R2 and an area under the receiver operating characteristic curve (AUROC) result that showed the logistic regression model was capable of accurately discriminating HBsAg-positive from HBsAg-negative patients at 95% accuracy. Overall CDT sensitivity and specificity were 94.7% (+/- 5.0%) and 89.5% (+/- 5.7%), respectively, close to the sensitivity and specificity of the immunoassay, providing an alternative means of predicting HBsAg status. Both the CDT and logistic regression modeling demonstrated the importance of the routine pathology variables alanine aminotransferase (ALT), serum albumin (ALB), and alkaline phosphatase (ALP) for accurately predicting HBsAg status in a Chinese patient cohort. The study demonstrates that CDTs and a linked logistic regression model applied to routine pathology data were an effective supplement to HBsAg immunoassay, and a possible replacement method where immunoassays are not requested or not easily available for the laboratory diagnosis of HBV infection.
KW - Decision tree
KW - Hepatitis B virus
KW - Logistic regression
KW - Machine learning
U2 - 10.1002/jmv.23609
DO - 10.1002/jmv.23609
M3 - Article
VL - 85
SP - 1334
EP - 1339
JO - Journal of Medical Virology
JF - Journal of Medical Virology
SN - 0146-6615
IS - 8
ER -