Infection status outcome, machine learning method and virus type interact to affect the optimised prediction of hepatitis virus immunoassay results from routine pathology laboratory assays in unbalanced data

Alice RICHARDSON, Brett A. Lidbury

    Research output: Contribution to journalArticle

    7 Citations (Scopus)

    Abstract

    BACKGROUND: Advanced data mining techniques such as decision trees have been successfully used to predict a variety of outcomes in complex medical environments. Furthermore, previous research has shown that combining the results of a set of individually trained trees into an ensemble-based classifier can improve overall classification accuracy. This paper investigates the effect of data pre-processing, the use of ensembles constructed by bagging, and a simple majority vote to combine classification predictions from routine pathology laboratory data, particularly to overcome a large imbalance of negative Hepatitis B virus (HBV) and Hepatitis C virus (HCV) cases versus HBV or HCV immunoassay positive cases. These methods were illustrated using a never before analysed data set from ACT Pathology (Canberra, Australia) relating to HBV and HCV patients. RESULTS: It was easier to predict immunoassay positive cases than negative cases of HBV or HCV. While applying an ensemble-based approach rather than a single classifier had a small positive effect on the accuracy rate, this also varied depending on the virus under analysis. Finally, scaling data before prediction also has a small positive effect on the accuracy rate for this dataset. A graphical analysis of the distribution of accuracy rates across ensembles supports these findings. CONCLUSIONS: Laboratories looking to include machine learning as part of their decision support processes need to be aware that the infection outcome, the machine learning method used and the virus type interact to affect the enhanced laboratory diagnosis of hepatitis virus infection, as determined by primary immunoassay data in concert with multiple routine pathology laboratory variables. This awareness will lead to the informed use of existing machine learning methods, thus improving the quality of laboratory diagnosis via informatics analyses.
    Original languageEnglish
    Pages (from-to)206-213
    Number of pages8
    JournalBMC Bioinformatics
    Volume14
    Issue number1
    DOIs
    Publication statusPublished - 2013

    Fingerprint

    Unbalanced Data
    Immunoassay
    Hepatitis Viruses
    Pathology
    Viruses
    Hepatitis B virus
    Hepacivirus
    Virus
    Infection
    Learning systems
    Assays
    Machine Learning
    Clinical Laboratory Techniques
    Prediction
    Decision Trees
    Informatics
    Data Mining
    Ensemble
    Virus Diseases
    Classifiers

    Cite this

    @article{30937bd535ae4317aa86961b213e2887,
    title = "Infection status outcome, machine learning method and virus type interact to affect the optimised prediction of hepatitis virus immunoassay results from routine pathology laboratory assays in unbalanced data",
    abstract = "BACKGROUND: Advanced data mining techniques such as decision trees have been successfully used to predict a variety of outcomes in complex medical environments. Furthermore, previous research has shown that combining the results of a set of individually trained trees into an ensemble-based classifier can improve overall classification accuracy. This paper investigates the effect of data pre-processing, the use of ensembles constructed by bagging, and a simple majority vote to combine classification predictions from routine pathology laboratory data, particularly to overcome a large imbalance of negative Hepatitis B virus (HBV) and Hepatitis C virus (HCV) cases versus HBV or HCV immunoassay positive cases. These methods were illustrated using a never before analysed data set from ACT Pathology (Canberra, Australia) relating to HBV and HCV patients. RESULTS: It was easier to predict immunoassay positive cases than negative cases of HBV or HCV. While applying an ensemble-based approach rather than a single classifier had a small positive effect on the accuracy rate, this also varied depending on the virus under analysis. Finally, scaling data before prediction also has a small positive effect on the accuracy rate for this dataset. A graphical analysis of the distribution of accuracy rates across ensembles supports these findings. CONCLUSIONS: Laboratories looking to include machine learning as part of their decision support processes need to be aware that the infection outcome, the machine learning method used and the virus type interact to affect the enhanced laboratory diagnosis of hepatitis virus infection, as determined by primary immunoassay data in concert with multiple routine pathology laboratory variables. This awareness will lead to the informed use of existing machine learning methods, thus improving the quality of laboratory diagnosis via informatics analyses.",
    keywords = "decision, tree, analysis of variance, clinical diagnosis",
    author = "Alice RICHARDSON and Lidbury, {Brett A.}",
    year = "2013",
    doi = "10.1186/1471-2105-14-206",
    language = "English",
    volume = "14",
    pages = "206--213",
    journal = "BMC Bioinformatics",
    issn = "1471-2105",
    publisher = "BioMed Central",
    number = "1",

    }

    TY - JOUR

    T1 - Infection status outcome, machine learning method and virus type interact to affect the optimised prediction of hepatitis virus immunoassay results from routine pathology laboratory assays in unbalanced data

    AU - RICHARDSON, Alice

    AU - Lidbury, Brett A.

    PY - 2013

    Y1 - 2013

    N2 - BACKGROUND: Advanced data mining techniques such as decision trees have been successfully used to predict a variety of outcomes in complex medical environments. Furthermore, previous research has shown that combining the results of a set of individually trained trees into an ensemble-based classifier can improve overall classification accuracy. This paper investigates the effect of data pre-processing, the use of ensembles constructed by bagging, and a simple majority vote to combine classification predictions from routine pathology laboratory data, particularly to overcome a large imbalance of negative Hepatitis B virus (HBV) and Hepatitis C virus (HCV) cases versus HBV or HCV immunoassay positive cases. These methods were illustrated using a never before analysed data set from ACT Pathology (Canberra, Australia) relating to HBV and HCV patients. RESULTS: It was easier to predict immunoassay positive cases than negative cases of HBV or HCV. While applying an ensemble-based approach rather than a single classifier had a small positive effect on the accuracy rate, this also varied depending on the virus under analysis. Finally, scaling data before prediction also has a small positive effect on the accuracy rate for this dataset. A graphical analysis of the distribution of accuracy rates across ensembles supports these findings. CONCLUSIONS: Laboratories looking to include machine learning as part of their decision support processes need to be aware that the infection outcome, the machine learning method used and the virus type interact to affect the enhanced laboratory diagnosis of hepatitis virus infection, as determined by primary immunoassay data in concert with multiple routine pathology laboratory variables. This awareness will lead to the informed use of existing machine learning methods, thus improving the quality of laboratory diagnosis via informatics analyses.

    AB - BACKGROUND: Advanced data mining techniques such as decision trees have been successfully used to predict a variety of outcomes in complex medical environments. Furthermore, previous research has shown that combining the results of a set of individually trained trees into an ensemble-based classifier can improve overall classification accuracy. This paper investigates the effect of data pre-processing, the use of ensembles constructed by bagging, and a simple majority vote to combine classification predictions from routine pathology laboratory data, particularly to overcome a large imbalance of negative Hepatitis B virus (HBV) and Hepatitis C virus (HCV) cases versus HBV or HCV immunoassay positive cases. These methods were illustrated using a never before analysed data set from ACT Pathology (Canberra, Australia) relating to HBV and HCV patients. RESULTS: It was easier to predict immunoassay positive cases than negative cases of HBV or HCV. While applying an ensemble-based approach rather than a single classifier had a small positive effect on the accuracy rate, this also varied depending on the virus under analysis. Finally, scaling data before prediction also has a small positive effect on the accuracy rate for this dataset. A graphical analysis of the distribution of accuracy rates across ensembles supports these findings. CONCLUSIONS: Laboratories looking to include machine learning as part of their decision support processes need to be aware that the infection outcome, the machine learning method used and the virus type interact to affect the enhanced laboratory diagnosis of hepatitis virus infection, as determined by primary immunoassay data in concert with multiple routine pathology laboratory variables. This awareness will lead to the informed use of existing machine learning methods, thus improving the quality of laboratory diagnosis via informatics analyses.

    KW - decision

    KW - tree

    KW - analysis of variance

    KW - clinical diagnosis

    U2 - 10.1186/1471-2105-14-206

    DO - 10.1186/1471-2105-14-206

    M3 - Article

    VL - 14

    SP - 206

    EP - 213

    JO - BMC Bioinformatics

    JF - BMC Bioinformatics

    SN - 1471-2105

    IS - 1

    ER -