Prediction of biogeographical ancestry from genotype: a comparison of classifiers

Elaine CHEUNG, Michelle GAHAN, Dennis MCNEVIN

    Research output: Contribution to journal (Article)

    5 Citations (Scopus)

    Abstract

    DNA can provide forensic intelligence regarding a donor’s biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE and three generic classification approaches including a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A selection of 142 ancestry informative single nucleotide polymorphisms (SNPs) were chosen from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin, and Kidd’s AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. A training set of 1093 individuals with self-declared BGA from the 1000 Genomes phase 1 database was used by each classifier to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90% of the genotypes missing. Comparison of the area under the receiver operating characteristic curves (AUROCs) showed high accuracy in STRUCTURE and the generic Bayesian approach. The latter algorithm offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and is suitable for phenotype prediction while STRUCTURE is not.
    Original language: English
    Pages (from-to): 1-12
    Number of pages: 12
    Journal: International Journal of Legal Medicine
    DOI: 10.1007/s00414-016-1504-3
    Publication status: Published - 2017

    Cite this

    @article{c9a043b6ddfc44b1b9432faf20459875,
    title = "Prediction of biogeographical ancestry from genotype: a comparison of classifiers",
    abstract = "DNA can provide forensic intelligence regarding a donor’s biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE and three generic classification approaches including a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A selection of 142 ancestry informative single nucleotide polymorphisms (SNPs) were chosen from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin, and Kidd’s AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. A training set of 1093 individuals with self-declared BGA from the 1000 Genomes phase 1 database was used by each classifier to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90{\%} of the genotypes missing. Comparison of the area under the receiver operating characteristic curves (AUROCs) showed high accuracy in STRUCTURE and the generic Bayesian approach. The latter algorithm offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and is suitable for phenotype prediction while STRUCTURE is not.",
    keywords = "Biogeographical ancestry (BGA), Phenotype prediction, Structure, Bayesian, genetic distance, Multinomial logistic regression",
    author = "Elaine CHEUNG and Michelle GAHAN and Dennis MCNEVIN",
    year = "2017",
    doi = "10.1007/s00414-016-1504-3",
    language = "English",
    pages = "1--12",
    journal = "International Journal of Legal Medicine",
    issn = "0937-9827",
    publisher = "Springer Verlag",
    }

    Prediction of biogeographical ancestry from genotype: a comparison of classifiers. / CHEUNG, Elaine; GAHAN, Michelle; MCNEVIN, Dennis.

    In: International Journal of Legal Medicine, 2017, p. 1-12.

    TY - JOUR
    T1 - Prediction of biogeographical ancestry from genotype: a comparison of classifiers
    AU - CHEUNG, Elaine
    AU - GAHAN, Michelle
    AU - MCNEVIN, Dennis
    PY - 2017
    Y1 - 2017
    AB - DNA can provide forensic intelligence regarding a donor’s biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE and three generic classification approaches including a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A selection of 142 ancestry informative single nucleotide polymorphisms (SNPs) were chosen from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin, and Kidd’s AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. A training set of 1093 individuals with self-declared BGA from the 1000 Genomes phase 1 database was used by each classifier to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90% of the genotypes missing. Comparison of the area under the receiver operating characteristic curves (AUROCs) showed high accuracy in STRUCTURE and the generic Bayesian approach. The latter algorithm offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and is suitable for phenotype prediction while STRUCTURE is not.
    KW - Biogeographical ancestry (BGA)
    KW - Phenotype prediction
    KW - Structure
    KW - Bayesian
    KW - genetic distance
    KW - Multinomial logistic regression
    U2 - 10.1007/s00414-016-1504-3
    DO - 10.1007/s00414-016-1504-3
    M3 - Article
    SP - 1
    EP - 12
    JO - International Journal of Legal Medicine
    JF - International Journal of Legal Medicine
    SN - 0937-9827
    ER -