Prediction of biogeographical ancestry from genotype: a comparison of classifiers

Elaine CHEUNG, Michelle GAHAN, Dennis MCNEVIN

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

DNA can provide forensic intelligence regarding a donor’s biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE with that of three generic classification approaches: a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A selection of 142 ancestry informative single nucleotide polymorphisms (SNPs) was chosen from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin, and Kidd’s AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. Each classifier used a training set of 1093 individuals with self-declared BGA from the 1000 Genomes phase 1 database to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90% of the genotypes missing. Comparison of the areas under the receiver operating characteristic curves (AUROCs) showed high accuracy for STRUCTURE and the generic Bayesian approach. The Bayesian algorithm offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and, unlike STRUCTURE, is also suitable for phenotype prediction.
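The abstract does not spell out how the generic Bayesian classifier is built, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes independent SNPs, Hardy-Weinberg genotype proportions within each population, and equal prior probabilities. The allele frequencies in `FREQS` and the function names are hypothetical and chosen purely for illustration. Missing genotypes (as in the 10 to 90% missing-data tests) are simply skipped, which is one simple way such a classifier can tolerate incomplete profiles.

```python
import math

# Hypothetical reference-allele frequencies at three SNPs for three
# populations. Illustrative values only, NOT taken from the study.
FREQS = {
    "African":    [0.90, 0.20, 0.70],
    "European":   [0.10, 0.80, 0.60],
    "East Asian": [0.20, 0.30, 0.05],
}

def genotype_likelihood(g, p):
    """P(genotype | allele frequency p), assuming Hardy-Weinberg proportions.

    g is the number of reference alleles carried (0, 1 or 2).
    """
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[g]

def posterior(genotypes, freqs=FREQS, priors=None):
    """Posterior probability of each population for one genotype vector.

    Missing genotypes are passed as None and contribute nothing
    to the likelihood. Priors default to equal (an assumption).
    """
    pops = list(freqs)
    if priors is None:
        priors = {pop: 1.0 / len(pops) for pop in pops}
    log_post = {}
    for pop in pops:
        total = math.log(priors[pop])
        for g, p in zip(genotypes, freqs[pop]):
            if g is None:  # skip missing genotype
                continue
            total += math.log(genotype_likelihood(g, p))
        log_post[pop] = total
    # Normalise in log space to avoid underflow with many SNPs.
    m = max(log_post.values())
    norm = sum(math.exp(v - m) for v in log_post.values())
    return {pop: math.exp(v - m) / norm for pop, v in log_post.items()}

def classify(genotypes, freqs=FREQS, priors=None):
    """Return the population with the highest posterior probability."""
    post = posterior(genotypes, freqs, priors)
    return max(post, key=post.get)

if __name__ == "__main__":
    print(classify([2, 0, 2]))      # complete profile
    print(classify([0, 2, None]))   # third SNP missing
```

In practice the frequencies would be estimated from a training set, and the per-population posteriors could be scored against known labels to obtain one-vs-rest AUROCs of the kind the abstract reports.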
Original language: English
Pages (from-to): 901-912
Number of pages: 12
Journal: International Journal of Legal Medicine
Volume: 131
Issue number: 4
DOI: 10.1007/s00414-016-1504-3
Publication status: Published - Jul 2017

Fingerprint

Genotype
Bayes Theorem
Population Control
Intelligence
ROC Curve
Single Nucleotide Polymorphism
Cluster Analysis
Logistic Models
Tissue Donors
Genome
Databases
Phenotype
Cell Line
DNA

Cite this

CHEUNG, Elaine ; GAHAN, Michelle ; MCNEVIN, Dennis. / Prediction of biogeographical ancestry from genotype: a comparison of classifiers. In: International Journal of Legal Medicine. 2017 ; Vol. 131, No. 4. pp. 901-912.
@article{c9a043b6ddfc44b1b9432faf20459875,
title = "Prediction of biogeographical ancestry from genotype: a comparison of classifiers",
keywords = "Biogeographical ancestry (BGA), Phenotype prediction, Structure, Bayesian, genetic distance, Multinomial logistic regression",
author = "Elaine CHEUNG and Michelle GAHAN and Dennis MCNEVIN",
year = "2017",
month = "7",
doi = "10.1007/s00414-016-1504-3",
language = "English",
volume = "131",
pages = "901--912",
journal = "International Journal of Legal Medicine",
issn = "0937-9827",
publisher = "Springer Verlag",
number = "4",

}


TY - JOUR

T1 - Prediction of biogeographical ancestry from genotype: a comparison of classifiers

AU - CHEUNG, Elaine

AU - GAHAN, Michelle

AU - MCNEVIN, Dennis

PY - 2017/7

Y1 - 2017/7

KW - Biogeographical ancestry (BGA)

KW - Phenotype prediction

KW - Structure

KW - Bayesian

KW - genetic distance

KW - Multinomial logistic regression

UR - http://www.scopus.com/inward/record.url?scp=85006341344&partnerID=8YFLogxK

U2 - 10.1007/s00414-016-1504-3

DO - 10.1007/s00414-016-1504-3

M3 - Article

VL - 131

SP - 901

EP - 912

JO - International Journal of Legal Medicine

JF - International Journal of Legal Medicine

SN - 0937-9827

IS - 4

ER -