Prediction of biogeographical ancestry from genotype: a comparison of classifiers

Elaine CHEUNG, Michelle GAHAN, Dennis MCNEVIN

Research output: Contribution to journal > Article > peer-review

16 Citations (Scopus)


DNA can provide forensic intelligence regarding a donor’s biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE with that of three generic classification approaches: a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A selection of 142 ancestry informative single nucleotide polymorphisms (SNPs) was chosen from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin, and Kidd’s AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. Each classifier was trained on 1093 individuals with self-declared BGA from the 1000 Genomes phase 1 database and then used to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90% of the genotypes missing. Comparison of the areas under the receiver operating characteristic curves (AUROCs) showed high accuracy for STRUCTURE and the generic Bayesian approach. The generic Bayesian algorithm offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and, unlike STRUCTURE, is suitable for phenotype prediction.
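The abstract does not spell out the form of the generic Bayesian classifier, so the following is only a minimal sketch of one common way such a classifier can be built, not the authors' implementation. It assumes that SNPs are independent, that genotypes follow Hardy-Weinberg proportions within each population, and that the prior over populations is uniform. The function names (`allele_freqs`, `posterior`), the pseudocount, the genotype coding (0, 1, or 2 copies of an arbitrarily chosen allele, `None` for a missing genotype), and the toy data are all hypothetical and chosen for illustration.

```python
import math

def allele_freqs(genotypes, labels, pseudo=0.5):
    """Estimate per-population allele frequencies for each SNP (training step).

    genotypes: list of individuals, each a list of 0/1/2 allele counts,
               with None marking a missing genotype.
    labels:    population label for each individual.
    pseudo:    pseudocount (an assumption) that keeps frequencies away from 0 and 1.
    """
    counts = {}  # population -> per-SNP [allele copies seen, chromosomes seen]
    for g, pop in zip(genotypes, labels):
        per_snp = counts.setdefault(pop, [[0, 0] for _ in g])
        for j, x in enumerate(g):
            if x is not None:
                per_snp[j][0] += x
                per_snp[j][1] += 2
    return {pop: [(a + pseudo) / (n + 2 * pseudo) for a, n in per_snp]
            for pop, per_snp in counts.items()}

def posterior(g, freqs):
    """Posterior probability of each population for one genotype profile.

    Missing genotypes (None) are skipped, so they contribute no evidence.
    A uniform prior over populations is assumed.
    """
    logp = {}
    for pop, ps in freqs.items():
        total = 0.0
        for x, p in zip(g, ps):
            if x is None:
                continue
            # Hardy-Weinberg genotype probabilities for 0, 1, 2 allele copies
            hwe = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
            total += math.log(hwe[x])
        logp[pop] = total
    # Normalise in log space for numerical stability
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return {pop: math.exp(v - m) / z for pop, v in logp.items()}

# Toy data (invented): two populations, three SNPs
train = [[2, 2, 0], [2, 1, 0], [0, 0, 2], [0, 1, 2]]
labels = ["A", "A", "B", "B"]
freqs = allele_freqs(train, labels)

partial = posterior([2, None, 0], freqs)       # one genotype missing
empty = posterior([None, None, None], freqs)   # all genotypes missing
```

In this toy case the profile with one missing SNP is still assigned to population A with high posterior probability, while the fully missing profile falls back to the uniform prior. Repeating such predictions over a test set with increasing proportions of genotypes masked is one way to probe robustness to missing data of the kind the abstract describes.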
Original language: English
Pages (from-to): 901-912
Number of pages: 12
Journal: International Journal of Legal Medicine
Issue number: 4
Publication status: Published - Jul 2017

