dartr: An R package to facilitate analysis of SNP data generated from reduced representation genome sequencing

    Research output: Contribution to journal › Article

    16 Citations (Scopus)

    Abstract

    Although vast technological advances have been made and genetic software packages are growing in number, it is not a trivial task to analyse SNP data. We announce a new R package, dartr, enabling the analysis of single nucleotide polymorphism data for population genomic and phylogenomic applications. dartr provides user-friendly functions for data quality control and marker selection, and permits rigorous evaluations of conformity to Hardy–Weinberg equilibrium, gametic-phase disequilibrium and neutrality. The package reports standard descriptive statistics, permits exploration of patterns in the data through principal components analysis and computes standard F-statistics. It also offers basic phylogenetic analyses, population assignment and isolation-by-distance analyses, and exports data to a variety of commonly used downstream applications outside of the R environment (e.g., NewHybrids, fastSTRUCTURE and phylogeny applications). The package serves two main purposes. First, it offers a user-friendly approach that lowers the hurdle to analysing such data; to this end, the package comes with a detailed tutorial targeted at R beginners that allows data analysis without requiring deep knowledge of R. Second, we use a single, well-established format, genlight from the adegenet package, as input for all our functions to avoid data reformatting. By strictly using the genlight format, we hope to promote it as the de facto standard for future software development and hence reduce the jungle of formats for genetic data sets. The dartr package is available via CRAN (the Comprehensive R Archive Network) and GitHub.

    Language: English
    Pages: 691-699
    Number of pages: 9
    Journal: Molecular Ecology Resources
    Volume: 18
    Issue number: 3
    DOI: 10.1111/1755-0998.12745
    Publication status: Published - 1 May 2018

    Cite this

    @article{40939760e5364cf5b3fa8a47ea6c567a,
    title = "dartr: An {R} package to facilitate analysis of {SNP} data generated from reduced representation genome sequencing",
    abstract = "Although vast technological advances have been made and genetic software packages are growing in number, it is not a trivial task to analyse SNP data. We announce a new R package, dartr, enabling the analysis of single nucleotide polymorphism data for population genomic and phylogenomic applications. dartr provides user-friendly functions for data quality control and marker selection, and permits rigorous evaluations of conformity to Hardy–Weinberg equilibrium, gametic-phase disequilibrium and neutrality. The package reports standard descriptive statistics, permits exploration of patterns in the data through principal components analysis and computes standard F-statistics. It also offers basic phylogenetic analyses, population assignment and isolation-by-distance analyses, and exports data to a variety of commonly used downstream applications outside of the R environment (e.g., NewHybrids, fastSTRUCTURE and phylogeny applications). The package serves two main purposes. First, it offers a user-friendly approach that lowers the hurdle to analysing such data; to this end, the package comes with a detailed tutorial targeted at R beginners that allows data analysis without requiring deep knowledge of R. Second, we use a single, well-established format, genlight from the adegenet package, as input for all our functions to avoid data reformatting. By strictly using the genlight format, we hope to promote it as the de facto standard for future software development and hence reduce the jungle of formats for genetic data sets. The dartr package is available via CRAN (the Comprehensive R Archive Network) and GitHub.",
    keywords = "next-generation sequencing, phylogenomics, population genomics, R package, RADSeq, SNPs",
    author = "Gruber, Bernd and Unmack, Peter and Berry, Oliver F. and Georges, Arthur",
    year = "2018",
    month = "5",
    day = "1",
    doi = "10.1111/1755-0998.12745",
    language = "English",
    volume = "18",
    pages = "691--699",
    journal = "Molecular Ecology Resources",
    issn = "1755-098X",
    publisher = "Wiley-Blackwell",
    number = "3",

    }

    TY - JOUR

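    T1 - dartr: An R package to facilitate analysis of SNP data generated from reduced representation genome sequencing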

    T2 - Molecular Ecology Resources

    AU - Gruber, Bernd

    AU - Unmack, Peter

    AU - Berry, Oliver F.

    AU - Georges, Arthur

    PY - 2018/5/1

    Y1 - 2018/5/1

    AB - Although vast technological advances have been made and genetic software packages are growing in number, it is not a trivial task to analyse SNP data. We announce a new R package, dartr, enabling the analysis of single nucleotide polymorphism data for population genomic and phylogenomic applications. dartr provides user-friendly functions for data quality control and marker selection, and permits rigorous evaluations of conformity to Hardy–Weinberg equilibrium, gametic-phase disequilibrium and neutrality. The package reports standard descriptive statistics, permits exploration of patterns in the data through principal components analysis and computes standard F-statistics. It also offers basic phylogenetic analyses, population assignment and isolation-by-distance analyses, and exports data to a variety of commonly used downstream applications outside of the R environment (e.g., NewHybrids, fastSTRUCTURE and phylogeny applications). The package serves two main purposes. First, it offers a user-friendly approach that lowers the hurdle to analysing such data; to this end, the package comes with a detailed tutorial targeted at R beginners that allows data analysis without requiring deep knowledge of R. Second, we use a single, well-established format, genlight from the adegenet package, as input for all our functions to avoid data reformatting. By strictly using the genlight format, we hope to promote it as the de facto standard for future software development and hence reduce the jungle of formats for genetic data sets. The dartr package is available via CRAN (the Comprehensive R Archive Network) and GitHub.

    KW - next-generation sequencing

    KW - phylogenomics

    KW - population genomics

    KW - R package

    KW - RADSeq

    KW - SNPs

    UR - http://www.scopus.com/inward/record.url?scp=85046408975&partnerID=8YFLogxK

    U2 - 10.1111/1755-0998.12745

    DO - 10.1111/1755-0998.12745

    M3 - Article

    VL - 18

    SP - 691

    EP - 699

    JO - Molecular Ecology Resources

    JF - Molecular Ecology Resources

    SN - 1755-098X

    IS - 3

    ER -