Reduced representation genotyping for bacterial identification, discovery and genomic analysis

  • Berenice Talamantes Becerra

Student thesis: Doctoral Thesis

Abstract

Bacterial identification methods are important for medical, environmental, food and industrial microbiology. Current methods range from low-resolution techniques, such as biochemical testing and sequencing of the 16S rRNA gene, to high-resolution methods such as whole-genome sequencing. There are few options in between. To fill this gap, I applied a reduced-representation sequencing technique (DArTseq) to bacterial identification and typing in medical and environmental microbiology. To analyse reduced-representation sequencing data, I developed a bioinformatics pipeline, Currito3.1 DNA Fragment Analysis Software, for bacterial identification and strain typing. To address both fields, this thesis presents results from two case studies. The first case study involved genotyping 165 bacterial isolates previously identified using conventional methods, provided by the Microbiology Department of Canberra Public Hospital. These were processed with reduced-representation sequencing using three combinations of restriction enzymes: PstI with MseI, PstI with HpaII, and MseI with HpaII. All bacterial samples were correctly identified to genus and species by each of the three enzyme combinations. In the second case study, bacterial isolates were obtained from compost, domestic hot water systems and artesian bores of the Great Artesian Basin. The sampling locations represented extreme environments with temperatures as high as 98°C. The study resulted in the isolation of 99 bacterial strains of the thermophilic genera Anoxybacillus, Geobacillus and Parageobacillus, from which 8 samples were selected for whole-genome sequencing. Identifications using reduced-representation sequencing agreed completely with identifications provided by whole-genome sequencing. Novel species were discovered within this set of bacterial isolates.
A phylogenetic analysis and comparative genomic study of the three thermophilic bacterial genera, Anoxybacillus, Geobacillus and Parageobacillus, was performed to confirm the taxonomic placement of seven new genomes of thermophilic bacteria. Substantial changes to the delimitation of the three genera have been made in recent years, and an integrated phylogenomic analysis was considered necessary to explore the phylogenetic relationships between these closely related genera and to provide correct placements for the newly sequenced genomes. A total of 113 complete genome assemblies from the RefSeq database, including Anoxybacillus, Geobacillus and Parageobacillus, were selected. Phylogenomic metrics were obtained, including Average Nucleotide Identity (ANI) and Average Amino acid Identity (AAI), and a maximum likelihood tree was constructed from an alignment of a set of 662 orthologous core genes. The combined results from the core gene tree and the ANI and AAI UPGMA dendrograms show that the genomes split into two main clades. Clade I contains all Geobacillus, all Parageobacillus and some species of Anoxybacillus, and Clade II contains the majority of Anoxybacillus species. Clade I is further partitioned into three clades, consisting separately of Geobacillus, Parageobacillus, and a third clade which we suggest should be elevated to a new genus (Quasigeobacillus gen. nov.). In conclusion, complexity-reduced genotyping offers an accurate alternative to conventional methods for bacterial identification and strain typing, and generates sequencing results without the need for prior sequence information for primer design. This allows high-resolution sequence data to be produced for any bacterium without prior knowledge of its taxonomic affinity.
This technology fills a gap in currently available technologies until such time as whole-genome sequencing is economically viable for routine application and bioinformatic tools for that purpose are readily available.
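The restriction-enzyme pairs named in the abstract can be illustrated with a small in silico double digest. The sketch below is not the DArTseq protocol or the Currito3.1 pipeline; it is a simplified model that assumes only fragments with one end cut by each enzyme are retained, and it ignores methylation sensitivity, size selection and adapter ligation. The function names and the toy sequence are hypothetical. The recognition sites and cut positions used (PstI CTGCA^G, MseI T^TAA, HpaII C^CGG) are the standard ones for these enzymes.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
# All three sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "MseI": ("TTAA", 1),     # T^TAA
    "HpaII": ("CCGG", 1),    # C^CGG
}


def cut_positions(seq, enzyme):
    """Return the top-strand cut coordinates of one enzyme in seq."""
    site, offset = ENZYMES[enzyme]
    # A lookahead pattern also reports overlapping occurrences of the site.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq, enzyme_a, enzyme_b):
    """Return fragments whose two ends were cut by different enzymes.

    Simplifying assumption: fragments flanked by the same enzyme on both
    sides, and the two terminal pieces of the sequence, are discarded.
    """
    cuts = sorted(
        [(pos, enzyme_a) for pos in cut_positions(seq, enzyme_a)]
        + [(pos, enzyme_b) for pos in cut_positions(seq, enzyme_b)]
    )
    fragments = []
    for (left, left_enzyme), (right, right_enzyme) in zip(cuts, cuts[1:]):
        if left_enzyme != right_enzyme:
            fragments.append(seq[left:right])
    return fragments


# Toy sequence (hypothetical) with two PstI sites and two MseI sites.
toy = "AAACTGCAGGGGGTTAACCCCTTAAGGGCTGCAGTT"
fragments = double_digest(toy, "PstI", "MseI")
```

Because only a subset of the genome lies between two such cut sites, the retained fragments form a reduced representation that can be sequenced without designing primers for any particular taxon.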
Date of Award: 2019
Original language: English
Supervisors: Ashraf GHANEM, Arthur GEORGES, Dennis Mcnevin & Michelle Gahan
