Technological advancements have led to an exponential increase in omics data generation. These data present a unique big-data-to-knowledge challenge and, in turn, an opportunity for analysis and interpretation. The construct of the pan-genome and its subsets, when paired with systems biology tools such as genome-scale models of microbial metabolism, offers a variety of means to generate meaningful predictions from genome sequence alone. Pairing these frameworks allows for a scalable, data-driven, comparative approach to studying the evolutionary trajectories of bacterial species. This dissertation focuses on the development and deployment of pan-genome analytics tools for the study of numerous high-threat-level microbial pathogens.
Chapter 1 introduces key systems biology techniques and concepts used throughout this thesis, with particular regard to the importance of dataset scale.
Chapter 2 focuses on the generation of an updated reconstruction for Acinetobacter baumannii and on the analysis of gene conservation and catabolic capabilities across the species.
Chapter 3 details comparative genome-scale metabolic modeling of multidrug-resistant strains of Klebsiella pneumoniae and evaluates whether metabolic capabilities can inform resistance profiles.
Chapter 4 describes the generation of a new reconstruction of Clostridioides difficile and the use of this resource to evaluate the microenvironmental pressures of laboratory isolates, along with a detailed evaluation of the core genome of the species.
Chapter 5 delineates the multi-strain reconstruction protocol used in many of the other chapters and in numerous other studies, providing this workflow as a resource to the research community.
Chapter 6 presents an in-depth update to the BiGG Models knowledge base, both improving the scope and diversity of its content and integrating new functionalities commensurate with the directions of the field.
Chapter 7 presents a comparative analysis of C. difficile strains and details the development of a novel strain-typing method that groups strains based on their accessory genomes. The resulting strain types are compared in detail to the leading strain-typing schemes used in the epidemiology of C. difficile infection and are used to identify defining genetic features.
Chapter 8 provides a reflection on the state of pan-genomic applications and future directions for systems biology.