eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Biology at scale: towards angstrom-level resolution across tens of thousands of bacteria strains

No data is associated with this publication.
Abstract

With the growing number of publicly available bacterial genome sequences, the ability to resolve the shapes of increasingly complex proteins through advances in microscopy and protein-folding prediction software, and the mechanistic insight provided by genome-scale models (GEMs), microbial biology is rapidly entering the digital age. Tens of thousands of whole-genome sequences can now be analyzed simultaneously, and the gene content and genetic variation within a bacterial species can be fully accounted for. A sequence variant (mutation) can be mapped onto the 3D protein structure, where its proximity to enzymatic domains or structural features may offer a physicochemical basis for its observed effect. Structures that reflect the multi-subunit nature of a protein can be incorporated into genome-scale models to obtain an angstrom-level understanding of whole-cell functions. Undoubtedly, interoperable workflows that offer angstrom-level resolution at a scale spanning thousands of genomes will usher in a new generation of analytical tools, with implications for evolutionary, structural, and systems biology. In this dissertation, I build such workflows and present findings that can only be revealed at this new scale of biological data. First, I quantify the sequence variation in Escherichia coli by defining its "alleleome" (the collection of all alleles of all genes found in the whole-genome sequences of 2,661 wild-type strains) and find extensive differences between wild-type and laboratory-evolved strains. Second, I generate the Quaternary Structural Proteome Atlas of a Cell (QSPACE), an oligomeric structural representation of the cellular proteome, for E. coli and use interoperable residue-level data (e.g., mutations, functional domains, subcellular compartments) to analyze sequence variants and to generate a draft image of an optimal cell. Third, I generate alleleomes for 184 bacterial species (from 54,191 strains) and reveal characteristics of the evolutionary history of modern-day bacteria. Taken together, this dissertation describes foundational interoperable workflows that vastly expand the scale and resolution at which microbes can now be studied.

Main Content

This item is under embargo until October 2, 2025.