Natural protein sequences are the result of optimization on a mutational landscape with multiple competing pressures. These pressures arise from constraints imposed by selection for a particular fold and function within a cellular context, and we can categorize them as structural-functional and environmental. However, it remains a challenge to quantify mutational landscapes with thousands of mutations and to dissect the contributions from multiple constraints. Similarly, we generally do not know how to encode in protein design models the structural constraints that define a complex molecular function, even for in vitro environments. Reverse-engineering the multiple structural-functional and environmental pressures that were integrated to yield the mutational landscapes that produced natural proteins would improve our understanding of the cellular milieu and our ability to engineer new protein functions.
Using E. coli dihydrofolate reductase (DHFR) as a model system, we developed computational and experimental methods for identifying, quantifying, and modeling structural-functional and environmental constraints on functional proteins. Chapter 1 of this thesis describes a multi-state modeling framework for encoding complex functions into protein design and the application of this framework to recovering evolutionary sequence preferences in DHFR and other model systems. Chapter 2 describes the calibration of a high-throughput selection assay for DHFR activity and the mutational landscape for a library of all possible single point mutants of DHFR. Chapter 3 describes the quantification of broad impacts on the DHFR mutational landscape from expression of Lon protease. The results in these three chapters show the impact of structural-functional and environmental constraints on sequence preferences in mutational landscapes. These data allow us to propose methods for engineering the behavior of entire mutational landscapes by modulating environmental constraints.