The process of molecular evolution has produced the diversity of proteins observed across life. The function and properties of proteins are chemically encoded by their amino acid sequence and can be described as a multi-dimensional energy landscape. A major effort in the field of biology has been to decode this information and to apply that knowledge to understand and manipulate protein function. Because the energy landscape fundamentally determines the behavior of a protein, there are likely evolutionary pressures on various features of the landscape. However, a detailed investigation into how the energy landscape of a protein changes over evolutionary time has been lacking.
In this work, we use ancestral sequence reconstruction (ASR) to access the evolutionary history of the ribonuclease H (RNase H) family. We reconstruct and study the properties of seven ancestral RNases H connecting the lineages of two homologs: a mesophilic RNase H from E. coli and a thermophilic RNase H from T. thermophilus. We characterize how the energetics, rates, and conformations of the RNase H energy landscape, and particularly its folding pathway, evolved over time using a global analysis of ensemble relaxation kinetics, hydrogen exchange monitored by mass spectrometry (HX-MS), and fragment models of high-energy partially folded conformations. The folding trajectory of RNase H is remarkably robust to mutations over evolutionary time, with the major folding intermediate being energetically and structurally conserved over three billion years of evolution. There are notable trends in the folding and unfolding rates, and RNase H becomes more kinetically stable over time. We observe how the conserved folding intermediate enables distinct trends in the thermodynamic and kinetic properties of the folding landscape. Additionally, we use HX-MS to obtain near-site-resolved structural information on the conformations of ancestral RNase H during folding. The earliest events in the folding trajectory are malleable over evolutionary time, and we use the evolutionary trends to identify the mechanisms that drive the folding of this protein.
In addition to revealing the evolutionary history of the RNase H family, these ancestral proteins can be used to understand protein fitness in the cellular context and can be compared with consensus proteins to investigate how evolutionary sequence information can be used to engineer protein properties. Lastly, the evolutionary trends in the RNase H family can be extended to other protein families to identify general principles that guide protein evolution.