Synthetic biology has the potential to transform how we combat disease, cultivate food, and produce goods. However, the widespread use of engineered organisms in industrial and medical settings is hindered in part by i) the development of biological tools and parts in model organisms, which are not guaranteed to function in application-relevant organisms, and ii) the lack of computational models that assess the robustness of genetic circuit designs and predict transcriptional changes caused by the introduction of foreign DNA. Both challenges are made difficult by the high-dimensional nature of gene regulatory networks. In this thesis, we address them by combining large volumes of data with mathematically principled approaches.
A major challenge in biotechnology and biomanufacturing is the identification of a set of biomarkers for perturbations and metabolites of interest. In Chapter 2, we develop a data-driven, transcriptome-wide approach to rank perturbation-inducible genes from time-series RNA sequencing data for the discovery of analyte-responsive promoters. The ranked genes provide a set of biomarkers that act as a proxy for the transcriptional state, referred to as the cell state. We construct low-dimensional models of gene expression dynamics and rank genes by their ability to capture the perturbation-specific cell state using a novel observability analysis. By providing an optimal selection of reporters, observability analysis identified 15 candidate biosensors for the pesticide malathion in a non-model organism, whose collective response is greater than the sum of their individual responses. The engineered host cell, a living malathion sensor, can be optimized for use in environmental diagnostics, while the developed machine learning tool can be applied to discover perturbation-inducible gene expression systems in a wide range of host organisms.
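To make the idea of ranking genes by observability concrete, the following is a minimal sketch, not the analysis developed in Chapter 2. It assumes a linear model of expression dynamics $x_{t+1} = A x_t$ and scores each gene by the trace of the finite-horizon observability Gramian obtained when that gene alone is measured. The function name `gramian_trace_scores`, the toy matrix `A`, and the choice of the trace as the score are all assumptions made for illustration.

```python
import numpy as np

def gramian_trace_scores(A, horizon):
    """Score each gene i by trace(W_i), where
    W_i = sum_{t < horizon} (A^t)^T e_i e_i^T A^t
    is the observability Gramian for measuring gene i alone.
    trace(W_i) equals the summed squared norm of row i of A^t."""
    n = A.shape[0]
    scores = np.zeros(n)
    A_power = np.eye(n)  # A^0
    for _ in range(horizon):
        scores += np.sum(A_power ** 2, axis=1)  # squared row norms of A^t
        A_power = A @ A_power
    return scores

# Hypothetical 3-gene model: gene 1 is driven by gene 0, gene 2 is isolated
# and decays quickly, so measuring it reveals little about the cell state.
A = np.array([[0.9, 0.0, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 0.1]])

scores = gramian_trace_scores(A, horizon=5)
ranking = np.argsort(scores)[::-1]  # most informative gene first
print(ranking, scores)
```

In this toy model the isolated, fast-decaying gene receives the lowest score, which matches the intuition that a good reporter is one whose expression carries information about the rest of the network.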
While observability analysis provides an optimal selection of reporters from which to construct biosensors, engineering microbes to transcribe and translate the foreign DNA that makes up a biosensor (or any other synthetic genetic circuit) has an unintended influence on the host transcriptome, the consequences of which can be fatal to the engineered cell. In Chapter 3, we develop structured dynamic mode decomposition (sDMD), which comprises compositional Koopman operators that model the transcriptional changes induced in the host transcriptome by exogenous genes. We consider an experimental example, using high-throughput RNA sequencing measurements collected from wild-type \textit{E. coli}, single gate components transformed into \textit{E. coli}, and a NAND circuit composed of individual gates in \textit{E. coli}, to explore how compositional Koopman models encode increasing circuit interference on the native \textit{E. coli} transcriptome. From this dataset, sDMD can both recover known regulatory biology, by recapitulating known sugar utilization hierarchies, and predict new regulatory mechanisms induced by the transcription and translation of foreign DNA.
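For readers unfamiliar with dynamic mode decomposition, the sketch below shows the standard (unstructured) algorithm that sDMD builds on: a finite-dimensional approximation $K$ of the Koopman operator is fit by least squares so that $x_{t+1} \approx K x_t$ over uniformly sampled snapshots. It does not implement the compositional structure of sDMD; the function name `fit_dmd` and the two-dimensional toy system are assumptions made for illustration.

```python
import numpy as np

def fit_dmd(snapshots, rank=None):
    """Standard DMD: least-squares fit of K with Y ~= K X.
    snapshots has shape (n_genes, n_timepoints) and must be
    sampled uniformly in time."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    if rank is not None:  # optional rank truncation
        U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # K = Y X^+, with the pseudoinverse built from the (truncated) SVD
    return Y @ Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T

# Hypothetical noiseless toy system standing in for expression data.
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
x = np.zeros((2, 20))
x[:, 0] = [1.0, 2.0]
for t in range(19):
    x[:, t + 1] = A_true @ x[:, t]

K = fit_dmd(x)
print(np.round(K, 6))
```

On this noiseless toy trajectory the fitted `K` reproduces `A_true` up to numerical precision; with real transcriptomic data the number of genes far exceeds the number of time points, so a rank truncation or other regularization would be needed.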
While useful for modeling circuit-host impact, dynamic mode decomposition has significant drawbacks, the most important being its reliance on data sampled uniformly in time, a requirement rarely satisfied by high-throughput biological measurements. This results in suboptimal use of transcriptional data for inference of circuit-host impact. In Chapter 4, we develop a deep learning approach to disentangle the factors causing the transcriptional changes on a gene-by-gene basis. The model architecture is inspired by a latent process model in which we treat the latent gene expression and latent perturbations as unobserved processes. We show that we are able to recover known regulatory biology as well as discover new regulatory mechanisms that lead to unintended consequences. Specifically, our circuit-host impact model infers that engineered microbes are more resistant to $\beta$-lactam antibiotics than their wild-type counterparts. This hypothesis is confirmed in Gram-negative bacteria.