Despite over three decades of active research, Human Immunodeficiency Virus Type 1 (HIV-1) remains a pressing challenge to our ability to understand complex pathogens that threaten human health at a global scale. This stems in part from the highly dynamic and multi-scale lifecycle of the virus, which is reflected in the need for multi-drug cocktails to suppress viral replication. A fundamental aspect of the HIV lifecycle is the integration of the viral genome at a wide variety of sites in the human genome. From these sites of integration, the viral promoter coordinates multiple inputs from the cellular signaling environment, the local genetic and epigenetic context, and a virally encoded feedback circuit. By combining these inputs through a poorly understood transfer function, the viral promoter regulates the probability of viral gene expression over time and ultimately the kinetics of the viral lifecycle. The HIV promoter, the long terminal repeat (LTR), is composed of a core TATA promoter and a large cis-acting enhancer. In addition, two well-positioned nucleosomes form the basic epigenetic structures that regulate LTR output. These features are highly similar to those of endogenous mammalian promoters. Studies of the LTR can therefore reveal insights relevant to HIV biology as well as to mammalian gene control.
HIV is a highly attractive model system for applying large-scale experimental and computational analysis to systematically study the role of promoter architecture and genomic context in transcriptional regulation. Previous work demonstrated that the LTR operates in a stochastic regime, which results in large cell-to-cell differences in gene expression within clonal populations. In HIV model systems and in natural circuits, such heterogeneity, or noise, has been shown to have dramatic phenotypic consequences. However, how this noise is related to the features of promoters, the local genomic context, and their interaction with genetic circuits and cell biology is poorly understood.
Here, using combined large-scale experimental and computational analysis, we develop systems to isolate and study the effects of viral integration position, promoter cis-acting architecture, transcriptional feedback, and post-transcriptional processes on viral gene expression noise. Specifically, by coupling systematic experimental analysis of single-integration clones with computational modeling and statistical inference, we link LTR gene expression noise to dynamic nucleosome occupancy. Furthermore, we develop a method to generate highly diverse combinatorial promoters that will enable systematic study of the LTR structure-function relationship. Additionally, we develop quantitative strategies to cluster highly heterogeneous feedback distributions and reveal characteristic moment scaling relationships. Lastly, by combining simulations with analysis of mRNA localization by RNA fluorescence in situ hybridization, we demonstrate that slow mRNA export can function as a low-pass filter that buffers the cytoplasm from the effects of noisy transcription. Together, these efforts reveal molecular and cellular processes underlying noisy gene expression and provide novel systems-level approaches and tools for the study of complexity in eukaryotic gene control.