While large-scale sequencing efforts have focused on the mutational landscape of the coding genome, the vast majority of cancer-associated variants lie within non-coding regions. In the context of tumor progression, these regions may harbor key regulatory drivers, yet an integrated method to discover and interrogate such functional regions has been lacking. In chapter 1, we present an integrative computational and experimental framework to identify recurrently mutated non-coding regulatory regions that drive tumor progression. Applying this framework to sequencing data from a large prostate cancer patient cohort revealed a large set of candidate drivers. We use (i) in silico analyses, (ii) massively parallel reporter assays, and (iii) in vivo CRISPR interference screens to systematically validate candidate drivers of metastatic castration-resistant prostate cancer (mCRPC). One of the identified enhancer regions, GH22I030351, acts on a bidirectional promoter to simultaneously modulate expression of the U2-associated splicing factor SF3A1 and the chromosomal protein CCDC157. Both SF3A1 and CCDC157 promote tumor growth in vivo. We nominate a number of transcription factors, notably SOX6, as candidate regulators of SF3A1 and CCDC157 expression. Our integrative approach enables the systematic detection of non-coding regulatory regions that drive human cancers.

Beyond cis-acting genomic regulatory elements that can drive cancer, the broad reprogramming of the cancer genome leads to the emergence of molecules that are specific to the cancer state. We previously described orphan non-coding RNAs (oncRNAs) as a class of cancer-specific small RNAs with the potential to play functional roles in breast cancer progression. Expanding upon this idea, in chapter 2, we report a systematic and comprehensive search to identify, annotate, and characterize cancer-emergent oncRNAs across 32 tumor types. We leverage large-scale in vivo genetic screens in xenografted mice to functionally identify driver oncRNAs in multiple tumor types. We not only discover a large repertoire of oncRNAs, but also find that their presence and absence represent a digital molecular barcode that faithfully captures the types and subtypes of cancer. Importantly, we discover that this molecular barcode is partially accessible from the cell-free space, as some oncRNAs are secreted by cancer cells. In a large retrospective study of 192 breast cancer patients, we show that oncRNAs can be reliably detected in the blood and that changes in the cell-free oncRNA burden capture both short-term and long-term clinical outcomes upon completion of a neoadjuvant chemotherapy regimen. Together, our findings establish oncRNAs as an emergent class of cancer-specific non-coding RNAs with potential roles in tumor progression and clinical utility in liquid biopsies, providing the first tumor-naive minimal residual disease monitoring approach for breast cancer.

Lastly, we explore the use of intrinsic transcriptional noise encoded within the cell as a mechanism of tumor proliferation and resistance in the face of unfamiliar microenvironments. More specifically, intratumoral heterogeneity (ITH) is recognized as a driver of therapeutic resistance and fatal cancer recurrence. ITH occurs at both the genetic and transcriptional levels and enables tumor cells to adapt to variable environmental pressures, such as hypoxia, immune surveillance, and targeted molecular therapy. In chapter 3, by integrating in silico analysis of TCGA-BRCA RNA-seq data, in vivo CRISPRi screens, and in vitro single-cell transcriptomics, we identify RNF8 and MIS18A as drivers of transcriptional heterogeneity.
Modulating expression of these two genes impacts cellular fitness, chemotherapeutic sensitivity, and metastatic potential in a proportional manner, underscoring their roles in driving cancer progression. Analysis of human breast cancer patient data reveals that increased expression of these genes correlates with poorer survival outcomes. This study expands our understanding of transcriptional regulators of ITH and their potential as therapeutic targets.

In summary, this thesis broadly explores how regulatory elements, including enhancers, non-coding RNAs, and chromatin organizers, can drive cancer progression, shape tumor heterogeneity, and offer new avenues for clinical biomarker development and therapeutic intervention.