High-throughput RNA sequencing (RNA-seq) and, more recently, long-read (LR) RNA-seq have revolutionized the study of gene expression. It is now possible to sequence massive transcriptomic libraries and to apply a wide range of analytical approaches to them to extract meaningful and actionable biological findings. Short-read RNA-seq remains the prevailing method for transcriptomic characterization, owing to its capacity to accurately quantify gene expression at the RNA level and to capture a wealth of information for the study of alternative RNA processing. However, an intrinsic shortcoming of short-read RNA-seq is its reliance on small fragments of messenger RNA (mRNA) to infer complete transcript structures and to resolve isoform-level expression. Additionally, when used on its own, it lacks the multidimensionality needed to comprehensively distinguish the modes of regulation (transcriptional vs. post-transcriptional) that underlie changes in RNA abundance between conditions, or to accurately infer the translational output of mRNAs. Here, I present my work integrating small RNA (sRNA) and mRNA sequencing approaches to explore how SARS-CoV-2 (SC2) infection perturbs the host mRNA and sRNA landscape (Chapter 2). I show that dozens of human microRNAs (miRs) and novel SC2-derived small viral RNAs (svRNAs) are dynamically expressed during SC2 infection, and I propose the hypothesis that several of the svRNAs may function like miRs to exert pleiotropic regulatory effects on the host transcriptome. I further present my work on a bioinformatic tool called junctionCounts, which seeks to comprehensively characterize alternative splicing (AS) events in RNA-seq data (Chapter 3).
In concert with its partner utilities cdsInsertion and findSwitchEvents, junctionCounts stands apart from other AS analysis tools both by profiling non-canonical event types and by predicting functional outcomes of AS events including nonsense-mediated decay (NMD) and coding-to-noncoding switches induced by the inclusion or exclusion of alternative exons, introns or splice sites.
Finally, in Chapter 4, I present my work on the development of a translatomic method called long-read subcellular fractionation and sequencing (LR Frac-seq). I propose a framework for integrating LR and short-read Frac-seq data that faithfully captures the complete structures of ribosome-associated transcripts from long reads and accurately quantifies them using the superior throughput of short reads. I show that isoform-specific ribosome association is pervasive and cell type-specific in embryonic stem cells and neuronal progenitor cells, and I propose this approach as a novel way to study AS coupled with translational control (ASTC).