Global RNA sequencing technologies have revealed widespread RNA polymerase II (Pol II) transcription outside of gene promoters. Small 5'-capped RNA sequencing (Start-seq), originally developed for the detection of promoter-proximal Pol II pausing, has helped improve the annotation of transcription start sites (TSSs) of genes as well as the identification of non-genic regulatory elements. However, apart from the best-studied genomes, those of human and mouse, mammalian transcription has not been profiled with sufficiently high precision.
We prepared and sequenced Start-seq libraries from rat (Rattus norvegicus) primary neural progenitor cells. Over 48 million uniquely mappable reads from two independent biological replicates allowed us to define the TSSs of 7365 known genes in the rn6 genome (reannotating 2503 TSSs by more than 5 base pairs), characterize promoter-associated antisense transcription, and profile Pol II pausing. By combining TSS data with polyA-selected RNA sequencing, we also identified thousands of potential new genes producing stable RNA, as well as non-genic transcripts representing possible regulatory elements.
Our study has produced the first Start-seq dataset for the rat. Beyond profiling transcription initiation, our data reaffirm the prevalence of Pol II pausing across the rat genome and indicate conservation of pausing mechanisms across metazoan genomes. We suggest that pausing location, at least in mammals, is constrained by distance from the site of transcription initiation, whether initiation occurs at or outside of a gene promoter. Abundant antisense transcription initiation around protein-coding genes indicates that Pol II recruited to the vicinity of a promoter is distributed to available transcription start sites on either DNA strand. The transcriptome profiling of neural progenitors presented here will facilitate further studies of other rat cell types as well as other organisms.