The Biology is in The Tails: Skewness and Upregulation in scRNA-seq
eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

No data is associated with this publication.
Abstract

Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of biological processes like development and pathogenesis. Unlike traditional bulk tools, scRNA-seq allows researchers to characterize gene expression changes at the level of single cells, giving us unprecedented insight into the distribution of mRNA levels across a population. Interestingly, however, there has been very little work aimed at directly characterizing these distributions, since most scRNA-seq analysis pipelines rely on techniques like PCA, which can obscure the changes occurring in the distributions of individual genes. Here, we find that the majority of genes exhibit highly right-skewed and “heavy-tailed” distributions, regardless of the organism, dataset, or technology used to measure mRNA levels in single cells. We could not explain the observed degree of skewness using a variety of null models, including null models that account for the fact that some cells have many more total mRNA counts than others. Taken together, these results suggest that the observed skewness is not a trivial consequence of the measurement technique, but instead reflects an underlying biological reality. We next analyzed a dataset in which mice were exposed to a stroke-like brain injury, and found that the vast majority of genes that were differentially expressed between injured and control mice involved changes in the tail of the distribution. This means that, if a gene is upregulated upon injury, the upregulation is due to a small number of cells expressing the gene at a much higher level, rather than an overall shift in the distribution. These findings have significant implications both for the statistical analysis of scRNA-seq data and for our fundamental understanding of the epigenetic regulation of gene expression.
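
To make the comparison between observed skewness and a depth-aware null concrete, the following is a minimal sketch, not the method used in the thesis. It assumes a genes-by-cells count matrix, uses the Fisher-Pearson moment coefficient as the skewness measure, and uses one simple null model chosen for illustration: Poisson counts whose rate is each gene's overall fraction of counts scaled by each cell's total count. The function names `sample_skewness` and `depth_scaled_poisson_null`, and the toy data, are hypothetical.

```python
import numpy as np


def sample_skewness(counts):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 for each row (gene)
    of a genes x cells matrix. Rows with zero variance return NaN."""
    x = np.asarray(counts, dtype=float)
    centered = x - x.mean(axis=1, keepdims=True)
    m2 = (centered ** 2).mean(axis=1)  # second central moment
    m3 = (centered ** 3).mean(axis=1)  # third central moment
    with np.errstate(divide="ignore", invalid="ignore"):
        g1 = m3 / m2 ** 1.5
    return np.where(m2 > 0, g1, np.nan)


def depth_scaled_poisson_null(counts, rng):
    """Illustrative null (an assumption, not the thesis's model):
    every cell expresses each gene at the same relative rate, and
    differences between cells come only from total counts per cell."""
    x = np.asarray(counts, dtype=float)
    depth = x.sum(axis=0)                 # total counts per cell
    gene_frac = x.sum(axis=1) / x.sum()   # each gene's share of all counts
    expected = np.outer(gene_frac, depth)  # genes x cells Poisson rates
    return rng.poisson(expected)


# Toy data (synthetic, for illustration only): 50 genes x 2000 cells.
rng = np.random.default_rng(0)
n_genes, n_cells = 50, 2000
toy = rng.poisson(2.0, size=(n_genes, n_cells))

# Give gene 0 a heavy right tail: about 2% of cells express it far more.
high = rng.random(n_cells) < 0.02
toy[0, high] += rng.poisson(60.0, size=high.sum())

observed_skew = sample_skewness(toy)
null_skew = sample_skewness(depth_scaled_poisson_null(toy, rng))
```

In this toy setting, the skewness of gene 0 in the observed matrix is much larger than under the depth-scaled null, which is the kind of gap the abstract describes: variation in total counts per cell alone does not reproduce the tail. A real analysis would repeat the null simulation many times and would likely consider overdispersed alternatives such as a negative binomial.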

Main Content

This item is under embargo until June 16, 2025.