Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of biological processes like development and pathogenesis. Unlike traditional bulk tools, scRNA-seq allows
researchers to characterize gene expression changes at the level of single cells, giving us
unprecedented insight into the distribution of mRNA levels across a population. Interestingly,
however, there has been very little work aimed at directly characterizing these distributions, since
most scRNA-seq analysis pipelines rely on techniques like PCA, which can obscure the changes
occurring in the distributions of individual genes. Here, we find that the majority of genes exhibit
highly right-skewed and “heavy-tailed” distributions, regardless of the organism, dataset, or
technology used to measure mRNA levels in single cells. We could not explain the observed degree of
skewness using a variety of null models, including null models that account for the fact that some
cells have many more total mRNA counts than others. Taken together, these results suggest that the observed
skewness is not a trivial consequence of the measurement technique, but instead reflects an
underlying biological reality. We next analyzed a dataset in which mice were exposed to a stroke-like
brain injury, and found that the vast majority of genes that were differentially expressed between
injured and control mice involved changes in the tail of the distribution. That is, when a gene is
upregulated upon injury, the increase is driven by a small number of cells expressing the gene at a much
higher level, rather than by an overall shift in the distribution. These findings have significant implications
both for the statistical analysis of scRNA-seq data and for our fundamental understanding of the
epigenetic regulation of gene expression.