The brain is a complex organ that controls thought, memory, emotion, touch, motor skills, vision, breathing, temperature, hunger, and many other processes that regulate the body. Alzheimer’s disease (AD) is a neurodegenerative disease characterized by memory loss and impaired cognitive function, and it is associated with the accumulation of plaques and tangles in the brain. The cortex and hippocampus are critical brain regions for learning because of their roles in neural integration and memory, respectively. These regions have therefore been characterized exhaustively under different conditions and models to understand the cell subtypes involved. Changes in gene expression and isoform usage during development, aging, and disease are controlled by multiple, overlapping programs. The gene expression profiles of distinct cell types arise from complex genomic interactions among multiple simultaneous biological processes within each cell, and these processes can be altered by disease progression. A gene's function is closely connected to its expression specificity across tissues and cell types. Gene functions can be inferred from the abundance and activity of co-expression networks derived from bulk RNA-seq. Short-read single-cell RNA-seq is a widely used method to characterize cellular heterogeneity in complex tissues based on gene expression. A critical step in the analysis of large genome-wide gene expression datasets is the use of module detection methods to identify which genes vary in an informative manner and to determine how these genes organize into modules. Because of the limitations of classical clustering methods for detecting modules, numerous alternative module detection methods have been proposed. These improve upon clustering by handling co-expression in only a subset of samples, modeling the regulatory network, and/or allowing overlap between modules.
Here, I describe my work on characterizing the transcriptome of mouse cortex and hippocampus using bulk RNA-seq in conjunction with single-cell/nucleus RNA-seq. I profile changes during normal development and aging, and I compare several mouse models of AD against control mice to study genes associated with neurodegeneration. First, I describe the PyWGCNA package, which analyzes gene expression and infers meaningful modules of co-expressed genes that respond to different conditions, such as age, in different mouse models of AD using bulk RNA-seq. Then, I describe my novel, reproducible grade-of-membership model called Topyfic, which is designed to derive topic models that correspond to cellular programs. I then apply Topyfic to distinct brain RNA-seq datasets from MODEL-AD and ENCODE and detect major changes in microglia, astrocytes, and oligodendrocytes that vary based on genotype and sex. Finally, I investigate possible ways to deconvolve modules into topics and to connect the two representations. Together, these new computational methods provide novel insights into cellular programs in health and disease.