- Vallania, Francesco;
- Zisman, Liron;
- Macaubas, Claudia;
- Hung, Shu-Chen;
- Rajasekaran, Narendiran;
- Mason, Sonia;
- Graf, Jonathan;
- Nakamura, Mary;
- Mellins, Elizabeth D;
- Khatri, Purvesh
Monocytes are crucial regulators of inflammation and, in humans, comprise three distinct subsets, of which classical and non-classical are the most abundant. Different subsets carry out different functions and have previously been associated with multiple inflammatory conditions. Dissecting the contribution of different monocyte subsets to disease is currently limited by the availability of samples and cohorts, often resulting in underpowered studies and poor reproducibility. Publicly available transcriptome profiles provide an alternative source of data characterized by high statistical power and real-world heterogeneity. However, most transcriptome datasets profile bulk blood or tissue samples, requiring the use of in silico approaches to quantify changes in cell levels. Here, we integrated 853 publicly available microarray expression profiles of sorted human monocyte subsets from 45 independent studies to identify robust and parsimonious gene expression signatures, consisting of 10 genes specific to each subset. These signatures maintain their accuracy regardless of disease state in an independent cohort profiled by RNA-sequencing and are specific to their respective subset when compared to other immune cells from both myeloid and lymphoid lineages profiled across 6160 transcriptome profiles. Consequently, we show that these signatures can be used to quantify changes in monocyte subset levels in expression profiles from patients in clinical trials. Finally, we show that proteins encoded by our signature genes can be used in cytometry-based assays to specifically sort monocyte subsets. Our results demonstrate the robustness, versatility, and utility of our computational approach and provide a framework for the discovery of new cellular markers.
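As an illustration of how a small gene signature might be used to compare subset levels across bulk expression profiles, the following is a minimal sketch only. It assumes a simple scoring rule (the mean of per-gene z-scores across samples), which is a common generic approach and not necessarily the method used in this work. The gene names (`GENE_A`, `GENE_B`, `GENE_C`), the expression values, and the function names are hypothetical placeholders, not the published signature or data.

```python
from statistics import mean, pstdev


def zscore_genes(expr):
    """Standardize each gene across samples.

    expr maps a gene name to a list of expression values, one per sample.
    """
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no information; score it as 0.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def signature_scores(expr, signature):
    """Per-sample score: mean z-score of the signature genes found in expr."""
    z = zscore_genes(expr)
    genes = [g for g in signature if g in z]
    if not genes:
        raise ValueError("no signature genes found in expression data")
    n_samples = len(z[genes[0]])
    return [mean(z[g][i] for g in genes) for i in range(n_samples)]


# Hypothetical log-scale expression for 4 samples (assumed values).
expr = {
    "GENE_A": [8.0, 8.5, 5.0, 5.2],
    "GENE_B": [7.0, 7.4, 4.1, 4.0],
    "GENE_C": [6.0, 6.1, 6.0, 5.9],
}
# Hypothetical two-gene signature; the signatures described above have 10 genes.
classical_signature = ["GENE_A", "GENE_B"]
scores = signature_scores(expr, classical_signature)
```

Under these assumptions, samples with higher scores would be interpreted as having relatively higher levels of the corresponding subset. The score is relative within a cohort, so it supports comparisons between samples rather than absolute cell counts.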