Complex diseases, including cardiovascular diseases such as coronary artery disease, atrial fibrillation, and heart failure, are leading causes of morbidity and mortality worldwide. These diseases arise from interactions between lifestyle factors, environmental influences, and multiple disease-associated genes. Efforts to identify the driving genes underlying complex diseases have culminated in genome-wide association studies (GWAS), which measure associations between common human sequence variants and disease phenotypes in large population cohorts. To date, GWAS have identified tens of thousands of sequence variants associated with cardiovascular diseases and a spectrum of other complex diseases. However, the vast majority of these variants reside in noncoding regions of the genome and do not directly disrupt protein-coding sequences.
Cis-regulatory elements (CREs) are noncoding sequences that regulate the expression levels of neighboring genes in a cell type-specific fashion. Observations that disease-associated variants from GWAS are enriched in CREs led to the hypothesis that a major mechanism by which these variants influence disease is the disruption of gene regulation in specific cell types. However, we still lack comprehensive maps of CREs, not only in the cell types of the human heart, but also in the majority of tissues in the human body. The absence of such maps has posed a key challenge to discovering the cell types through which disease-associated variants act and to interpreting their detailed molecular mechanisms.
These challenges, reviewed in Chapter 1, led me to ask the following questions, which form the backbone of my thesis research: 1) how do individual human cell types utilize CREs to regulate gene expression, and 2) how do disease-associated noncoding sequence variants from GWAS influence cell type-specific gene regulation to cause disease? In this dissertation, I set out to address these questions in projects of progressively more expansive scope.
First, in Chapter 2, I used single-cell epigenomic and transcriptomic methods to define the regulation of gene expression by candidate CREs (cCREs) in nine cell types from the adult human heart. By localizing risk variants for cardiovascular diseases to these cCREs, I uncovered strong enrichments of variants associated with complex cardiovascular diseases in cCREs from individual cardiac cell types, including an enrichment of atrial fibrillation (AF) variants in cardiomyocyte cCREs. Next, I examined the specific AF risk variants underlying these enrichments, linked them to putative target genes, and tested their molecular mechanisms in human iPSC-derived cardiomyocytes using luciferase reporter assays and CRISPR-Cas9-mediated genome editing. Results from these experiments showed that a cardiomyocyte-specific enhancer containing noncoding AF risk variants is necessary for KCNH2 expression and regulation of action potential repolarization in cardiomyocytes.
Using this work as a foundation, in Chapter 3 I applied single-cell epigenomic methods to 30 different tissue types from across the entire adult human body. Integrating these datasets with corresponding data from 15 fetal tissue types revealed the cell type specificity of over 1 million cCREs in 222 distinct human cell types. Moving beyond cardiovascular diseases and cardiac cell types, I then localized risk variants from the spectrum of complex human diseases and traits to body-wide maps of cCREs in human cell types. This analysis revealed thousands of significant enrichments of risk variants for complex diseases in cCREs of specific cell types. To link specific variants to putative molecular functions, I created a framework that incorporates statistical fine-mapping, target gene linkage, and measurements of transcription factor binding site disruption to yield candidate molecular functions for hundreds of distinct noncoding risk variants. Lastly, I highlight examples of specific variants that may disrupt the activity of cell type-specific cCREs to contribute to complex diseases.
Finally, in Chapter 4, I summarize future directions of this research. First, I outline technological developments that will greatly enhance the utility of these data and frameworks for interpreting the functions of complex disease risk variants. Second, I describe ongoing work to use the healthy tissue datasets I generated as a springboard for uncovering cell type-specific gene regulatory programs in diseased human tissues, with a focus on ischemic heart failure.