ABSTRACT OF THE DISSERTATION
Integrating molecular phenotypes and gene expression to characterize DNA variants for cardiometabolic traits
by
Alejandra Rodriguez
Doctor of Philosophy in Human Genetics
University of California, Los Angeles, 2018
Professor Päivi Pajukanta, Chair
In-depth understanding of cardiovascular disease etiology requires characterization of its genetic, environmental, and molecular architecture. Genetic architecture can be defined as the characteristics of genetic variation responsible for broad-sense phenotypic heritability. Massively parallel sequencing has generated thousands of genomic datasets in diverse human tissues. Integration of such datasets using data mining methods has been used to extract biological meaning and has significantly advanced our understanding of the genome-wide nucleotide sequence, its regulatory elements, and overall chromatin architecture. This dissertation presents integration of “omics” data sets to understand the genetic architecture and molecular mechanisms of cardiovascular lipid disorders (further reviewed in Chapter 1).
In 2013, Daphna Weissglas-Volkov and coworkers [1] published an association between the chromosome 18q11.2 genomic region and hypertriglyceridemia in a genome-wide association study (GWAS) of Mexican hypertriglyceridemia cases and controls. In Chapter 2, we present the fine-mapping and functional characterization of the molecular mechanisms underlying this triglyceride (TG) association signal on chromosome 18q11.2 [2]. Specifically, we found nine additional variants in linkage disequilibrium (LD) with the lead single nucleotide polymorphism (SNP). Using luciferase transcriptional reporter assays, electrophoretic mobility shift assays, and HNF4 ChIP-qPCR (chromatin immunoprecipitation coupled with quantitative polymerase chain reaction), we found that the minor G allele of rs17259126 disrupts an HNF4A binding site. Furthermore, using cis expression quantitative trait locus (eQTL) analysis, we found that the G allele of rs17259126 is associated with decreased expression of the regional transmembrane protein 241 (TMEM241) gene [2]. Our results suggest that reduced transcript levels of TMEM241 likely contribute to the increased serum TG levels in Mexicans.
GWAS variants typically have small effect sizes, and about 40% of them are located in intergenic regions and another 40% in intronic regions. Since a large number of GWAS variants reside in non-coding regions, these SNPs are thought to affect gene regulation by disrupting functional elements, such as transcription factor binding sites (TFBS). Mapping genome-wide TFBS using chromatin immunoprecipitation followed by sequencing (ChIP-Seq) can identify such binding sites for specific transcription factors (TFs). It can also help identify unknown TF targets, complex interaction networks, and hub genes that can ultimately lead to the discovery of pharmaceutical targets. In Chapter 3, we present our investigation of the genome-wide targets of the RAR Related Orphan Receptor A (RORA), a high-density lipoprotein cholesterol (HDL-C) GWAS gene in Mexicans [3] and a known regulator of the apolipoprotein genes APOA5, APOA1, and APOC3.
Despite the several hundred lipid loci identified by GWAS, it has become increasingly clear that variation at these known loci explains only a small fraction of the trait heritability. In addition to rare variants, complex genetic effects, such as gene-environment and epistatic interactions, have been hypothesized to be additional sources of this “missing heritability.” In Chapter 4, we present our investigation of genes that exhibit context-dependent expression variance and their underlying variance expression quantitative trait loci (ve-QTLs). Our cohort consisted of Mexican individuals with extreme serum TG values for whom subcutaneous adipose tissue expression microarray data were available. We found that individuals with low serum TGs displayed a greater ATP citrate lyase (ACLY) expression variance than individuals with high TGs. We replicated this observation in the Finnish METabolic Syndrome In Men (METSIM) [4] adipose RNA-sequencing cohort (p-value = 1.8 × 10⁻³). ACLY encodes the primary enzyme responsible for the synthesis of cytosolic acetyl-CoA in many tissues, which is vital for the biosynthesis of fatty acids, the precursors of TGs. One hypothesis is that reduced ACLY expression variance in the high-TG context leads to an increased degree of constraint in lipid biosynthesis pathways, followed by decreased robustness in their response to environmental stimuli and a reduced ability to buffer cryptic genetic variation. We used a correlation least squares (CLS) test and found that the reference allele of variant rs34272903 (T/C) is associated with an increased ACLY expression variance (FWER p-value = 1.0 × 10⁻⁴). Our results suggest that the reference T allele of rs34272903 interacts with an unknown factor in the low-TG context, increasing ACLY expression variance. This interaction may contribute to efficient activation of lipid pathways in response to endogenous and exogenous stimuli via unknown mechanisms.
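The idea behind a variance QTL test can be illustrated with a minimal sketch. This abstract does not specify the exact procedure, so the following is only an assumed two-stage formulation in the spirit of a CLS-type test: the mean effect of genotype on expression is first removed by linear regression, and the squared residuals are then tested for Spearman rank correlation with genotype. The function names, the simulated data, and the permutation p-value are all illustrative assumptions; no real genotype or expression data are used, and no multiple-testing (FWER) correction is shown.

```python
import random


def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(rank(x), rank(y))


def squared_residuals(dosage, expression):
    """Stage 1: regress expression on allele dosage, return squared residuals."""
    md, me = sum(dosage) / len(dosage), sum(expression) / len(expression)
    sxx = sum((d - md) ** 2 for d in dosage)
    slope = sum((d - md) * (e - me) for d, e in zip(dosage, expression)) / sxx
    return [((e - me) - slope * (d - md)) ** 2 for d, e in zip(dosage, expression)]


def variance_qtl_test(dosage, expression, n_perm=1000, seed=0):
    """Stage 2: rank-correlate dosage with squared residuals.

    Returns (rho, p), where p is a two-sided permutation p-value
    (an illustrative choice, not taken from the source).
    """
    sq_rank = rank(squared_residuals(dosage, expression))
    dose_rank = rank(dosage)
    rho = pearson(dose_rank, sq_rank)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(dose_rank)  # break any genotype-variance link
        if abs(pearson(dose_rank, sq_rank)) >= abs(rho):
            hits += 1
    return rho, (hits + 1) / (n_perm + 1)


# Simulated example: dosage counts the alternate allele (0, 1, 2), and the
# expression spread shrinks with each alternate allele, so the reference
# allele is the variance-increasing one.
sim = random.Random(1)
dosage = [sim.choice([0, 1, 2]) for _ in range(300)]
expression = [sim.gauss(0.0, 1.5 - 0.5 * d) for d in dosage]
rho, p = variance_qtl_test(dosage, expression)
print(f"rho={rho:.3f}, permutation p={p:.4f}")
```

In this simulation a negative rho indicates that expression variance decreases with the number of alternate alleles, which corresponds to the reference allele being associated with increased variance. Using ranks rather than raw squared residuals keeps the test robust to the heavy right tail of squared values.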