Organismal development relies on precise temporal and spatial regulation of numerous processes to ensure proper formation of the body plan, including the enforcement of symmetric and asymmetric systems. Gene regulatory networks (GRNs) are organized to ensure the fidelity of each process, that is, a precise and accurate phenotypic outcome despite the noise inherent in biological systems. This raises nontrivial questions: what genetic mechanisms, if any, are in place to ensure fidelity, and how is a GRN as a whole affected by genetic variation in its individual components? Further, how does such variation lead to phenotypic variation, especially for complex or multigenic traits that rely on multiple inputs? Here I describe the application of quantitative genetic methods to identify loci associated with the fidelity of two separate but important developmental processes in the nematode C. elegans: 1) left-right (L/R) asymmetry of the gut-gonad orientation, and 2) proper implementation of the programmed cell death (PCD) events that typically sculpt the bilaterally symmetric sensory rays of the male tail, which are necessary for copulation. We found significant phenotypic variation for both traits in males from C. elegans isotypes and from closely related gonochoristic and hermaphroditic species, suggesting that variation in these traits is not solely attributable to lab domestication or to differences in selective pressure(s) on males (Chapters 3 and 4). For association mapping, we use two sources of genetic and phenotypic variation: a globally diverse collection of C. elegans isotypes, and recombinant inbred lines (RILs) generated from isotypes with either consistently high or low phenotypic variance for additional traits. We apply genome-wide association studies (GWAS, using a linear mixed model design for highly inbred model organisms) and quantitative trait locus (QTL) mapping to identify causal genetic regions.
We report that the missing rays or ray defects originally noted by Sulston were almost entirely suppressed in canonical programmed cell death mutants in the reference strain N2 (Chapter 2). The same was true in some C. elegans isotypes with a high propensity for defects when the endogenous pro-apoptotic factor EGL-1, a BH3-only domain protein, was knocked down using RNAi (Chapter 3), suggesting that there is standing genetic variation in misregulation of the PCD pathway. GWAS and linkage disequilibrium analysis of significant SNPs suggest that a ~3 Mb region on chromosome II is associated with stochastic cell death in 87 unique C. elegans isotypes. QTL mapping with RILs made from isolates QX1211 and AB4 further identified a much larger region on chromosome II, in addition to QTL on chromosomes III and X, that together explain ~55% of natural variation in this trait (Chapter 3). In the case of the observed heterotaxy (the improper arrangement of at least one but not all visceral organs), GWAS identified two QTL on chromosomes II and III that affect only males, suggesting that different QTL and other genetic or cellular mechanisms mediate natural variation in males compared to those affecting hermaphrodite heterotaxy (Chapter 4). The 95% confidence intervals and linkage disequilibrium analysis indicate that the QTL for both traits are large relative to the length of the chromosome (Chapters 3 and 4), and near-isogenic lines (NILs) excluded a small ~1.5 Mb region on chromosome II. We applied variant-effect prediction (VEP) analysis to the remaining QTL on chromosomes II, III and X as an alternative means of identifying possible candidates (Chapter 3).
In parallel with the standard approaches, we also applied a machine learning pipeline using ElasticNet regression as a complementary approach for identifying genetic loci that may be missed because of small effect size, population structure, or low statistical power for naturally varying traits in C. elegans isotypes (Chapter 5). Our approach was similar to its use in multiparental inbred mouse lines but was combined with, and enhanced by, the genetic toolkit of C. elegans. For example, C. elegans isotypes can be used to screen a large number of candidate genes for functional studies in polygenic backgrounds. We used the previously described, naturally varying requirement for SKN-1, a maternally loaded transcription factor necessary for endoderm specification, to test whether ElasticNet regression could recover biologically confirmed regions identified by standard approaches such as QTL mapping and GWAS. ElasticNet regression recovered all loci identified by GWAS and by QTL mapping with a set of recombinant inbred lines made from isotypes N2 and MY16 (Spearman’s rho, R² = 0.55, p = 6.5 × 10⁻⁵), in addition to novel loci on chromosomes V and X. While this model is statistically significant, isogenic line testing will be necessary to establish this method as a valid alternative to more traditional association mapping methods (Chapter 5).
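
As a rough illustration of why ElasticNet regression can flag loci jointly rather than one marker at a time, the sketch below fits an elastic net by coordinate descent to simulated data. It is not the pipeline used in Chapter 5: the genotype matrix, the two causal marker indices (3 and 17), their effect sizes, the penalty settings, and the function names `soft_threshold` and `elastic_net` are all assumptions made for this example.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + alpha*l1_ratio*|b|_1 + 0.5*alpha*(1 - l1_ratio)*||b||^2.
    Assumes the columns of X and the vector y are already centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X*beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic marker carries no information
            # Covariance of marker j with the residual that excludes marker j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Simulated data (hypothetical): 60 inbred strains, 30 biallelic markers.
random.seed(1)
n_strains, n_markers = 60, 30
genotypes = [[1.0 if random.random() < 0.5 else 0.0 for _ in range(n_markers)]
             for _ in range(n_strains)]
# Two assumed causal markers with opposite effects, plus Gaussian noise.
phenotype = [2.0 * g[3] - 1.5 * g[17] + random.gauss(0.0, 0.5)
             for g in genotypes]

columns = [center([row[j] for row in genotypes]) for j in range(n_markers)]
X = [[columns[j][i] for j in range(n_markers)] for i in range(n_strains)]
y = center(phenotype)

beta = elastic_net(X, y, alpha=0.1, l1_ratio=0.5)
ranked = sorted(range(n_markers), key=lambda j: abs(beta[j]), reverse=True)
top_two = sorted(ranked[:2])
print("markers with the largest absolute coefficients:", top_two)
```

Because all markers enter a single penalized model, a marker's coefficient reflects its contribution after accounting for the others, which is one reason such models may pick up loci that single-marker tests miss. In practice the penalty strength would be chosen by cross-validation, and population structure would need separate handling.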