Toward Understanding Nonlinear Human Genetic Architectures at Scale
- Fu, Boyang
- Advisor(s): Sankararaman, Sriram
Abstract
Advances in genome sequencing, together with technologies that measure other data modalities, are redefining the field of human genetics. With the increased availability of data and computational resources, researchers can now capture complex genetic interactions that were once difficult or impossible to measure.
Over the past decade, discoveries from genome-wide association studies (GWAS) have grown rapidly, driven largely by the increasing availability of high-quality sequencing data and the rapid expansion of datasets. Most existing work has focused on modeling additive genetic effects while largely overlooking the contribution of more complex genetic interaction effects, also known as epistasis, to outcome traits. This emphasis stems in part from earlier theoretical work, the difficulty of modeling effectively in extremely high-dimensional spaces, and the limited measurements available at the time.
While an increasing body of literature suggests that additive models explain most heritability at the population level in human genetics, understanding genetic interactions holds significant potential to bridge the gap between statistical genetics and underlying biological mechanisms. This understanding can enhance the development of precision medicine and personalized health, ultimately linking genetic research to individual-level applications.
The first challenge in understanding genetic nonlinearity lies in the high-dimensional nature of the data: even the largest sample sizes are often no greater than the number of sequenced genetic features, and the number of candidate interactions among those features grows combinatorially. Consequently, it becomes necessary to constrain the search space and focus on specific types of interactions.
In this thesis, I present three research projects on this topic. The first investigates efficient modeling of quadratic genetic interactions within a localized context. Building on this, the second explores higher-order interactions. Finally, the third broadens the scope of interacting features by modeling interactions between a target variant and other variants across the whole genome.