Recent years have witnessed a surge of interest in statistical genomics and healthcare. This surge stems from the increasing availability of genomic databases and biobank-scale genetic data, as well as the accessibility of deep neural networks that can mine insights from them. Taken together, these trends enable, in theory, the discovery of biological pathways connecting genetic mutations to complex traits. Such discoveries, in turn, promise to open up avenues for healthcare applications, including the development of personalized clinical instruments.
Yet, in practice, many challenges remain in realizing this promise. One important challenge is assessing whether a biological signal detected in the data generalizes. It is widely recognized that genomic and phenotypic predictions made by both classical and recent models often fail to transfer across individuals. For example, despite strong advocacy for the clinical use of polygenic risk scores (summaries of information across an individual's genome), these scores still transfer poorly across populations and individuals, performing well primarily on individuals of European descent. Poor generalizability can be attributed to many factors, including differences in biological effects across cohorts despite similar pathways, and biased inferences resulting from violated assumptions or overfitting.
This dissertation attempts to bridge theory and practice by critically evaluating fundamental assumptions required for unbiased inference from genomic data and by proposing strategies to ensure that such inferences are valid. The dissertation is divided into three parts. First, a method to assess exchangeability is developed, which can be used to interrogate a fundamental assumption underlying core analyses in population genetics, such as demographic inference and linkage disequilibrium (LD) block estimation. Second, a study that operationalizes the stability principle is performed to show that stable discoveries aid the detection of variants with functional impact. Finally, a framework for sensitivity analyses of polygenic risk scores (PRSs) is proposed, which is not only inspired by mathematical results relating population structure to inflated PRS performance, but also allows the trustworthiness of variant effects to be validated. Taken together, these studies provide insights and tools for ensuring that the scientific conclusions drawn from statistical analyses of genomic data are more likely to generalize.