USING A NOVEL STATISTIC AND COALESCENT THEORY TO UNDERSTAND GENE FLOW AND SPATIAL STRUCTURE OF POPULATIONS THROUGH GENOMIC DATA
- Lopez Fang, Lesly
- Advisor(s): McTavish, Emily Jane
Abstract
Understanding the evolutionary past using genetic data requires the development and application of statistical tools and coalescent models that accurately capture the underlying evolutionary processes. Migration and dispersal are important evolutionary forces that leave signatures in the genome, but the signal of these processes can be challenging to detect and to differentiate from that of other factors. By improving our detection of gene flow and spatial structure, we can reveal regions where gene flow from one population into another may have spurred adaptation and infer the spatial history of populations.
I developed a statistic, D+, to detect local genomic regions of introgression. This statistic is based on the ABBA-BABA statistic, D, which detects introgression genome-wide using site patterns that are inconsistent with the species tree. D+ additionally incorporates the signal of shared ancestral sites to improve power to detect introgression in smaller genomic regions. We evaluated the precision and recall of D+ using simulations and found that it compared favorably to existing approaches. D+ accurately detected signals of introgression in simulated data, and results using D+ were consistent with prior inferences on empirical data.
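To make the site-pattern logic concrete, the following is a minimal sketch and not the implementation used in this work. It assumes one haplotype per population in the order (P1, P2, P3, outgroup), alleles coded 0 for ancestral and 1 for derived, and an outgroup that always carries the ancestral allele. The D function follows the standard ABBA-BABA definition. The `d_plus` function is an assumption about how shared ancestral sites (the BAAA and ABAA patterns) could be added to the numerator and denominator; the abstract does not state the formula, and all function names are hypothetical.

```python
from collections import Counter


def count_site_patterns(sites):
    """Count site patterns such as 'ABBA' from (P1, P2, P3, O) allele tuples.

    Each allele is 0 (ancestral, 'A') or 1 (derived, 'B').
    Sites where the outgroup is derived are skipped (assumption).
    """
    counts = Counter()
    for p1, p2, p3, out in sites:
        if out != 0:
            continue
        pattern = "".join("B" if allele else "A" for allele in (p1, p2, p3, out))
        counts[pattern] += 1
    return counts


def d_statistic(counts):
    """Standard D: (ABBA - BABA) / (ABBA + BABA)."""
    abba, baba = counts["ABBA"], counts["BABA"]
    denom = abba + baba
    return (abba - baba) / denom if denom else float("nan")


def d_plus(counts):
    """Assumed D+ form: D extended with shared ancestral patterns.

    BAAA (P2 and P3 share the ancestral allele) is paired with ABBA,
    and ABAA (P1 and P3 share the ancestral allele) with BABA.
    """
    abba, baba = counts["ABBA"], counts["BABA"]
    baaa, abaa = counts["BAAA"], counts["ABAA"]
    denom = abba + baba + baaa + abaa
    return ((abba - baba) + (baaa - abaa)) / denom if denom else float("nan")


# Toy window of five sites (hypothetical data for illustration only).
window = [(0, 1, 1, 0), (1, 0, 1, 0), (0, 1, 1, 0), (1, 0, 0, 0), (0, 1, 0, 0)]
window_counts = count_site_patterns(window)
```

In a windowed scan, the same functions would be applied to the sites falling in each genomic window. Because the ancestral-sharing patterns add informative sites to each window, a statistic of this form has more data to work with in small regions than D alone.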
I applied this new statistic to genomic data from African cattle. I detected likely regions of introgression from Bos indicus into the Bos taurus African cattle breed, N’dama, and located candidate genes in those regions which have been previously identified as targets of selection.
To investigate the effects of spatial structure on coalescent times in a population, we developed a coalescent model, based on the Wright-Fisher model, that incorporates limited dispersal. We simulated a genetic dataset based on the coalescent trees from our limited dispersal coalescent model to estimate relative coalescent times within the population. We found that although limited dispersal affects local coalescent times, it does not affect coalescent times averaged across the population. While summaries of coalescence times can mask the signal of isolation by distance, PCA consistently revealed the underlying population structure.
Together, this work provides new tools to accurately understand gene flow and spatial structure of populations.
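As an illustration of how limited dispersal can enter a Wright-Fisher-style coalescent, the sketch below follows two lineages backward in time on a ring of demes. It is a generic stepping-stone simulation and not the model developed in this work. The ring layout, the haploid deme size, the per-generation movement probability `m`, and all function names are assumptions made for illustration.

```python
import random


def pairwise_coalescence_time(n_demes, deme_size, m, start_a, start_b, rng):
    """Generations until two lineages coalesce on a ring of demes.

    Each generation, going backward in time, each lineage independently
    moves to a neighboring deme with probability m. Two lineages in the
    same deme pick the same parent with probability 1 / deme_size.
    """
    if m == 0 and start_a != start_b:
        raise ValueError("lineages in different demes never meet when m == 0")
    a, b = start_a, start_b
    t = 0
    while True:
        t += 1
        if rng.random() < m:
            a = (a + rng.choice((-1, 1))) % n_demes
        if rng.random() < m:
            b = (b + rng.choice((-1, 1))) % n_demes
        if a == b and rng.random() < 1.0 / deme_size:
            return t


def mean_coalescence_time(n_demes, deme_size, m, start_a, start_b, reps, seed):
    """Average coalescence time over independent replicates."""
    rng = random.Random(seed)
    total = sum(
        pairwise_coalescence_time(n_demes, deme_size, m, start_a, start_b, rng)
        for _ in range(reps)
    )
    return total / reps


# Sanity check: with a single deme the model reduces to a panmictic
# Wright-Fisher population, where the expected time is deme_size generations.
panmictic_mean = mean_coalescence_time(1, 10, 0.1, 0, 0, reps=2000, seed=1)
```

Comparing means for pairs sampled from the same deme against pairs sampled from distant demes, at several values of `m`, is one way to see how dispersal shapes local coalescence times in a toy model of this kind.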