eScholarship
Open Access Publications from the University of California

UC Davis


Bayesian Phylogenetic Inference using Dated DNA Samples with Applications to HIV Latency and Ancient DNA

Abstract

A longstanding goal in phylogenetics is to estimate when species, populations, or individuals (in the case of viruses) diverged from a common ancestor. In molecular phylogenetics, which uses sequence data to estimate the topology and branch lengths of phylogenies, the rate of molecular evolution and time are confounded: only the product of rate and time is identifiable without outside information. Several methods have been used to disentangle rate and time, including fossil calibrations and dated samples (known as tip dating). Tip dating exploits the difference in branch lengths between samples of different known ages both to estimate the rate of molecular evolution and to time-calibrate a phylogeny, assuming a molecular clock. This dissertation focuses on the development and application of novel tip dating methods to estimate time-calibrated phylogenies.
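To make the tip dating idea concrete, the following is a minimal sketch of root-to-tip regression, a simple heuristic and not the Bayesian approach developed in this dissertation. It assumes a strict molecular clock and that root-to-tip distances (in substitutions per site) have already been measured on a rooted tree. The function name `fit_clock` and the example numbers are hypothetical.

```python
def fit_clock(sample_times, root_to_tip):
    """Least-squares fit of root-to-tip distance against sampling time.

    Under a strict clock, distance = rate * (sample_time - root_time), so the
    slope estimates the rate and the x-intercept estimates the root time.
    Returns (rate, root_time).
    """
    n = len(sample_times)
    if n != len(root_to_tip) or n < 2:
        raise ValueError("need at least two (time, distance) pairs")

    mean_t = sum(sample_times) / n
    mean_d = sum(root_to_tip) / n
    sxx = sum((t - mean_t) ** 2 for t in sample_times)
    if sxx == 0:
        # All samples share one date: rate and time cannot be separated.
        raise ValueError("samples must have at least two distinct dates")

    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sample_times, root_to_tip))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")

    intercept = mean_d - rate * mean_t
    root_time = -intercept / rate  # time at which the fitted distance is zero
    return rate, root_time


if __name__ == "__main__":
    # Hypothetical data: sampling times in years, distances in subs/site.
    times = [0.0, 2.0, 4.0, 6.0]
    distances = [0.01, 0.03, 0.05, 0.07]
    rate, root_time = fit_clock(times, distances)
    print(f"rate = {rate:.4f} subs/site/year, root time = {root_time:.2f}")
```

If all samples share a single date, the regression has no slope to estimate, which is the confounding of rate and time described above.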

Chapter 1 provides an introduction to relevant areas of phylogenetics. Types of studies that utilize time-calibrated phylogenies are described. Then, a brief introduction to Bayesian phylogenetics and MCMC methods is provided. Lastly, phylogenetic time calibration methods are outlined.

Chapter 2 applies tip dating to investigate the temporal dynamics of latency in HIV. Effective antiretroviral therapy (ART) for HIV stops HIV from infecting new cells and most infected cells die shortly after infection, leaving HIV clinically undetectable within a patient. However, a small pool of long-lived cells, known as latently infected cells, can persist with the HIV integrated into their genomes for decades. If ART is stopped, these latent cells rapidly repopulate the patient with HIV and lead to disease progression. Due to its clinical relevance, researchers are interested in understanding characteristics of the latent reservoir, such as when individual cells in the reservoir became latent. Because HIV evolves rapidly within hosts, phylogenetic tip dating methods can be used to estimate time-calibrated trees for within-host viral datasets. However, viral lineages from latent cells have a much lower mutation rate than lineages from non-latent cells. In phylogenies with both latent and non-latent HIV sequences, this difference in mutation rate leads to shorter branch lengths for latent sequences than would be expected given the sampling times. A novel Bayesian tip dating method is developed that estimates when individual latent lineages became latent using this difference in branch lengths. A method to combine inferences across different regions of the HIV genome is also developed, which accounts for the fact that regions may differ in topology due to recombination. Combined inference greatly improves the accuracy of inferences when using only a few short sequences. The new methods perform better than many alternative heuristic methods and allow for biologically reasonable bounds on inferences, such as enforcing the latency times to be older than the sampling times. Lastly, the empirical utility of the method is demonstrated by analyzing two clinical datasets of patients with HIV.
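The branch-length argument can be illustrated with a simple point-estimate heuristic. This is a sketch only, not the Bayesian method developed in Chapter 2. It assumes that a clock rate and root time have already been estimated from the non-latent sequences and that a latent lineage stops accumulating substitutions entirely once it becomes latent. The function name `estimate_latency_time` and the example values are hypothetical.

```python
def estimate_latency_time(distance, sample_time, rate, root_time):
    """Heuristic date at which a latent lineage stopped evolving.

    distance    : root-to-tip distance of the latent sequence (subs/site)
    sample_time : time at which the latent sequence was sampled
    rate        : clock rate from non-latent sequences (subs/site/time unit)
    root_time   : estimated time of the root of the tree

    Assumption: no substitutions accumulate during latency, so the observed
    distance reflects only the time between the root and the latency event.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    raw = root_time + distance / rate
    # Keep the estimate within sensible bounds: no earlier than the root
    # and no later than the time the sequence was sampled.
    return min(max(raw, root_time), sample_time)


if __name__ == "__main__":
    # Hypothetical values: times in years since the first sample.
    t_latent = estimate_latency_time(distance=0.04, sample_time=8.0,
                                     rate=0.01, root_time=-1.0)
    print(f"estimated latency time: about year {t_latent:.2f}")
```

The clamp in the last line of the function is a crude stand-in for the bounds mentioned above. A point estimate of this kind also carries no measure of uncertainty.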

Chapter 3 develops a method to analyze ancient DNA (aDNA) under the multispecies coalescent (MSC). With the increasing abundance of aDNA sequences, molecular data are now available to investigate the relationships between extinct and extant species, as well as between ancient samples of extant species and their modern relatives. These studies typically treat gene trees as species trees, which can lead to biases in inferred divergence times. The MSC overcomes these issues by explicitly modeling the relationship between gene trees and species trees. However, there are currently no methods that allow for tip dating with multiple sample dates within a species under the MSC; failing to account for sampling dates can also bias divergence time estimation. A method is developed to analyze aDNA under an MSC model, allowing for the inference of divergence times (in units of both expected number of mutations and calendar time), effective population sizes, and mutation rate using large multilocus datasets with multiple individuals sampled in each population. Simulation studies suggest the new method can estimate the parameters accurately and precisely if the model assumptions are met. It is shown that treating ancient samples as contemporary (mimicking empirical practices) can lead to biases in estimates of divergence times and effective population sizes. Finally, two datasets with extant elephant species and woolly mammoths are analyzed. A strong signal of aDNA degradation was detected in one of the datasets, which likely biased estimates of mutation rate and divergence times. This suggests the need for more careful consideration of the impacts of DNA degradation on downstream analyses.
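The relationship between the two time scales mentioned above can be sketched as a unit conversion. This sketch assumes the parameterization commonly used for MSC models, which may differ from the one used in the dissertation: divergence time τ is measured in expected mutations per site, and θ = 4Nμ for a diploid population with μ per site per generation. The function name, the generation time, and all example values are hypothetical.

```python
def to_calendar_units(tau, theta, rate_per_year, generation_time_years):
    """Convert MSC parameters from mutation units to calendar units.

    tau                   : divergence time in expected mutations per site
    theta                 : population size parameter, assumed to be 4*N*mu
    rate_per_year         : mutation rate per site per year
    generation_time_years : assumed generation time in years

    Returns (divergence time in years, effective population size N).
    """
    if rate_per_year <= 0 or generation_time_years <= 0:
        raise ValueError("rate and generation time must be positive")
    divergence_years = tau / rate_per_year
    mu_per_generation = rate_per_year * generation_time_years
    effective_size = theta / (4.0 * mu_per_generation)
    return divergence_years, effective_size


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the arithmetic.
    years, n_e = to_calendar_units(tau=0.005, theta=0.002,
                                   rate_per_year=1e-9,
                                   generation_time_years=25.0)
    print(f"divergence: {years:.3g} years, N: {n_e:.3g}")
```

Without dated samples or another calibration, the rate is not identifiable and only τ and θ can be estimated.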

This dissertation demonstrates the wide utility and diverse potential applications of Bayesian tip dating methods, and provides powerful new methods to analyze empirical datasets on HIV latency and aDNA.
