Article
Peer Reviewed

False Negatives Are a Significant Feature of Next Generation Sequencing Callsets

UC Davis Previously Published Works (2016)

Short-read, next-generation sequencing (NGS) is now broadly used to identify rare or de novo mutations in population samples and disease cohorts. However, NGS data is known to be error-prone, and post-processing pipelines have primarily focused on the removal of spurious mutations or “false positives” for downstream genome datasets. Less attention has been paid to characterizing the fraction of missing mutations or “false negatives” (FN). Here we interrogate several publicly available human NGS autosomal variant datasets using corresponding Sanger sequencing as a truth-set. We examine both low-coverage Illumina and high-coverage Complete Genomics genomes. We show that the FN rate varies between 3% and 18% and that false-positive rates are considerably lower (<3%) for publicly available human genome callsets like 1000 Genomes. The FN rate is strongly dependent on calling pipeline parameters, as well as read coverage. Our results demonstrate that missing mutations are a significant feature of genomic datasets and imply that additional fine-tuning of bioinformatics pipelines is needed. To address this, we design a phylogeny-aware tool [PhyloFaN] which can be used to quantify the FN rate for haploid genomic experiments, without additional generation of validation data. Using PhyloFaN on ultra-high coverage NGS data from both Illumina HiSeq and Complete Genomics platforms derived from the 1000 Genomes Project, we characterize the false negative rate in human mtDNA genomes. The false negative rate for the publicly available mtDNA callsets is 17-20%, even for extremely high coverage haploid data.
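
The comparison of a callset against a truth-set described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and not PhyloFaN. The function name `error_rates`, the variant tuple layout, and the toy variants are all hypothetical. The sketch also assumes that both sets are already restricted to the same genomic regions, and that the false-positive rate is the fraction of called variants absent from the truth-set; the abstract does not state which definition the authors used.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


class ErrorRates(NamedTuple):
    false_negative_rate: float  # fraction of truth-set variants missing from the callset
    false_positive_rate: float  # fraction of called variants absent from the truth-set


def error_rates(truth: Set[Variant], calls: Set[Variant]) -> ErrorRates:
    """Compare an NGS callset with a truth-set (e.g. Sanger-validated variants).

    Assumes both sets cover the same regions, so that a variant missing
    from one set reflects an error rather than a lack of coverage.
    """
    if not truth or not calls:
        raise ValueError("both the truth-set and the callset must be non-empty")
    missed = truth - calls    # false negatives: real variants that were not called
    spurious = calls - truth  # false positives: calls with no support in the truth-set
    return ErrorRates(len(missed) / len(truth), len(spurious) / len(calls))


# Toy data, invented for illustration only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 900, "T", "C"),
}
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "A", "T"),
}

rates = error_rates(truth, calls)
print(f"FN rate: {rates.false_negative_rate:.2f}")
print(f"FP rate: {rates.false_positive_rate:.2f}")
```

In this toy example one of the four truth-set variants is missed and one of the four calls is unsupported, so both rates are 0.25. Real comparisons would also need to normalize variant representation before matching, which the sketch leaves out.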

Article
Peer Reviewed

Conserved regulatory motifs at phenylethanolamine N-methyltransferase (PNMT) are disrupted by common functional genetic variation: an integrated computational/experimental approach

UC San Diego Previously Published Works (2010)

The adrenomedullary hormone epinephrine transduces environmental stressors into cardiovascular events (tachycardia and hypertension). Although the genetic locus of the epinephrine biosynthetic enzyme PNMT displays both linkage and association to such traits, the genetic variation underlying these quantitative phenotypes is not established. Using an integrated suite of computational and experimental approaches, we elucidate a functional mechanism for common (minor allele frequencies > 30%) genetic variants at PNMT. Transcription factor binding motif prediction on mammalian PNMT promoter alignments identified two variant regulatory motifs, SP1 and EGR1, disrupted by G-367A (rs3764351), and a SOX17 motif created by G-161A (rs876493). Electrophoretic mobility shifts of approximately 30-bp oligonucleotides containing ancestral versus variant alleles validated the computational hypothesis. Queried against chromaffin cell nuclear protein extracts, only the G-367 and -161A alleles shifted. Specific antibodies applied in electrophoretic gel shift experiments confirmed binding of SP1 and EGR1 to G-367 and SOX17 to -161A. The in vitro allele-specific binding was verified in cella through promoter reporter assays: lower activity for -367A haplotypes cotransfected with SP1 (p = 0.002) and EGR1 (p = 0.034); and enhanced inhibition of -161A haplotypes (p = 0.0003) cotransfected with SP1 + SOX17. Finally, we probed cis/trans regulation with endogenous factors by chromatin immunoprecipitation using SP1/EGR1/SOX17 antibodies. We describe the systematic application of complementary computational and experimental techniques to detect and document functional genetic variation in a trait-associated regulatory region. The results provide insight into cis and trans transcriptional mechanisms whereby common variation at PNMT can give rise to quantitative changes in human physiological and disease traits. Thus, PNMT variants in cis may interact with nuclear factors in trans to govern adrenergic activity.

Article
Peer Reviewed

Rare Synaptogenesis-Impairing Mutations in SLITRK5 Are Associated with Obsessive Compulsive Disorder

UC San Francisco Previously Published Works (2017)

Obsessive compulsive disorder (OCD) is substantially heritable, but few molecular genetic risk factors have been identified. Knockout mice lacking SLIT and NTRK-Like Family, Member 5 (SLITRK5) display OCD-like phenotypes including serotonin reuptake inhibitor-sensitive pathologic grooming and corticostriatal dysfunction. Thus, mutations that impair SLITRK5 function may contribute to the genetic risk for OCD. We re-sequenced the protein-coding sequence of the human SLITRK5 gene (SLITRK5) in 377 OCD subjects and compared rare non-synonymous mutations (RNMs) in that sample with similar mutations in the 1000 Genomes database. We also performed in silico assessments and in vitro functional synaptogenesis assays on the SLITRK5 mutations identified. We identified four RNMs among these OCD subjects. There were no significant differences in the prevalence or in silico effects of rare non-synonymous mutations in the OCD sample versus controls. Direct functional testing of recombinant SLITRK5 proteins found that all mutations identified in OCD subjects impaired synaptogenic activity, whereas none of the pseudo-matched mutations identified in 1000 Genomes controls had significant effects on SLITRK5 function (Fisher's exact test P = 0.028). These results demonstrate that rare functional mutations in SLITRK5 contribute to the genetic risk for OCD in human populations. They also highlight the importance of biological characterization of allelic effects in understanding genotype-phenotype relationships, as there were no statistical differences in overall prevalence or bioinformatically predicted effects of OCD case versus control mutations. Finally, these results converge with others to highlight the role of aberrant synaptic function in corticostriatal neurons in the pathophysiology of OCD.

Article
Peer Reviewed

Genomes of Three Closely Related Caribbean Amazons Provide Insight for Species History and Conservation

UC Davis Previously Published Works (2019)

Islands have been used as model systems for studies of speciation and extinction since Darwin published his observations about finches found on the Galapagos. Amazon parrots inhabiting the Greater Antillean Islands represent a fascinating model of species diversification. Unfortunately, many of these birds are threatened as a result of human activity and some, like the Puerto Rican parrot, are now critically endangered. In this study we used a combination of de novo and reference-assisted assembly methods, integrating them with information obtained from related genomes to perform genome reconstruction of three amazon species. First, we used whole genome sequencing data to generate a new de novo genome assembly for the Puerto Rican parrot (Amazona vittata). We then improved the obtained assembly using transcriptome data from Amazona ventralis and used the resulting sequences as a reference to assemble the genomes of the Hispaniolan (A. ventralis) and Cuban (Amazona leucocephala) parrots. Finally, we annotated genes and repetitive elements, estimated genome sizes and current levels of heterozygosity, built models of demographic history, and provided an interpretation of our findings in the context of parrot evolution in the Caribbean.
