Article
Peer Reviewed

MutPred Splice: machine learning-based prediction of exonic variants that disrupt splicing.

UC Santa Cruz Previously Published Works (2014)

We have developed a novel machine-learning approach, MutPred Splice, for the identification of coding region substitutions that disrupt pre-mRNA splicing. Applying MutPred Splice to human disease-causing exonic mutations suggests that 16% of mutations causing inherited disease and 10 to 14% of somatic mutations in cancer may disrupt pre-mRNA splicing. For inherited disease, the main mechanism responsible for the splicing defect is splice site loss, whereas for cancer the predominant mechanism of splicing disruption is predicted to be exon skipping via loss of exonic splicing enhancers or gain of exonic splicing silencer elements. MutPred Splice is available at http://mutdb.org/mutpredsplice.

Article
Peer Reviewed

Using domain knowledge for robust and generalizable deep learning-based CT-free PET attenuation and scatter correction.

UC Davis Previously Published Works (2022)

Despite the potential of deep learning (DL)-based methods to replace CT-based PET attenuation and scatter correction for CT-free PET imaging, a critical bottleneck is their limited ability to handle the large heterogeneity of PET tracers and scanners. This study presents a simple way to integrate domain knowledge into DL for CT-free PET imaging. In contrast to conventional direct DL methods, we simplify the complex problem through domain decomposition, so that the anatomy-dependent attenuation correction can be learned robustly in a low-frequency domain while the original anatomy-independent high-frequency texture is preserved during processing. Even when trained on data from a single tracer on a single scanner, the proposed approach proved effective and robust in tests with various external imaging tracers on different scanners. Such robust, generalizable, and transparent DL development may improve the prospects for clinical translation.
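The low/high-frequency split described above can be illustrated with a minimal sketch. It assumes a 1D intensity profile in place of a 3D PET volume and a simple moving-average low-pass filter; the paper's actual filter and network are not reproduced here. The hypothetical `low_freq_model` argument stands in for the trained DL model: it only ever sees the low-frequency component, and the high-frequency residual is added back unchanged.

```python
from typing import Callable, List, Tuple

Signal = List[float]


def box_blur(signal: Signal, radius: int) -> Signal:
    """Low-pass filter: moving average over +/- radius samples, truncated at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def decompose(signal: Signal, radius: int = 2) -> Tuple[Signal, Signal]:
    """Split a signal into a low-frequency part and the high-frequency residual."""
    low = box_blur(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def correct(signal: Signal, low_freq_model: Callable[[Signal], Signal], radius: int = 2) -> Signal:
    """Apply a (hypothetical) learned correction to the low-frequency part only,
    then add the untouched high-frequency texture back."""
    low, high = decompose(signal, radius)
    corrected_low = low_freq_model(low)
    return [c + h for c, h in zip(corrected_low, high)]


if __name__ == "__main__":
    uncorrected = [1.0, 1.2, 5.0, 1.1, 0.9, 1.0, 4.0, 1.0]
    # Hypothetical stand-in for a trained network: a uniform gain of 1.8.
    corrected = correct(uncorrected, lambda low: [1.8 * v for v in low])
    print([round(v, 3) for v in corrected])
```

Because the stand-in model only rescales the smooth component, the sharp peaks in the residual pass through untouched; this is one way to picture why texture survives the correction, not a description of the authors' pipeline.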

Article
Peer Reviewed

A Probabilistic Model to Predict Clinical Phenotypic Traits from Genome Sequencing

UC San Diego Previously Published Works (2014)

Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele frequency annotations and bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to predict the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate, as individuals manifesting the phenotype in question exhibited the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with an area under the ROC curve (AUC) > 0.7, and 23 of these (15.8% of all phenotypes) were statistically significant based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are currently insufficiently accurate for diagnostic utility, we expect their performance to improve with the growth of publicly available genomics data and model refinement by domain experts.
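The AUC and permutation-test criteria mentioned above can be sketched as follows, assuming each individual has a predicted probability for one dichotomous phenotype and a 0/1 label for whether they manifest it. AUC is computed through its Mann-Whitney interpretation (the probability that a randomly chosen affected individual scores higher than an unaffected one), and significance by shuffling the labels. The permutation count, seed, and example values are illustrative assumptions, not the paper's settings.

```python
import random
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores: Sequence[float], labels: Sequence[int],
                        n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided p-value: how often shuffled labels give an AUC at least as high as observed."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    probs = [0.92, 0.85, 0.40, 0.33, 0.30, 0.21, 0.15, 0.10]
    affected = [1, 1, 0, 1, 0, 0, 0, 0]
    print(auc(probs, affected), permutation_p_value(probs, affected))
```

The pairwise loop is quadratic but transparent; for large cohorts a rank-based AUC computation would be the usual replacement.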

Article
Peer Reviewed

Tau/MAPT disease-associated variant A152T alters tau function and toxicity via impaired retrograde axonal transport

UC San Francisco Previously Published Works (2019)

Mutations in the microtubule-associated protein tau (MAPT) underlie multiple neurodegenerative disorders, yet the pathophysiological mechanisms are unclear. A novel variant in MAPT resulting in an alanine-to-threonine substitution at position 152 (A152T tau) has recently been described as a significant risk factor for both frontotemporal lobar degeneration and Alzheimer's disease. Here we use complementary computational, biochemical, molecular, genetic and imaging approaches in Caenorhabditis elegans and mouse models to interrogate the effects of the A152T variant on tau function. In silico analysis suggests that a threonine at position 152 of tau confers a new phosphorylation site. This finding is borne out by a mass spectrometric survey of A152T tau phosphorylation in C. elegans and mouse. Optical pulse-chase experiments with Dendra2-tau demonstrate that A152T tau and phosphomimetic A152E tau exhibit increased diffusion kinetics and traverse the axon initial segment more efficiently than wild-type (WT) tau. A C. elegans model of tauopathy reveals that A152T and A152E tau confer patterns of developmental toxicity distinct from WT tau, likely due to differential effects on retrograde axonal transport. These data support a role for phosphorylation of the variant threonine in A152T tau toxicity and suggest a mechanism involving impaired retrograde axonal transport contributing to human neurodegenerative disease.

Article
Peer Reviewed

SIRT5 Regulates the Mitochondrial Lysine Succinylome and Metabolic Networks

UC San Francisco Previously Published Works (2013)

Reversible posttranslational modifications are emerging as critical regulators of mitochondrial proteins and metabolism. Here, we use a label-free quantitative proteomic approach to characterize the lysine succinylome in liver mitochondria and its regulation by the desuccinylase SIRT5. A total of 1,190 unique sites were identified as succinylated, and 386 sites across 140 proteins representing several metabolic pathways including β-oxidation and ketogenesis were significantly hypersuccinylated in Sirt5(-/-) animals. Loss of SIRT5 leads to accumulation of medium- and long-chain acylcarnitines and decreased β-hydroxybutyrate production in vivo. In addition, we demonstrate that SIRT5 regulates succinylation of the rate-limiting ketogenic enzyme 3-hydroxy-3-methylglutaryl-CoA synthase 2 (HMGCS2) both in vivo and in vitro. Finally, mutation of hypersuccinylated residues K83 and K310 on HMGCS2 to glutamic acid strongly inhibits enzymatic activity. Taken together, these findings establish SIRT5 as a global regulator of lysine succinylation in mitochondria and present a mechanism for inhibition of ketogenesis through HMGCS2.

Article
Peer Reviewed

Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges

UC San Diego Previously Published Works (2017)

The advent of next-generation sequencing has dramatically decreased the cost of whole-genome sequencing and made its application in research and clinical care increasingly viable. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under the receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that predicting individual traits is difficult and relies on strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily on a small number of common traits, including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features.
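Matching genomes to trait profiles can be sketched as a likelihood ranking, assuming a predictor has already produced, for one genome, a probability for each binary trait. Each candidate profile is scored by the log-likelihood of its observed traits under those probabilities, and profiles are ranked best-first. The trait names, profile IDs, and the clipping constant `eps` are hypothetical, and this is not the scoring used by any CAGI participant or assessor.

```python
import math
from typing import Dict, List

Profile = Dict[str, bool]


def profile_log_likelihood(pred: Dict[str, float], profile: Profile, eps: float = 1e-6) -> float:
    """Sum of log P(observed trait value) under the genome's predicted trait probabilities.
    Traits without a prediction are skipped; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for trait, present in profile.items():
        if trait not in pred:
            continue
        p = min(max(pred[trait], eps), 1.0 - eps)
        total += math.log(p if present else 1.0 - p)
    return total


def rank_profiles(pred: Dict[str, float], profiles: Dict[str, Profile]) -> List[str]:
    """Return profile IDs ordered from most to least likely match for this genome."""
    return sorted(profiles, key=lambda pid: profile_log_likelihood(pred, profiles[pid]), reverse=True)


if __name__ == "__main__":
    # Hypothetical predictions for one genome and three hypothetical trait profiles.
    genome_pred = {"blood_type_O": 0.9, "brown_eyes": 0.2, "lactose_intolerant": 0.7}
    profiles = {
        "profile_A": {"blood_type_O": False, "brown_eyes": True},
        "profile_B": {"blood_type_O": True, "brown_eyes": False, "lactose_intolerant": True},
        "profile_C": {"blood_type_O": True, "brown_eyes": True},
    }
    print(rank_profiles(genome_pred, profiles))
```

Every additional trait lowers the summed log-likelihood, so this naive score tends to penalize profiles with many recorded traits; a more careful scorer might compare each trait against its population frequency, which the abstract identifies as essential knowledge.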

Article
Peer Reviewed

Rare variant associations with waist-to-hip ratio in European-American and African-American women from the NHLBI-Exome Sequencing Project

UCLA Previously Published Works (2016)

Waist-to-hip ratio (WHR), the ratio of waist circumference to hip circumference, is an easily obtained measure of body fat distribution, in particular central abdominal fat. A high WHR indicates more intra-abdominal fat deposition and is an established risk factor for cardiovascular disease and type 2 diabetes. Recent genome-wide association studies have identified numerous common genetic loci influencing WHR, but the contributions of rare variants had not previously been reported. We investigated rare variant associations with WHR in 1510 European-American and 1186 African-American women from the National Heart, Lung, and Blood Institute-Exome Sequencing Project. Association analysis was performed at the gene level using several rare variant association methods. The strongest association was observed for rare variants in IKBKB (P = 4.0 × 10^-8) in European-Americans, where rare variants in this gene are predicted to decrease WHR. Activation of IKBKB is involved in inflammatory processes and insulin resistance, which may affect normal food intake and body weight and shape. Meanwhile, the aggregate of rare variants in COBLL1, a gene previously found to harbor common variants associated with WHR and fasting insulin, was nominally associated (P = 2.23 × 10^-4) with higher WHR in European-Americans. However, these associations were not shared between African-Americans and European-Americans, which may be due to differences in the allelic architecture of the two populations and to the small sample sizes. Our study indicates that the combined effect of rare variants contributes to inter-individual variation in fat distribution through the regulation of insulin response.
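A gene-level burden test, one common family of rare variant association methods, can be sketched as below. It assumes a genotype matrix of minor-allele counts for one gene, per-variant minor allele frequencies (MAFs), and a quantitative trait such as WHR. Variants below a MAF threshold are collapsed into one score per individual, and the trait's linear slope on that score is estimated. The 1% threshold and the toy data are assumptions; a real analysis would also adjust for covariates and compute a p-value, both omitted here, and this is not presented as the specific method used in the study.

```python
from statistics import mean
from typing import List, Sequence


def burden_scores(genotypes: List[List[int]], mafs: Sequence[float],
                  maf_threshold: float = 0.01) -> List[int]:
    """Collapse rare variants (MAF below threshold) into one minor-allele count per individual.
    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i at variant j."""
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (no covariates)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("burden score has no variance; the test is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    # Toy data: 4 individuals x 3 variants; the third variant is common and is excluded.
    genotypes = [[0, 1, 0], [0, 0, 0], [1, 1, 2], [0, 0, 0]]
    mafs = [0.005, 0.002, 0.2]
    whr = [0.85, 0.80, 0.90, 0.80]
    scores = burden_scores(genotypes, mafs)
    print(scores, ols_slope(scores, whr))
```

A negative slope would correspond to rare-allele carriers having lower WHR, the direction the abstract reports for IKBKB; a positive slope matches the COBLL1 direction.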

Article
Peer Reviewed

An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Joint Genome Institute (2016)

Background

A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

Results

We conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured an expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 with those of CAFA2.

Conclusions

The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology-specific and that different performance metrics can be used to probe the nature of accurate predictions, and it highlighted the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and the usefulness of individual methods remain context-dependent.
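CAFA assessments commonly report Fmax, the maximum F-measure over decision thresholds. The sketch below assumes the protein-centric definition: at each threshold, precision is averaged over proteins with at least one prediction at or above the threshold, and recall is averaged over all benchmark proteins. It skips propagation of predicted terms up the ontology, and the protein and term identifiers in the tests are placeholders.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]], truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric Fmax over thresholds tau = 1/steps, 2/steps, ..., 1.

    predictions[protein][term] is a confidence score in [0, 1];
    truth[protein] is the non-empty set of experimentally annotated terms.
    """
    best = 0.0
    for k in range(1, steps + 1):
        tau = k / steps
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            tp = len(predicted & true_terms)
            if predicted:  # precision averages only over proteins with a prediction at tau
                precisions.append(tp / len(predicted))
            recall_sum += tp / len(true_terms)  # recall averages over all benchmark proteins
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Averaging precision only over proteins that receive predictions means a method is not penalized in precision for abstaining, but abstention still lowers recall; this asymmetry is why Fmax rewards coverage as well as confident correct calls.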

Article
Peer Reviewed

CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods

UC Santa Cruz Previously Published Works (2024)

Background

The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors.

Results

Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic.

Conclusions

Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
