eScholarship
Open Access Publications from the University of California

A compendium of human gene functions derived from evolutionary modelling

(2025)

A comprehensive, computable representation of the functional repertoire of all macromolecules encoded within the human genome is a foundational resource for biology and biomedical research. The Gene Ontology Consortium has been working towards this goal by generating a structured body of information about gene functions, which now includes experimental findings reported in more than 175,000 publications for human genes and genes in experimentally tractable model organisms [1,2]. Here, we describe the results of a large, international effort to integrate all of these findings to create a representation of human gene functions that is as complete and accurate as possible. Specifically, we apply an expert-curated, explicit evolutionary modelling approach to all human protein-coding genes. This approach integrates available experimental information across families of related genes into models that reconstruct the gain and loss of functional characteristics over evolutionary time. The models and the resulting set of 68,667 integrated gene functions cover approximately 82% of human protein-coding genes. The functional repertoire reveals a marked preponderance of molecular regulatory functions, and the models provide insights into the evolutionary origins of human gene functions. We show that our set of descriptions of functions can improve the widely used genomic technique of Gene Ontology enrichment analysis. The experimental evidence for each functional characteristic is recorded, thereby enabling the scientific community to help review and improve the resource, which we have made publicly available.

The tier system: a host development framework for bioengineering

(2025)

Development of microorganisms into mature bioproduction host strains has typically been a slow and circuitous process, wherein multiple groups apply disparate approaches with minimal coordination over decades. To help organize and streamline host development efforts, we introduce the Tier System for Host Development, a conceptual model and guide for developing microbial hosts that can ultimately lead to a systematic, standardized, less expensive, and more rapid workflow. The Tier System is made up of three Tiers, each consisting of a unique set of strain development Targets, including experimental tools, strain properties, experimental information, and process models. By introducing the Tier System, we hope to improve host development activities through standardization and systematization pertaining to nontraditional chassis organisms.

Chaotrope-Based Approach for Rapid In Vitro Assembly and Loading of Bacterial Microcompartment Shells

(2025)

Bacterial microcompartments (BMCs) are proteinaceous organelles that self-assemble into selectively permeable shells that encapsulate enzymatic cargo. BMCs enhance catalytic pathways by reducing crosstalk among metabolites, preventing harmful intermediates from leaking into the cytosol and increasing reaction efficiency via enzyme colocalization. The intrinsic properties of BMCs make them attractive for biotechnological engineering. However, in vivo expression methods for shell synthesis have significant drawbacks that limit the potential design space for these nanocompartments. Here, we describe the development of an efficient and rapid method for the in vitro assembly of BMC shells from their protein building blocks. Our method enables large-scale construction of BMC shells by utilizing urea as a chaotropic agent to control self-assembly and provides an approach for encapsulation of both biotic and abiotic cargo under a broad range of reaction conditions. We demonstrate an enhanced level of control over the assembly of BMC shells in vitro and expand the design parameter space for engineering BMC systems with specialized and enhanced catalytic properties.

In planta production of the nylon precursor beta-ketoadipate

(2025)

Beta-ketoadipate (βKA) is an intermediate of the βKA pathway involved in the degradation of aromatic compounds in several bacteria and fungi. Beta-ketoadipate also represents a promising chemical for the manufacturing of performance-advantaged nylons. We established a strategy for the in planta synthesis of βKA via manipulation of the shikimate pathway and the expression of bacterial enzymes from the βKA pathway. Using Nicotiana benthamiana as a transient expression system, we demonstrated the efficient conversion of protocatechuate (PCA) to βKA when plastid-targeted, bacterially derived PCA 3,4-dioxygenase (PcaHG) and 3-carboxy-cis,cis-muconate cycloisomerase (PcaB) were co-expressed with 3-deoxy-D-arabinoheptulosonate 7-phosphate synthase (AroG) and 3-dehydroshikimate dehydratase (QsuB). This metabolic pathway was reconstituted in Arabidopsis by introducing a construct (pAtβKA) with stacked pcaG, pcaH, and pcaB genes into a PCA-overproducing genetic background that expresses AroG and QsuB (referred to as QsuB-2). The resulting QsuB-2 x pAtβKA stable lines displayed βKA titers as high as 0.25% on a dry weight basis in stems, along with a drastic reduction in lignin content and improvement of biomass saccharification efficiency compared to wild-type controls, and without any significant reduction in biomass yields. Using biomass sorghum as a potential crop for large-scale βKA production, techno-economic analysis indicated that βKA accumulated at titers of 0.25% and 4% on a dry weight basis could be competitively priced in the range of $2.04-34.49/kg and $0.47-2.12/kg, respectively, depending on the selling price of the residual biomass recovered after βKA extraction. This study lays the foundation for a more environmentally friendly synthesis of βKA using plants as production hosts.

Towards a standard benchmark for phenotype-driven variant and gene prioritisation algorithms: PhEval - Phenotypic inference Evaluation framework.

(2025)

BACKGROUND: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and openly available patient data to assess the efficacy of VGPAs, ultimately hindering the development of effective prioritisation tools. RESULTS: In this paper, we present our benchmarking tool, PhEval, which aims to provide a standardised and empirical framework to evaluate phenotype-driven VGPAs. The inclusion of standardised test corpora and test corpus generation tools in the PhEval suite of tools allows open benchmarking and comparison of methods on standardised data sets. CONCLUSIONS: PhEval and the standardised test corpora solve the issues of patient data availability and experimental tooling configuration when benchmarking and comparing rare disease VGPAs. By providing standardised data on patient cohorts from real-world case-reports and controlling the configuration of evaluated VGPAs, PhEval enables transparent, portable, comparable and reproducible benchmarking of VGPAs. As these tools are often a key component of many rare disease diagnostic pipelines, a thorough and standardised method of assessment is essential for improving patient diagnosis and care.

Quantifying the impact of workshops promoting microbiome data standards and data stewardship.

(2025)

The field of microbiome research continues to grow at a rapid pace, with multi-omics approaches becoming widely used to interrogate diverse microbiome samples. However, due to lagging awareness and implementation of standards and data stewardship, many datasets are produced that are not comparable, reproducible, or reusable. In 2021, the National Microbiome Data Collaborative launched its Ambassador Program, which utilizes a community-learning model to annually train a cohort of early-career researchers in microbiome data stewardship best practices. These Ambassadors then host workshops and other events to communicate these themes to their respective microbiome research communities. To quantify the impact of this learning model for promoting awareness of and experience with microbiome data, we conducted a survey of workshop participants from events hosted by the 2023 Ambassador cohort. The 2023 cohort of 13 National Microbiome Data Collaborative Ambassadors collectively hosted 21 events, reaching over 550 researchers. The Ambassadors distributed an anonymous post-workshop survey to their event participants to quantify the effectiveness of the training materials, the workshop format, and the thematic content. Survey results were collected for 15 of the 21 events, from a total of 122 researchers. These 122 participants, who work with a range of microbiome types and come from a variety of institutions, reported overwhelmingly positive experiences with the workshop content and materials, with 98% of respondents reporting that they gained knowledge from the event. Participants across the events also reported an increase in their post-workshop understanding of metadata standards, principles for microbiome data management and reporting, and the importance of standardization in microbiome data processing. Participants also expressed a willingness to apply what they learned about microbiome data stewardship to their own research. The results of this study demonstrate the effectiveness of hands-on workshops and community-learning for communicating data stewardship best practices to microbiome researchers. The lessons learned and details about the implementation of this cohort-based learning model contained herein are intended to assist other groups in their efforts to create or improve similar learning strategies.

Virocell Necromass Provides Limited Plant Nitrogen and Elicits Rhizosphere Metabolites That Affect Phage Dynamics.

(2025)

Bacteriophages impact soil bacteria through lysis, altering the availability of organic carbon and plant nutrients. However, the magnitude of nutrient uptake by plants from lysed bacteria remains unknown, partly because this process is challenging to investigate in the field. In this study, we extend ecosystem fabrication (EcoFAB 2.0) approaches to study plant-bacteria-phage interactions by comparing the impact of virocell (phage-lysed) and uninfected 15N-labelled bacterial necromass on plant nitrogen acquisition and rhizosphere exometabolite composition. We show that the grass Brachypodium distachyon derives some nitrogen from amino acids in uninfected Pseudomonas putida necromass lysed by sonication but not from virocell necromass. Additionally, the bacterial necromass elicits the formation of rhizosphere exometabolites, some of which (guanosine), alongside tested aromatic acids (p-coumaric and benzoic acid), show bacterium-specific effects on bacteriophage-induced lysis when tested in vitro. The study highlights the dynamic feedback between virocell necromass and plants and suggests that root exudate metabolites can impact bacteriophage infection dynamics.

The Unified Phenotype Ontology: a framework for cross-species integrative phenomics

(2025)

Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses. A major impediment in phenomics is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are collected and curated using free text, single terms or combinations of terms, using multiple vocabularies, terminologies, or ontologies. Integrating these heterogeneous and often siloed data enables the application of biological knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic knowledge representation that captures the full range of cross-species phenomics data is much needed. We have developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies, as a single, unified, logical representation. uPheno comprises (1) a system for consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. This harmonized representation supports use cases such as cross-species integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.

Powdery mildew induces chloroplast storage lipid formation at the expense of host thylakoids to promote spore production.

(2025)

Powdery mildews are obligate biotrophic fungi that manipulate plant metabolism to supply lipids to the fungus, particularly during fungal asexual reproduction when lipid demand is high. We found that levels of leaf storage lipids (triacylglycerols, TAGs) are 3.5-fold higher in whole Arabidopsis (Arabidopsis thaliana) leaves, with a 15-fold increase in storage lipids at the infection site, during fungal asexual reproduction. Lipid bodies, not observable in uninfected mature leaves, were found in and external to chloroplasts in mesophyll cells underlying the fungal feeding structure. Concomitantly, thylakoid disassembly occurred and thylakoid membrane lipid levels decreased. Genetic analyses showed that canonical endoplasmic reticulum TAG biosynthesis does not support powdery mildew spore production. Instead, Arabidopsis chloroplast-localized DIACYLGLYCEROL ACYLTRANSFERASE 3 (DGAT3) promoted fungal asexual reproduction. Consistent with the reported AtDGAT3 preference for 18:3 and 18:2 acyl substrates, which are dominant in thylakoid membrane lipids, dgat3 mutants exhibited a dramatic reduction in powdery mildew-induced chloroplast TAGs, attributable to decreases in TAG species largely comprised of 18:3 and 18:2 acyl substrates. This pathway for TAG biosynthesis in the chloroplast at the expense of thylakoids provides insights into obligate biotrophy and plant lipid metabolism, plasticity, and function. By understanding how photosynthetically active leaves can be converted into TAG producers, more sustainable and environmentally friendly plant oil production may be developed.