eScholarship
Open Access Publications from the University of California

The tier system: a host development framework for bioengineering

(2025)

Development of microorganisms into mature bioproduction host strains has typically been a slow and circuitous process, wherein multiple groups apply disparate approaches with minimal coordination over decades. To help organize and streamline host development efforts, we introduce the Tier System for Host Development, a conceptual model and guide for developing microbial hosts that can ultimately lead to a systematic, standardized, less expensive, and more rapid workflow. The Tier System is made up of three Tiers, each consisting of a unique set of strain development Targets, including experimental tools, strain properties, experimental information, and process models. By introducing the Tier System, we hope to improve host development activities through standardization and systematization pertaining to nontraditional chassis organisms.

The Unified Phenotype Ontology: a framework for cross-species integrative phenomics

(2025)

Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses. A major impediment in phenomics is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are collected and curated as free text, single terms, or combinations of terms, drawing on multiple vocabularies, terminologies, or ontologies. Integrating these heterogeneous and often siloed data enables the application of biological knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic knowledge representation that captures the full range of cross-species phenomics data is much needed. We have developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies, as a single, unified, logical representation. uPheno comprises (1) a system for consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. This harmonized representation supports use cases such as cross-species integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.
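The grouping in points (2) and (3) can be pictured as a lookup over a mapping table: species-specific terms that map to the same species-neutral term are treated as cross-species counterparts. The Python sketch below is illustrative only, under stated assumptions: it is not uPheno's data model or tooling, and every identifier (`NEUTRAL:0001`, `HUMAN:0001`, `MOUSE:0001`, and so on) is a hypothetical placeholder rather than a real ontology ID.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical mapping-table rows: (species-specific term, species-neutral term).
# All identifiers are placeholders, not real ontology IDs.
EXAMPLE_MAPPINGS = [
    ("HUMAN:0001", "NEUTRAL:0001"),
    ("MOUSE:0001", "NEUTRAL:0001"),
    ("ZFISH:0001", "NEUTRAL:0001"),
    ("MOUSE:0002", "NEUTRAL:0002"),
]


def group_by_neutral(mappings: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group species-specific terms under the species-neutral term they map to."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for specific, neutral in mappings:
        groups[neutral].add(specific)
    return dict(groups)


def cross_species_counterparts(term: str, mappings: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the other species-specific terms sharing a species-neutral term with `term`."""
    counterparts: Set[str] = set()
    for members in group_by_neutral(mappings).values():
        if term in members:
            counterparts |= members - {term}
    return counterparts
```

A flat table keeps the sketch short; a hierarchical vocabulary like the one described would additionally let a query climb from a species-neutral term to broader species-neutral terms.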

CRISPRi-ART enables functional genomics of diverse bacteriophages using RNA-binding dCas13d

(2025)

Bacteriophages constitute one of the largest reservoirs of genes of unknown function in the biosphere. Even in well-characterized phages, the functions of most genes remain unknown. Experimental approaches to study phage gene fitness and function at genome scale are lacking, partly because phages subvert many modern functional genomics tools. Here we leverage RNA-targeting dCas13d to selectively interfere with protein translation and to measure phage gene fitness at a transcriptome-wide scale. We find CRISPR Interference through Antisense RNA-Targeting (CRISPRi-ART) to be effective across phage phylogeny, from model ssRNA, ssDNA and dsDNA phages to nucleus-forming jumbo phages. Using CRISPRi-ART, we determine a conserved role of diverse rII homologues in subverting phage Lambda RexAB-mediated immunity to superinfection and identify genes critical for phage fitness. CRISPRi-ART establishes a broad-spectrum phage functional genomics platform, revealing more than 90 previously unknown genes important for phage fitness.

Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases

(2025)

BACKGROUND: Large language models (LLMs) are increasingly used in the medical field for diverse applications, including differential diagnostic support. The training data used to create LLMs such as the Generative Pretrained Transformer (GPT) are estimated to consist predominantly of English-language texts, but LLMs could be used across the globe to support diagnostics if language barriers could be overcome. Initial pilot studies on the utility of LLMs for differential diagnosis in languages other than English have shown promise, but a large-scale assessment of the relative performance of these models in a variety of European and non-European languages on a comprehensive corpus of challenging rare-disease cases is lacking. METHODS: We created 4967 clinical vignettes using structured data captured as Human Phenotype Ontology (HPO) terms in the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema. These clinical vignettes span a total of 378 distinct genetic diseases with 2618 associated phenotypic features. We used translations of the HPO together with language-specific templates to generate prompts in English, Chinese, Czech, Dutch, German, Italian, Japanese, Spanish, and Turkish. We applied GPT-4o, version gpt-4o-2024-08-06, to the task of delivering a ranked differential diagnosis using a zero-shot prompt. An ontology-based approach with the Mondo disease ontology was used to map synonyms and to map disease subtypes to clinical diagnoses in order to automate evaluation of LLM responses. FINDINGS: For English, GPT-4o placed the correct diagnosis at rank 1 in 19·8% of cases and within the top 3 ranks in 27·0% of cases. In comparison, for the eight non-English languages tested here, the correct diagnosis was placed at rank 1 in 16·9% to 20·5% of cases and within the top 3 in 25·3% to 27·7% of cases.
INTERPRETATION: The differential diagnostic performance of GPT-4o across a comprehensive corpus of rare-disease cases was consistent across the nine languages tested. This suggests that LLMs such as GPT-4o may have utility in non-English clinical settings. FUNDING: NHGRI 5U24HG011449 and 5RM1HG010860. P.N.R. was supported by a Professorship of the Alexander von Humboldt Foundation; P.L. was supported by a National Grant (PMP21/00063 ONTOPREC-ISCIII, Fondos FEDER).
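The automated evaluation step (normalizing each ranked answer to a disease identifier, then checking whether the correct diagnosis appears at rank 1 or within the top 3) can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the small synonym table is a hypothetical stand-in for the Mondo-based synonym and subtype mapping, and the labels and identifiers (`DISEASE:A`, `DISEASE:B`) are invented placeholders.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for the Mondo-based mapping:
# lower-cased label -> canonical disease identifier. Subtype labels map to the
# identifier of the clinical diagnosis they belong to.
EXAMPLE_SYNONYMS: Dict[str, str] = {
    "disease a": "DISEASE:A",
    "alias of disease a": "DISEASE:A",
    "disease a type 2": "DISEASE:A",  # subtype folded into its parent diagnosis
    "disease b": "DISEASE:B",
}

# Each case: (ranked differential returned by the model, correct disease identifier).
EXAMPLE_CASES: List[Tuple[List[str], str]] = [
    (["Disease B", "Alias of Disease A"], "DISEASE:A"),
    (["Disease A type 2", "Disease B"], "DISEASE:A"),
    (["Disease C"], "DISEASE:B"),
]


def normalize(label: str, synonyms: Dict[str, str]) -> Optional[str]:
    """Map a free-text diagnosis to a canonical identifier, or None if unmapped."""
    return synonyms.get(label.strip().lower())


def rank_of_correct(
    predictions: Sequence[str], correct_id: str, synonyms: Dict[str, str]
) -> Optional[int]:
    """Return the 1-based rank of the first prediction matching the correct diagnosis."""
    for rank, label in enumerate(predictions, start=1):
        if normalize(label, synonyms) == correct_id:
            return rank
    return None


def top_k_accuracy(
    cases: Sequence[Tuple[Sequence[str], str]], synonyms: Dict[str, str], k: int
) -> float:
    """Fraction of cases whose correct diagnosis appears within the top k ranks."""
    if not cases:
        raise ValueError("need at least one case")
    hits = 0
    for predictions, correct_id in cases:
        rank = rank_of_correct(predictions, correct_id, synonyms)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(cases)
```

With the toy data, `top_k_accuracy(EXAMPLE_CASES, EXAMPLE_SYNONYMS, 1)` counts only the second case as a hit, while `k=3` also credits the first case, whose correct diagnosis appears under a synonym at rank 2.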

A compendium of human gene functions derived from evolutionary modelling

(2025)

A comprehensive, computable representation of the functional repertoire of all macromolecules encoded within the human genome is a foundational resource for biology and biomedical research. The Gene Ontology Consortium has been working towards this goal by generating a structured body of information about gene functions, which now includes experimental findings reported in more than 175,000 publications for human genes and genes in experimentally tractable model organisms[1,2]. Here, we describe the results of a large, international effort to integrate all of these findings to create a representation of human gene functions that is as complete and accurate as possible. Specifically, we apply an expert-curated, explicit evolutionary modelling approach to all human protein-coding genes. This approach integrates available experimental information across families of related genes into models that reconstruct the gain and loss of functional characteristics over evolutionary time. The models and the resulting set of 68,667 integrated gene functions cover approximately 82% of human protein-coding genes. The functional repertoire reveals a marked preponderance of molecular regulatory functions, and the models provide insights into the evolutionary origins of human gene functions. We show that our set of descriptions of functions can improve the widely used genomic technique of Gene Ontology enrichment analysis. The experimental evidence for each functional characteristic is recorded, thereby enabling the scientific community to help review and improve the resource, which we have made publicly available.
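Gene Ontology enrichment analysis is commonly framed as a one-sided hypergeometric (Fisher's exact) test: given a background of N genes, K of which are annotated to a term, how surprising is it to see k or more annotated genes in a study list of n genes? The abstract does not state which test the authors used, so the Python sketch below shows only this common formulation, as an assumption. Its relevance here is that K and k are read from the annotation set, so a more complete set of gene-function annotations directly changes the computed p-value.

```python
from math import comb


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric tail probability P(X >= k).

    N: genes in the background (e.g. all annotated genes)
    K: background genes annotated to the GO term
    n: genes in the study list
    k: study-list genes annotated to the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("require 0 <= K <= N, 0 <= n <= N and k >= 0")
    # Sum the probabilities of drawing i annotated genes for every i >= k;
    # i cannot exceed either the study-list size n or the annotated count K.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

For production analyses, an established implementation such as SciPy's hypergeometric distribution would normally be preferred over a hand-rolled sum; the explicit loop is used here only to make the formula visible.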

Metabolic profiling of two white-rot fungi during 4-hydroxybenzoate conversion reveals biotechnologically relevant biosynthetic pathways

(2025)

White-rot fungi (WRF) are efficient organisms for the mineralization of lignin and polysaccharides into CO2 and H2O. Despite their biotechnological potential, WRF metabolism remains underexplored. Building on recent findings regarding the utilization of lignin-related aromatic compounds as carbon sources by WRF, we aimed to gain further insights into these catabolic processes. For this purpose, Trametes versicolor and Gelatoporia subvermispora were incubated under varying conditions (static and agitated modes, and different antioxidant levels) during the conversion of 4-hydroxybenzoic acid (a lignin-related compound) and cellobiose. Their metabolic responses were assessed via transcriptomics, proteomics, lipidomics, metabolomics, and microscopy analyses. These analyses reveal the significant impact of cultivation conditions on sugar and aromatic catabolic pathways, as well as on the lipid composition of the fungal mycelia. Additionally, this study identifies biosynthetic pathways for the production of extracellular fatty acids and phenylpropanoids (both products relevant to biotechnological applications) and provides insights into carbon fate in nature.

Bridging the gap: pathway programs for inclusion and persistence in microbiology

(2025)

Microbiology plays an important role in most sectors. Future progress in critical areas requires diverse workforce development. We outline a pathway program that aims to offer equitable exposure to high-impact research experiences and course-based instruction, providing crucial training in growing areas of microbiology (phage discovery, synthetic biology, and data science/AI).

A change language for ontologies and knowledge graphs

(2025)

Ontologies and knowledge graphs (KGs) are general-purpose computable representations of some domain, such as human anatomy, and are frequently a crucial part of modern information systems. Most of these structures change over time, incorporating new knowledge or information that was previously missing. Managing these changes is a challenge, both in terms of communicating changes to users and providing mechanisms to make it easier for multiple stakeholders to contribute. To fill that need, we have created KGCL, the Knowledge Graph Change Language (https://github.com/INCATools/kgcl), a standard data model for describing changes to KGs and ontologies at a high level, and an accompanying human-readable Controlled Natural Language (CNL). This language serves two purposes: a curator can use it to request desired changes, and it can also be used to describe changes that have already happened, corresponding to the concepts of "apply patch" and "diff" commonly used for managing changes in text documents and computer programs. Another key feature of KGCL is that descriptions are at a high enough level to be useful and understood by a variety of stakeholders; e.g., ontology edits can be specified by commands like "add synonym 'arm' to 'forelimb'" or "move 'Parkinson disease' under 'neurodegenerative disease'." We have also built a suite of tools for managing ontology changes. These include an automated agent that integrates with and monitors GitHub ontology repositories and applies any requested changes, and a new component in the BioPortal ontology resource that allows users to make change requests directly from within the BioPortal user interface. Overall, the KGCL data model, its CNL, and associated tooling allow for easier management and processing of changes associated with the development of ontologies and KGs. Database URL: https://github.com/INCATools/kgcl.
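To make the "apply patch" idea concrete, the Python sketch below parses the two example commands quoted in the abstract into structured change records and applies them to a toy in-memory ontology. It is a hypothetical illustration under stated assumptions: the regex grammar, the `Change` record, and the `parse_change`/`apply_change` helpers are invented here and are not the kgcl library's parser, data model, or API, which cover many more change types.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Hypothetical, simplified grammar covering only the two example commands;
# this is not the kgcl library's parser or data model.
_PATTERNS = [
    ("add_synonym", re.compile(r"^add synonym '([^']+)' to '([^']+)'$")),
    ("move_under", re.compile(r"^move '([^']+)' under '([^']+)'$")),
]


@dataclass(frozen=True)
class Change:
    kind: str     # "add_synonym" or "move_under"
    subject: str  # the synonym to add, or the term to move
    target: str   # the term receiving the synonym, or the new parent


def parse_change(command: str) -> Change:
    """Parse one controlled-natural-language command into a Change record."""
    text = command.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return Change(kind, match.group(1), match.group(2))
    raise ValueError(f"unrecognized change command: {command!r}")


# Toy ontology: term label -> {"synonyms": set of labels, "parent": parent label or None}.
Ontology = Dict[str, dict]


def apply_change(ontology: Ontology, change: Change) -> Ontology:
    """Return a copy of the ontology with the change applied (the 'apply patch' direction)."""
    updated: Ontology = {
        label: {"synonyms": set(node["synonyms"]), "parent": node["parent"]}
        for label, node in ontology.items()
    }
    if change.target not in updated:
        raise KeyError(f"unknown term: {change.target}")
    if change.kind == "add_synonym":
        updated[change.target]["synonyms"].add(change.subject)
    elif change.kind == "move_under":
        if change.subject not in updated:
            raise KeyError(f"unknown term: {change.subject}")
        updated[change.subject]["parent"] = change.target
    return updated
```

Returning a modified copy rather than editing in place keeps the original ontology available, which is what a "diff" between the before and after states would need.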

Unraveling the influence of microbial necromass on subsurface microbiomes: metabolite utilization and community dynamics

(2025)

The role of microbial necromass (nonliving microbial biomass), a significant component of belowground organic carbon, in nutrient cycling and its impact on the dynamics of microbial communities in subsurface systems remain poorly understood. It is currently unclear whether necromass metabolites from various microbes differ, whether certain groups of metabolites are preferentially utilized over others, or whether different microbial species respond to various necromass metabolites. In this study, we aimed to fill these knowledge gaps by designing enrichments with necromass as the sole nutrient source for subsurface microbial communities. We used the soluble fraction of necromass from bacterial isolates belonging to the Arthrobacter, Agrobacterium, and Pseudomonas genera. Our results indicate that the metabolite composition of necromass varied slightly across strains but generally included amino acids, organic acids, and nucleic acid constituents. Arthrobacter-derived necromass appeared more recalcitrant. Necromass metabolites enriched diverse microbial genera; in particular, Massilia sp. responded quickly regardless of the necromass source. Despite differences in necromass utilization, microbial community composition converged rapidly over time across the three necromass amendments. Uracil, xanthine, valine, and phosphate-containing isomers were generally depleted over time, indicating microbial assimilation for maintenance and growth. However, numerous easily assimilable metabolites were not significantly depleted, suggesting efficient necromass recycling and the potential for necromass stabilization in these systems. This study highlights the dynamic interactions between microbial necromass metabolites and subsurface microbial communities, revealing both selective utilization and rapid community and necromass convergence regardless of the necromass source.

Genetic modification of the shikimate pathway to reduce lignin content in switchgrass (Panicum virgatum L.) significantly impacts plant microbiomes

(2025)

Switchgrass (Panicum virgatum L.) is considered a sustainable biofuel feedstock, given its fast growth, low input requirements, and high biomass yields. Improvements in the bioenergy conversion efficiency of switchgrass could be made by reducing its lignin content. Engineered switchgrass that expresses a bacterial 3-dehydroshikimate dehydratase (QsuB) has reduced lignin content and improved biomass saccharification due to the rerouting of the shikimate pathway towards the simple aromatic protocatechuate at the expense of lignin biosynthesis. However, the impacts of this QsuB trait on switchgrass microbiome structure and function remain unclear. To address this, wild-type and QsuB-engineered switchgrass were grown in switchgrass field soils, and samples were collected from inflorescences, leaves, roots, rhizospheres, and bulk soils for microbiome analysis. We investigated how QsuB expression influenced switchgrass-associated fungal and bacterial communities using high-throughput Illumina MiSeq amplicon sequencing of ITS and 16S rDNA. Compared to wild-type, QsuB-engineered switchgrass hosted different microbial communities in roots, rhizosphere, and leaves. Specifically, QsuB-engineered plants had a lower relative abundance of arbuscular mycorrhizal fungi (AMF). Additionally, QsuB-engineered plants had fewer Actinobacteriota in root and rhizosphere samples. These findings may indicate that changes in plant metabolism affect AMF and Actinobacteriota similarly, or they may point to potential interactions between AMF and the bacterial community. This study enhances understanding of plant-microbiome interactions by providing baseline microbial data for developing beneficial bioengineering strategies and by assessing nontarget impacts of engineered plant traits on the plant microbiome.

Importance

Bioenergy crops provide an important strategy for mitigating climate change. Reducing the lignin in bioenergy crops could improve fermentable sugar yields for more efficient conversion into bioenergy and bioproducts. In this study, we assessed how switchgrass engineered for low lignin affected the aboveground and belowground switchgrass microbiomes. Our results show unexpected reductions in mycorrhizas and actinobacteria in belowground tissues, raising questions about the resilience and function of genetically engineered plants in agricultural systems.