eScholarship
Open Access Publications from the University of California
The tier system: a host development framework for bioengineering

(2025)

Development of microorganisms into mature bioproduction host strains has typically been a slow and circuitous process, wherein multiple groups apply disparate approaches with minimal coordination over decades. To help organize and streamline host development efforts, we introduce the Tier System for Host Development, a conceptual model and guide for developing microbial hosts that can ultimately lead to a systematic, standardized, less expensive, and more rapid workflow. The Tier System is made up of three Tiers, each consisting of a unique set of strain development Targets, including experimental tools, strain properties, experimental information, and process models. By introducing the Tier System, we hope to improve host development activities through standardization and systematization pertaining to nontraditional chassis organisms.

Bridging the gap: pathway programs for inclusion and persistence in microbiology

(2025)

Microbiology plays an important role in most sectors. Future progress in critical areas requires diverse workforce development. We outline a pathway program that aims to provide equitable exposure to high-impact research experiences and course-based instruction to provide crucial training in growing areas of microbiology (phage discovery, synthetic biology and data science/AI).

A change language for ontologies and knowledge graphs.

(2025)

Ontologies and knowledge graphs (KGs) are general-purpose computable representations of some domain, such as human anatomy, and are frequently a crucial part of modern information systems. Most of these structures change over time, incorporating new knowledge or information that was previously missing. Managing these changes is a challenge, both in terms of communicating changes to users and providing mechanisms to make it easier for multiple stakeholders to contribute. To fill that need, we have created KGCL, the Knowledge Graph Change Language (https://github.com/INCATools/kgcl), a standard data model for describing changes to KGs and ontologies at a high level, and an accompanying human-readable Controlled Natural Language (CNL). This language serves two purposes: a curator can use it to request desired changes, and it can also be used to describe changes that have already happened, corresponding to the concepts of apply patch and diff commonly used for managing changes in text documents and computer programs. Another key feature of KGCL is that descriptions are at a high enough level to be useful and understood by a variety of stakeholders; for example, ontology edits can be specified by commands like "add synonym arm to forelimb" or "move Parkinson disease under neurodegenerative disease". We have also built a suite of tools for managing ontology changes. These include an automated agent that integrates with and monitors GitHub ontology repositories and applies any requested changes, and a new component in the BioPortal ontology resource that allows users to make change requests directly from within the BioPortal user interface. Overall, the KGCL data model, its CNL, and associated tooling allow for easier management and processing of changes associated with the development of ontologies and KGs. Database URL: https://github.com/INCATools/kgcl.
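
The apply-patch and diff roles described in this abstract can be illustrated with a toy sketch. It is not the KGCL library or its actual grammar: the two command shapes are copied from the examples in the abstract, while the dictionary-based ontology, the function names (`parse`, `apply_change`, `diff`) and the term labels other than those in the abstract are assumptions made only for illustration.

```python
import re
from copy import deepcopy

# Hypothetical command shapes, modelled on the two examples in the abstract.
# The real KGCL controlled natural language is richer than this.
PATTERNS = [
    ("add_synonym", re.compile(r"add synonym (.+?) to (.+)")),
    ("move", re.compile(r"move (.+?) under (.+)")),
]


def parse(command):
    """Turn a command string into a (kind, first_term, second_term) tuple."""
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(command.strip())
        if match:
            return kind, match.group(1), match.group(2)
    raise ValueError(f"unrecognized change command: {command!r}")


def apply_change(ontology, command):
    """'Apply patch' role: return a copy of the ontology with one change applied."""
    kind, first, second = parse(command)
    updated = deepcopy(ontology)
    if kind == "add_synonym":
        updated[second]["synonyms"].add(first)
    else:  # kind == "move"
        updated[first]["parent"] = second
    return updated


def diff(old, new):
    """'Diff' role: describe how `new` differs from `old` as command strings.

    Only terms present in both versions are compared (a simplification).
    """
    commands = []
    for label in sorted(old.keys() & new.keys()):
        for synonym in sorted(new[label]["synonyms"] - old[label]["synonyms"]):
            commands.append(f"add synonym {synonym} to {label}")
        parent = new[label]["parent"]
        if parent is not None and parent != old[label]["parent"]:
            commands.append(f"move {label} under {parent}")
    return commands


# Toy ontology: term label -> synonyms and a single parent label.
ontology = {
    "limb": {"synonyms": set(), "parent": None},
    "forelimb": {"synonyms": set(), "parent": "limb"},
    "disease": {"synonyms": set(), "parent": None},
    "neurodegenerative disease": {"synonyms": set(), "parent": "disease"},
    "Parkinson disease": {"synonyms": set(), "parent": "disease"},
}

requested = [
    "add synonym arm to forelimb",
    "move Parkinson disease under neurodegenerative disease",
]

edited = ontology
for command in requested:
    edited = apply_change(edited, command)

# The diff of the two versions recovers the same changes that were requested.
recovered = diff(ontology, edited)
```

In this sketch the same command strings serve both as requests and as a description of what changed, which is the dual role the abstract attributes to the language.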

Genetic modification of the shikimate pathway to reduce lignin content in switchgrass (Panicum virgatum L.) significantly impacts plant microbiomes

(2025)

Switchgrass (Panicum virgatum L.) is considered a sustainable biofuel feedstock, given its fast growth, low input requirements, and high biomass yields. Improvements in bioenergy conversion efficiency of switchgrass could be made by reducing its lignin content. Engineered switchgrass that expresses a bacterial 3-dehydroshikimate dehydratase (QsuB) has reduced lignin content and improved biomass saccharification due to the rerouting of the shikimate pathway towards the simple aromatic protocatechuate at the expense of lignin biosynthesis. However, the impacts of this QsuB trait on switchgrass microbiome structure and function remain unclear. To address this, wild-type and QsuB-engineered switchgrass were grown in switchgrass field soils, and samples were collected from inflorescences, leaves, roots, rhizospheres, and bulk soils for microbiome analysis. We investigated how QsuB expression influenced switchgrass-associated fungal and bacterial communities using high-throughput Illumina MiSeq amplicon sequencing of ITS and 16S rDNA. Compared to wild-type, QsuB-engineered switchgrass hosted different microbial communities in roots, rhizosphere, and leaves. Specifically, QsuB-engineered plants had a lower relative abundance of arbuscular mycorrhizal fungi (AMF). Additionally, QsuB-engineered plants had fewer Actinobacteriota in root and rhizosphere samples. These findings may indicate that changes in plant metabolism affect AMF and Actinobacteriota similarly, or may reflect potential interactions between AMF and the bacterial community. This study enhances understanding of plant-microbiome interactions by providing baseline microbial data for developing beneficial bioengineering strategies and by assessing nontarget impacts of engineered plant traits on the plant microbiome.

Importance

Bioenergy crops provide an important strategy for mitigating climate change. Reducing the lignin in bioenergy crops could improve fermentable sugar yields for more efficient conversion into bioenergy and bioproducts. In this study, we assessed how switchgrass engineered for low lignin impacted the aboveground and belowground switchgrass microbiome. Our results show unexpected reductions in mycorrhizas and actinobacteria in belowground tissues, raising questions about the resilience and function of genetically engineered plants in agricultural systems.

VISTA Enhancer browser: an updated database of tissue-specific developmental enhancers

(2025)

Regulatory elements (enhancers) are major drivers of gene expression in mammals and harbor many genetic variants associated with human diseases. Here, we present an updated VISTA Enhancer Browser (https://enhancer.lbl.gov), a database of transgenic enhancer assays conducted in developing mouse embryos in vivo. Since the original publication in 2007, the database grew nearly 20-fold from 250 to over 4500 experiments and currently harbors over 23 500 images. The updated database provides structured information on experiments conducted at different stages of embryonic development, including enhancer activities of human pathogenic and synthetic variants and sequences derived from a variety of species. In addition to manually curated results of thousands of individual experiments, the new database also features hundreds of manually curated comparisons between alleles. The VISTA Enhancer Browser provides a crucial resource for study of human genetic variation, gene regulation and developmental biology.

Combinatorial transcription factor binding encodes cis-regulatory wiring of mouse forebrain GABAergic neurogenesis

(2025)

Transcription factors (TFs) bind combinatorially to cis-regulatory elements, orchestrating transcriptional programs. Although studies of chromatin state and chromosomal interactions have demonstrated dynamic neurodevelopmental cis-regulatory landscapes, parallel understanding of TF interactions lags. To elucidate combinatorial TF binding driving mouse basal ganglia development, we integrated chromatin immunoprecipitation sequencing (ChIP-seq) for twelve TFs, H3K4me3-associated enhancer-promoter interactions, chromatin and gene expression data, and functional enhancer assays. We identified sets of putative regulatory elements with shared TF binding (TF-pRE modules) that orchestrate distinct processes of GABAergic neurogenesis and suppress other cell fates. The majority of pREs were bound by one or two TFs; however, a small proportion were extensively bound. These sequences had exceptional evolutionary conservation and motif density, complex chromosomal interactions, and activity as in vivo enhancers. Our results provide insights into the combinatorial TF-pRE interactions that activate and repress expression programs during telencephalon neurogenesis and demonstrate the value of TF binding toward modeling developmental transcriptional wiring.

Metagenome-assembled genomes of freshwater Hyphomicrobium sp. G-191 and Methylophilus sp. enriched from Cedar Swamp, Woods Hole, MA.

(2024)

Hyphomicrobium are facultative denitrifying anaerobes capable of using one-carbon compounds as a sole carbon source. Hyphomicrobium sp. G-191 was enriched from Cedar Swamp, Woods Hole, Massachusetts, using a selective medium for methanol-utilizing bacteria. We present two draft metagenome-assembled genomes (MAGs) of a Hyphomicrobium and a Methylophilus species.

Efficient reinterpretation of rare disease cases using Exomiser.

(2024)

Whole genome sequencing has transformed rare disease research; however, 50-80% of rare disease patients remain undiagnosed after such testing. Regular reanalysis can identify new diagnoses, especially in newly discovered disease-gene associations, but efficient tools are required to support clinical interpretation. Exomiser, a phenotype-driven variant prioritisation tool, fulfils this role; within the 100,000 Genomes Project (100kGP), diagnoses were identified after reanalysis in 463 (2%) of 24,015 unsolved patients after previous analysis for variants in known disease genes. However, extensive manual interpretation was required. This led us to develop a reanalysis strategy to efficiently reveal candidates from recent disease gene discoveries or newly designated pathogenic/likely pathogenic variants. Optimal settings to highlight new candidates from Exomiser reanalysis were identified with high recall (82%) and precision (88%) when including Exomiser's automated ACMG/AMP classifier, which correctly converted 92% of variants from unknown significance to pathogenic/likely pathogenic. In conclusion, Exomiser efficiently reinterprets previously unsolved cases.

Tapping the treasure trove of atypical phages

(2024)

With advancements in genomics technologies, a vast diversity of 'atypical' phages, that is, those with single-stranded DNA or RNA genomes, is being uncovered from different ecosystems. Though these efforts have revealed the existence and prevalence of these nonmodel phages, computational approaches often fail to associate these phages with their specific bacterial host(s), while the lack of methods to isolate these phages has limited our ability to characterize infectivity pathways and new gene function. In this review, we call for the development of generalizable experimental methods to better capture this understudied viral diversity via isolation and to study it through gene-level characterization and engineering. Establishing a diverse set of new 'atypical' phage model systems has the potential to provide many new biotechnologies, including potential uses of these atypical phages in halting the spread of antibiotic resistance and engineering of microbial communities for beneficial outcomes.

Standardized and accessible multi-omics bioinformatics workflows through the NMDC EDGE resource

(2024)

Accessible and easy-to-use standardized bioinformatics workflows are necessary to advance microbiome research from observational studies to large-scale, data-driven approaches. Standardized multi-omics data enables comparative studies, data reuse, and applications of machine learning to model biological processes. To advance broad accessibility of standardized multi-omics bioinformatics workflows, the National Microbiome Data Collaborative (NMDC) has developed the Empowering the Development of Genomics Expertise (NMDC EDGE) resource, a user-friendly, open-source web application (https://nmdc-edge.org). Here, we describe the design and main functionality of the NMDC EDGE resource for processing metagenome, metatranscriptome, natural organic matter, and metaproteome data. The architecture relies on three main layers (web application, orchestration, and execution) to ensure flexibility and expansion to future workflows. The orchestration and execution layers leverage best practices in software containers and accommodate high-performance computing and cloud computing services. Further, we have adopted a robust user research process to collect feedback for continuous improvement of the resource. NMDC EDGE provides an accessible interface for researchers to process multi-omics microbiome data using production-quality workflows to facilitate improved data standardization and interoperability.