Joint Genome Institute

Beneath the surface: Unsolved questions in soil virus ecology

(2025)

Soil virus ecology is an exciting but still nascent field of research in soil microbiology. While there has been a recent surge in soil virus research studies, many fundamental questions remain unanswered, and a range of technical and bioinformatic challenges need to be overcome. In this perspective article, we present a series of key questions that highlight fruitful research areas for ongoing and future efforts. These include describing the challenges involved in understanding soil viral abundance and activity, spatiotemporal dynamics, life strategy prevalence, virus-mediated biogeochemical impacts, viral protein function, host prediction, and soil RNA virus discovery. In the near term, combining approaches (e.g., cultivation-based, meta-omics, biogeochemical, experimental, and bioinformatic) will be key to assessing the ecological and biogeochemical impacts of soil viruses from the microscopic to the field and global scales. Still, we stress that results must be tempered by current methodological limitations and highlight knowledge gaps that are most pressing to fill via new methods or measurements, such as the prevalence of different viral replication strategies across soils, the fate of microbial necromass carbon after viral lysis, the frequency of virus-host encounters that do not lead to successful infections yet could be bioinformatically mistaken as infections, and the diversity and ecological impacts of RNA viruses in soil.

Primary and Re-exposure effects of D-enantiomeric peptide on metabolism, diversity, and composition of oral biofilms at different stages of recovery.

(2025)

The persistence of bacteria in the root canal system is the primary cause of recurrent apical periodontitis. The adaptability of residual bacteria to changing environmental conditions is a key survival strategy of biofilms, often leading to endodontic treatment failure. DJK-5 is a protease-resistant, broad-spectrum D-enantiomeric peptide that degrades or prevents the accumulation of guanosine penta- and tetraphosphates, which are important for biofilm formation. We evaluated the effects of primary antimicrobial agents and nutrient conditions on the recovery, metabolism, diversity, and composition of oral biofilms, and investigated how these factors affect the efficacy of DJK-5 and chlorhexidine (CHX) during re-exposure. Primary irrigants and nutrient conditions significantly influenced biofilm recovery, metabolic activity, diversity, and composition. Biofilm recovery was slower in nutrient-poor groups than in nutrient-rich ones, and nutrient availability had the greatest effect on shaping both the diversity and composition of the biofilms. Water and DJK-5 groups showed similar biofilm diversity trends, while CHX generally led to lower diversity. Results indicate that primary irrigants and nutrient conditions significantly impact biofilm composition, diversity, and recovery. However, these changes did not compromise DJK-5's effectiveness in killing biofilm microbes during re-exposure of recovered biofilms.

Multimodal SARS-CoV-2 interactome sketches the virus-host spatial organization.

(2025)

An accurate spatial representation of protein-protein interaction networks is needed to achieve a realistic and biologically relevant representation of interactomes. Here, we leveraged the spatial information included in Proximity-Dependent Biotin Identification (BioID) interactomes of SARS-CoV-2 proteins to calculate weighted distances and model the organization of the SARS-CoV-2-human interactome in three dimensions (3D) within a cell-like volume. Cell regions with viral occupancy were highlighted, along with the coordination of viral proteins exploiting the cellular machinery. Profiling physical intra-virus and virus-host contacts enabled us to demonstrate both the accuracy and the predictive value of our 3D map for direct interactions, meaning that proteins in closer proximity tend to interact physically. Several functionally important virus-host complexes were detected, and robust structural models were obtained, opening the way to structure-directed drug discovery screens. This PPI discovery pipeline approach brings us closer to a realistic spatial representation of interactomes, which, when applied to viruses or other pathogens, can provide significant information for infection. Thus, it represents a promising tool for coping with emerging infectious diseases.

Quantifying the impact of workshops promoting microbiome data standards and data stewardship.

(2025)

The field of microbiome research continues to grow at a rapid pace, with multi-omics approaches becoming widely used to interrogate diverse microbiome samples. However, due to lagging awareness and implementation of standards and data stewardship, many datasets are produced that are not comparable, reproducible, or reusable. In 2021, the National Microbiome Data Collaborative launched its Ambassador Program, which utilizes a community-learning model to annually train a cohort of early-career researchers in microbiome data stewardship best practices. These Ambassadors then host workshops and other events to communicate these themes to their respective microbiome research communities. To quantify the impact of this learning model for promoting awareness of and experience with microbiome data, we conducted a survey of workshop participants from events hosted by the 2023 Ambassador cohort. The 2023 cohort of 13 National Microbiome Data Collaborative Ambassadors collectively hosted 21 events, reaching over 550 researchers. The Ambassadors distributed an anonymous post-workshop survey to their event participants to quantify the effectiveness of the training materials, the workshop format, and the thematic content. Survey results were collected for 15 of the 21 events, from a total of 122 researchers. These respondents, who work with a range of microbiome types and come from a variety of institutions, reported overwhelmingly positive experiences with the workshop content and materials, with 98% reporting that they gained knowledge from the event. Participants across the events also reported an increase in their post-workshop understanding of metadata standards, principles for microbiome data management and reporting, and the importance of standardization in microbiome data processing. Participants also expressed a willingness to apply what they learned about microbiome data stewardship to their own research. The results of this study demonstrate the effectiveness of hands-on workshops and community-learning for communicating data stewardship best practices to microbiome researchers. The lessons learned and details about the implementation of this cohort-based learning model contained herein are intended to assist other groups in their efforts to create or improve similar learning strategies.

Cell-free synthetic biology for natural product biosynthesis and discovery

(2025)

Natural products have applications as biopharmaceuticals, agrochemicals, and other high-value chemicals. However, there are challenges in isolating natural products from their native producers (e.g. bacteria, fungi, plants). In many cases, synthetic chemistry or heterologous expression must be used to access these important molecules. The biosynthetic machinery to generate these compounds is found within biosynthetic gene clusters, primarily consisting of the enzymes that biosynthesise a range of natural product classes (including, but not limited to, ribosomal and nonribosomal peptides, polyketides, and terpenoids). Cell-free synthetic biology has emerged in recent years as a bottom-up technology applied towards both prototyping pathways and producing molecules. Recently, it has been applied to natural products, both to characterise biosynthetic pathways and to produce new metabolites. This review discusses the core biochemistry of cell-free synthetic biology applied to metabolite production and critiques its advantages and disadvantages compared to whole cell and/or chemical production routes. Specifically, we review the advances in cell-free biosynthesis of ribosomal peptides, analyse the rapid prototyping of natural product biosynthetic enzymes and pathways, highlight advances in novel antimicrobial discovery, and discuss the rising use of cell-free technologies in industrial biotechnology and synthetic biology.

Virocell Necromass Provides Limited Plant Nitrogen and Elicits Rhizosphere Metabolites That Affect Phage Dynamics.

(2025)

Bacteriophages impact soil bacteria through lysis, altering the availability of organic carbon and plant nutrients. However, the magnitude of nutrient uptake by plants from lysed bacteria remains unknown, partly because this process is challenging to investigate in the field. In this study, we extend ecosystem fabrication (EcoFAB 2.0) approaches to study plant-bacteria-phage interactions by comparing the impact of virocell (phage-lysed) and uninfected ¹⁵N-labelled bacterial necromass on plant nitrogen acquisition and rhizosphere exometabolite composition. We show that the grass Brachypodium distachyon derives some nitrogen from amino acids in uninfected Pseudomonas putida necromass lysed by sonication but not from virocell necromass. Additionally, the bacterial necromass elicits the formation of rhizosphere exometabolites, some of which (guanosine), alongside tested aromatic acids (p-coumaric and benzoic acid), show bacterium-specific effects on bacteriophage-induced lysis when tested in vitro. The study highlights the dynamic feedback between virocell necromass and plants and suggests that root exudate metabolites can impact bacteriophage infection dynamics.

Low Mutation Rate and Atypical Mutation Spectrum in Prasinoderma coloniale: Insights From an Early Diverging Green Lineage

(2025)

Mutations are the ultimate source of genetic diversity on which natural selection and genetic drift act, playing a crucial role in evolution and long-term adaptation. At the molecular level, the spontaneous mutation rate (µ), defined as the number of mutations per base per generation, thus determines the adaptive potential of a species. Through a mutation accumulation experiment, we estimate the mutation rate and spectrum in Prasinoderma coloniale, a phytoplankton species from an early-branching lineage within the Archaeplastida, characterized by an unusually high genomic guanine-cytosine (GC) content (69.8%). We find that P. coloniale has a very low total mutation rate of µ = 2.00 × 10⁻¹⁰. The insertion-deletion mutation rate is almost five times lower than the single-nucleotide mutation rate, with µID = 3.40 × 10⁻¹¹ and µSNM = 1.62 × 10⁻¹⁰. Prasinoderma coloniale also exhibits an atypical mutational spectrum: while essentially all other eukaryotes show a bias toward GC to AT mutations, no evidence of this AT-bias is observed in P. coloniale. Since cytosine methylation is known to be mutagenic, we hypothesized that this may result from an absence of C-methylation. Surprisingly, we found high levels of C-methylation (14% in 5mC, 25% in 5mCG contexts). Methylated cytosines did not show increased mutation rates compared with unmethylated ones, which does not support the prevailing notion that C-methylation universally leads to higher mutation rates. Overall, P. coloniale combines a GC-rich genome with a low mutation rate and an atypical mutation spectrum, suggesting the almost universal AT-bias may not have been present in the ancestor of the green lineage.
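
The rate comparison quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the three rates as stated in the abstract, and the variable and function names are hypothetical, not taken from the paper. The abstract does not say how the total rate was derived from its components, so the small gap between the summed components and the reported total is left unexplained here.

```python
# Rates as quoted in the abstract (mutations per base per generation).
MU_TOTAL = 2.00e-10  # reported total mutation rate
MU_SNM = 1.62e-10    # single-nucleotide mutation rate
MU_ID = 3.40e-11     # insertion-deletion mutation rate


def fold_difference(higher: float, lower: float) -> float:
    """Return how many times larger `higher` is than `lower`."""
    return higher / lower


# "Almost five times lower": ratio of the SNM rate to the indel rate.
ratio = fold_difference(MU_SNM, MU_ID)

# Sum of the two component rates, for comparison with the reported total.
combined = MU_SNM + MU_ID

# The total rate re-expressed per billion bases per generation.
per_gigabase = MU_TOTAL * 1e9

print(f"SNM / indel ratio: {ratio:.2f}")
print(f"SNM + indel rate:  {combined:.2e}")
print(f"Mutations per Gb per generation: {per_gigabase:.2f}")
```

The ratio comes out a little under five, consistent with the abstract's wording, and the component rates sum to slightly less than the reported total.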

APNet, an explainable sparse deep learning model to discover differentially active drivers of severe COVID-19.

(2025)

MOTIVATION: Computational analyses of bulk and single-cell omics provide translational insights into complex diseases, such as COVID-19, by revealing molecules, cellular phenotypes, and signalling patterns that contribute to unfavourable clinical outcomes. Current in silico approaches dovetail differential abundance, biostatistics, and machine learning, but often overlook nonlinear proteomic dynamics, like post-translational modifications, and provide limited biological interpretability beyond feature ranking. RESULTS: We introduce APNet, a novel computational pipeline that combines differential activity analysis based on SJARACNe co-expression networks with PASNet, a biologically informed sparse deep learning model, to perform explainable predictions for COVID-19 severity. The APNet driver-pathway network ingests SJARACNe co-regulation and classification weights to aid result interpretation and hypothesis generation. APNet outperforms alternative models in patient classification across three COVID-19 proteomic datasets, identifying predictive drivers and pathways, including some confirmed in single-cell omics, and highlighting under-explored biomarker circuitries in COVID-19. AVAILABILITY AND IMPLEMENTATION: APNet's R and Python scripts and Cytoscape methodologies are available at https://github.com/BiodataAnalysisGroup/APNet.

BioSciences