eScholarship
Open Access Publications from the University of California

A metagenomic perspective on the microbial prokaryotic genome census.

(2025)

Following 30 years of sequencing, we assessed the phylogenetic diversity (PD) of >1.5 million microbial genomes in public databases, including metagenome-assembled genomes (MAGs) of uncultivated microbes. As compared to the vast diversity uncovered by metagenomic sequences, cultivated taxa account for a modest portion of the overall diversity, 9.73% in bacteria and 6.55% in archaea, while MAGs contribute 48.54% and 57.05%, respectively. Therefore, a substantial fraction of bacterial (41.73%) and archaeal PD (36.39%) still lacks any genomic representation. This unrepresented diversity manifests primarily at lower taxonomic ranks, exemplified by 134,966 species identified in 18,087 metagenomic samples. Our study exposes diversity hotspots in freshwater, marine subsurface, sediment, soil, and other environments, whereas human samples yielded minimal novelty within the context of existing datasets. These results offer a roadmap for future genome recovery efforts, delineating uncaptured taxa in underexplored environments and underscoring the necessity for renewed isolation and sequencing.

Microbial secondary metabolites: advancements to accelerate discovery towards application

(2025)

Microbial secondary metabolites not only have key roles in microbial processes and relationships but are also valued in various sectors of today's economy, especially in human health and agriculture. The advent of genome sequencing has revealed a previously untapped reservoir of biosynthetic capacity for secondary metabolites indicating that there are new biochemistries, roles and applications of these molecules to be discovered. New predictive tools for biosynthetic gene clusters (BGCs) and their associated pathways have provided insights into this new diversity. Advanced molecular and synthetic biology tools and workflows including cell-based and cell-free expression facilitate the study of previously uncharacterized BGCs, accelerating the discovery of new metabolites and broadening our understanding of biosynthetic enzymology and the regulation of BGCs. These are complemented by new developments in metabolite detection and identification technologies, all of which are important for unlocking new chemistries that are encoded by BGCs. This renaissance of secondary metabolite research and development is catalysing toolbox development to power the bioeconomy.

Genome sequences of four novel Endozoicomonas strains associated with a tropical octocoral in a long-term aquarium facility.

(2025)

We report the genome sequences of four Endozoicomonas sp. strains isolated from the octocoral Litophyton maintained long term at an aquarium facility. Our analysis reveals the coding potential for versatile polysaccharide metabolism; Type II, III, IV, and VI secretion systems; and the biosynthesis of novel ribosomally synthesized and post-translationally modified peptides.

VISTA Enhancer browser: an updated database of tissue-specific developmental enhancers

(2025)

Regulatory elements (enhancers) are major drivers of gene expression in mammals and harbor many genetic variants associated with human diseases. Here, we present an updated VISTA Enhancer Browser (https://enhancer.lbl.gov), a database of transgenic enhancer assays conducted in developing mouse embryos in vivo. Since the original publication in 2007, the database has grown nearly 20-fold, from 250 to over 4500 experiments, and currently harbors over 23 500 images. The updated database provides structured information on experiments conducted at different stages of embryonic development, including enhancer activities of human pathogenic and synthetic variants and sequences derived from a variety of species. In addition to manually curated results of thousands of individual experiments, the new database also features hundreds of manually curated comparisons between alleles. The VISTA Enhancer Browser provides a crucial resource for the study of human genetic variation, gene regulation and developmental biology.

BGC Atlas: a web resource for exploring the global chemical diversity encoded in bacterial genomes

(2025)

Secondary metabolites are compounds that are not essential for an organism's development but provide significant ecological and physiological benefits. These compounds have applications in medicine, biotechnology and agriculture. Their production is encoded in biosynthetic gene clusters (BGCs), groups of genes collectively directing their biosynthesis. The advent of metagenomics has allowed researchers to study BGCs directly from environmental samples, identifying numerous previously unknown BGCs encoding unprecedented chemistry. Here, we present the BGC Atlas (https://bgc-atlas.cs.uni-tuebingen.de), a web resource that facilitates the exploration and analysis of BGC diversity in metagenomes. The BGC Atlas identifies and clusters BGCs from publicly available datasets, offering a centralized database and a web interface for metadata-aware exploration of BGCs and gene cluster families (GCFs). We analyzed over 35 000 datasets from MGnify, identifying nearly 1.8 million BGCs, which were clustered into GCFs. The analysis showed that ribosomally synthesized and post-translationally modified peptides are the most abundant compound class, with most GCFs exhibiting high environmental specificity. We believe that our tool will enable researchers to easily explore and analyze the BGC diversity in environmental samples, significantly enhancing our understanding of bacterial secondary metabolites and promoting the identification of ecological and evolutionary factors shaping the biosynthetic potential of microbial communities.

Genomes OnLine Database (GOLD) v.10: new features and updates

(2025)

The Genomes OnLine Database (GOLD; https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute is a comprehensive online metadata repository designed to catalog and manage information related to (meta)genomic sequence projects. GOLD provides a centralized platform where researchers can access a wide array of metadata from its four organization levels, namely Study, Organism/Biosample, Sequencing Project and Analysis Project. GOLD continues to serve as a valuable resource and has seen significant growth and expansion since its inception in 1997. With its expanded role as a collaborative platform, it not only actively imports data from other primary repositories like the National Center for Biotechnology Information but also supports contributions from researchers worldwide. This collaborative approach has enriched the database with diverse datasets, creating a more integrated resource to enhance scientific insights. As genomic research becomes increasingly integral to various scientific disciplines, more researchers and institutions are turning to GOLD for their metadata needs. To meet this growing demand, GOLD has expanded by adding diverse metadata fields, intuitive features, advanced search capabilities and enhanced data visualization tools, making it easier for users to find and interpret relevant information. This manuscript provides an update and highlights the new features introduced over the last 2 years.

The secondary metabolism collaboratory: a database and web discussion portal for secondary metabolite biosynthetic gene clusters

(2025)

Secondary metabolites are small molecules produced by all corners of life, often with specialized bioactive functions with clinical and environmental relevance. Secondary metabolite biosynthetic gene clusters (BGCs) can often be identified within DNA sequences by various sequence similarity tools, but determining the exact functions of genes in the pathway and predicting their chemical products can often only be done by careful, manual comparative analysis. To facilitate this, we report the first release of the secondary metabolism collaboratory (SMC), which aims to provide a comprehensive, tool-agnostic repository of BGC sequence data drawn from all publicly available and user-submitted bacterial and archaeal genome and contig sources. On the website, users are provided a searchable catalog of putative BGCs identified from each source, along with visualizations of gene and domain annotations derived from multiple sequence analysis tools. SMC's data is also available through publicly accessible application programming interface (API) endpoints to facilitate programmatic access. Users are encouraged to share their findings (and search for others') through comment posts on BGC and source pages. At the time of writing, SMC is the largest repository of BGC information, holding 13.1M BGC regions from 1.3M source sequences and growing, and can be found at https://smc.jgi.doe.gov.

A functional microbiome catalogue crowdsourced from North American rivers

(2025)

Predicting elemental cycles and maintaining water quality under increasing anthropogenic influence requires knowledge of the spatial drivers of river microbiomes. However, understanding of the core microbial processes governing river biogeochemistry is hindered by a lack of genome-resolved functional insights and sampling across multiple rivers. Here we used a community science effort to accelerate the sampling, sequencing and genome-resolved analyses of river microbiomes to create the Genome Resolved Open Watersheds database (GROWdb). GROWdb profiles the identity, distribution, function and expression of microbial genomes across river surface waters covering 90% of United States watersheds. Specifically, GROWdb encompasses microbial lineages from 27 phyla, including novel members from 10 families and 128 genera, and defines the core river microbiome at the genome level. GROWdb analyses coupled to extensive geospatial information reveal local and regional drivers of microbial community structuring, while also presenting foundational hypotheses about ecosystem function. Building on the previously conceived River Continuum Concept [1], we layer on microbial functional trait expression, which suggests that the structure and function of river microbiomes are predictable. We make GROWdb available through various collaborative cyberinfrastructures [2,3], so that it can be widely accessed across disciplines for watershed predictive modelling and microbiome-based management practices.