Bacteria in natural ecosystems produce a wide range of specialized (or “secondary”) metabolites with varying functions, including antibiotics, siderophores, signalling molecules, and
antifungals. These specialized metabolites are produced using operonic sets of genes that work in
concert, known as biosynthetic gene clusters. In this work, genome-resolved metagenomic
approaches were applied to understand the distribution and ecology of biosynthetic genes and the
bacteria that possess them. Bacterial genomes were assembled and binned from deeply sequenced
metagenomes from a California grassland meadow soil and a permanently wet vernal pool soil, and
clusters of biosynthetic genes were identified in each genome. Genomes were reconstructed for
novel species that belong to rarely cultivated but ubiquitous soil phyla, including the Acidobacteria,
Verrucomicrobia, and the candidate phylum Rokubacteria. Bacteria from the grassland meadow soil
were shown to have unexpectedly large numbers of biosynthetic gene clusters. In particular, two
novel lineages of Acidobacteria were identified that possessed an unusual genomic capacity for
specialized metabolite biosynthesis: up to 15% of their genomes were predicted to be dedicated to
the production of nonribosomal peptides and polyketides. From a second study site, a vernal pool
soil, another three species were obtained from one of these uncultivated lineages, the
candidate genus Angelobacter, which also possessed a large genomic repertoire of diverse
biosynthetic genes. By mining public soil metagenomes, additional high-quality draft genomes from
this candidate genus were also analyzed, confirming that its species occur across diverse
soil environments. It was therefore established that Angelobacter spp. with a substantial capacity for
specialized metabolite biosynthesis are widespread in soils with a range of moisture contents and
vegetation types.
Transcriptional activity of nonribosomal peptide synthetase and polyketide synthase genes of
abundant organisms from the grassland soil was tracked over time using 120 metatranscriptomic
samples from soil microcosms. For several bacterial species within the samples, unsupervised
clustering of genes by co-expression across samples identified modules of biosynthetic genes that
were tightly co-expressed with genes involved in transcriptional regulation, environmental sensing,
and secretion. For some vernal pool samples where Angelobacter were the most abundant microbial
community members, metatranscriptomics demonstrated clear transcriptional activity in situ.
Transcription of many Angelobacter biosynthetic genes was detected, extending findings from the
grassland soil microcosms.
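
The co-expression grouping described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: it assumes Pearson correlation across samples as the similarity measure and a simple threshold-based grouping (connected components) in place of whatever clustering method was actually applied. The gene names, expression values, and the `min_r` cutoff are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(vx * vy)


def coexpression_modules(expr, min_r=0.9):
    """Group genes whose profiles correlate at or above min_r.

    expr maps gene name -> list of expression values (one per sample).
    Genes are linked when their correlation passes the cutoff, and each
    connected group of linked genes is reported as one module.
    """
    parent = {gene: gene for gene in expr}

    def find(gene):
        # Union-find lookup with path compression.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in combinations(expr, 2):
        if pearson(expr[a], expr[b]) >= min_r:
            parent[find(a)] = find(b)

    modules = {}
    for gene in expr:
        modules.setdefault(find(gene), set()).add(gene)
    # Largest modules first, ties broken alphabetically.
    return sorted((sorted(m) for m in modules.values()),
                  key=lambda m: (-len(m), m))


# Hypothetical expression values across five samples.
toy = {
    "nrps_A": [1.0, 2.0, 4.0, 8.0, 16.0],
    "pks_B": [0.5, 1.1, 2.0, 4.2, 7.9],
    "regulator_C": [2.0, 4.1, 8.0, 15.8, 32.0],
    "housekeeping_D": [5.0, 5.0, 4.0, 6.0, 5.0],
}
modules = coexpression_modules(toy)
```

In this toy input the two biosynthetic genes and the regulator rise together across samples and fall into one module, while the flat housekeeping gene stays on its own. Real metatranscriptomic data would also need normalization for sequencing depth and organism abundance before any correlation is computed.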
Genetic variation in soil bacteria and their biosynthetic genes was investigated using population
genomics methods that leverage genetic variation within sequencing reads that map to genomes
from metagenomes. Metagenomic methods to track genetic variation within populations in a spatial
context were applied to study the most abundant bacterial species across the grassland meadow
study site. Genetic variation specifically within biosynthetic genes was elevated, indicating that
there can be substantial allelic diversity in the biosynthetic genes of an abundant species in a local
soil ecosystem. For about half of the bacterial populations studied, strong genetic population
structure associated with spatial scale was observed. Genomes and gene variants were more
genetically similar if they were from the same meadow plot. Although genetic gradients
were observed across the meadow, within-sample genetic diversity was also found to be high.
Genomic signatures of recombination and gene-specific selection were also identified, indicating
that ongoing selection and recombination may shape genetic divergence of populations on local
spatial scales in soils.
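
As an illustration of how genetic variation can be measured from reads mapped to a genome, the following minimal sketch computes a per-site diversity value from allele counts and averages it over sets of sites. It assumes the simple statistic 1 minus the sum of squared allele frequencies, which is one common choice and not necessarily the measure used in this work. The allele counts and the two site groupings are hypothetical.

```python
def site_diversity(counts):
    """Diversity at one site: probability that two reads drawn with
    replacement carry different alleles (1 - sum of squared frequencies).

    counts maps allele (e.g. "A") -> number of mapped reads supporting it.
    """
    depth = sum(counts.values())
    if depth < 2:
        return 0.0  # too little coverage to observe variation
    return 1.0 - sum((c / depth) ** 2 for c in counts.values())


def mean_diversity(sites):
    """Average per-site diversity over a list of allele-count dicts."""
    if not sites:
        return 0.0
    return sum(site_diversity(s) for s in sites) / len(sites)


# Hypothetical read counts at two sites in each gene category.
bgc_sites = [{"A": 5, "G": 5}, {"C": 10}]
other_sites = [{"A": 9, "T": 1}, {"G": 10}]

bgc_pi = mean_diversity(bgc_sites)
other_pi = mean_diversity(other_sites)
```

Comparing the two averages mirrors the kind of contrast drawn above between biosynthetic genes and the rest of the genome. A real analysis would also need to filter sequencing errors and account for differences in coverage between samples.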
While biosynthetic gene clusters can be outlined and annotated with confidence in microbial
genomes, predicting the function of the metabolites produced by novel gene clusters often remains an
unsolved problem. Colocalized transporter genes associated with biosynthetic gene clusters may
help predict metabolite function, due to their intimate association with the metabolite(s) they are
transporting. This hypothesis was tested and benchmarked on a dataset of characterized
biosynthetic gene clusters. In particular, a strong specificity of transporter genes for siderophore
export and re-uptake was quantified as a signal of siderophore production. Using this specific
genomic signal, putative siderophore biosynthetic gene clusters were annotated across bacterial genomes recovered
from soil, as well as from better-characterized microbes from the adult gut and premature infant
microbiomes. Surprisingly few genomes from soil bacteria contained transporter genes associated
with siderophore biosynthesis. While 23% of microbial genomes from premature infant
microbiomes possess at least one siderophore-like biosynthetic gene cluster, only 3% of those from
adult gut microbiomes do.
In sum, this thesis presented a metagenomic perspective on specialized metabolism, contributed to
the discovery of novel species, examined evolutionary processes, and improved genomic functional
predictions. The strength of this approach lies in its ability to investigate microbes in their in situ
community contexts and detect ecological trends among the uncultivated microbial majority.