Microbial eukaryotes are important pathogens, environmental quality indicators, and integral components of natural microbial communities, and they are critical for understanding our own evolutionary history. Yet microbial eukaryotes are an often neglected component of microbial ecology studies. Common community profiling techniques, such as 16S rRNA gene amplicon sequencing, omit eukaryotes entirely, and eukaryotes are frequently ignored in shotgun metagenomic sequencing projects. A methodology was developed for recovering eukaryotic genomes from metagenomes that relies upon a newly developed machine learning-based method, EukRep, to separate eukaryotic scaffolds from prokaryotic scaffolds prior to binning. Eukaryotic gene predictors can then be applied specifically to the eukaryotic scaffolds, eliminating one of the largest challenges to properly binning eukaryotes in shotgun metagenomic samples. EukRep was tested both on in silico mock communities constructed from reference bacterial, archaeal, and eukaryotic genomes and on natural microbial community samples, and was shown to enable the recovery of near-complete eukaryotic genomes, including high-quality fungal, protist, and rotifer genomes, from complex environmental samples. This approach therefore enables consistent genome reconstruction and prediction of the metabolic and behavioral potential of eukaryotes, as well as of their associated communities, in a culture-independent, natural microbial community context.
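EukRep is described here only as a machine learning-based classifier that separates eukaryotic from prokaryotic scaffolds before binning. The sketch below illustrates that general idea under stated assumptions: each scaffold is represented by its k-mer (here tetranucleotide) frequencies, and a simple nearest-centroid rule trained on labeled reference scaffolds stands in for the actual EukRep model, which is not reproduced here. The function names (`kmer_profile`, `train_centroids`, `classify_scaffold`, `split_scaffolds`) and the toy sequences are hypothetical. Scaffolds routed to the eukaryotic set would then be passed to a eukaryotic gene predictor, and the rest to a prokaryotic one.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Hypothetical sketch: a nearest-centroid classifier on k-mer composition,
# standing in for EukRep's actual machine learning model (an assumption).


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Return k-mer frequencies over the fixed ACGT alphabet, summing to 1."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def train_centroids(labeled: dict[str, list[str]], k: int = 4) -> dict[str, list[float]]:
    """Average the k-mer profiles of the reference scaffolds for each label."""
    centroids = {}
    for label, seqs in labeled.items():
        profiles = [kmer_profile(s, k) for s in seqs]
        centroids[label] = [sum(col) / len(profiles) for col in zip(*profiles)]
    return centroids


def classify_scaffold(seq: str, centroids: dict[str, list[float]], k: int = 4) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    prof = kmer_profile(seq, k)

    def dist(centroid: list[float]) -> float:
        return sqrt(sum((a - b) ** 2 for a, b in zip(prof, centroid)))

    return min(centroids, key=lambda label: dist(centroids[label]))


def split_scaffolds(scaffolds: dict[str, str], centroids: dict[str, list[float]], k: int = 4):
    """Partition scaffolds into eukaryotic and non-eukaryotic sets before binning."""
    euk, prok = {}, {}
    for name, seq in scaffolds.items():
        target = euk if classify_scaffold(seq, centroids, k) == "eukaryote" else prok
        target[name] = seq
    return euk, prok


if __name__ == "__main__":
    # Toy references: AT-rich "eukaryotic" and GC-rich "prokaryotic" repeats.
    refs = {
        "eukaryote": ["ATATTAAT" * 30, "AATTATTA" * 30],
        "prokaryote": ["GCGCCGGC" * 30, "GGCCGCCG" * 30],
    }
    centroids = train_centroids(refs)
    euk, prok = split_scaffolds({"s1": "ATATTAAT" * 20, "s2": "GCGCCGGC" * 20}, centroids)
    print("eukaryotic:", list(euk))
    print("prokaryotic:", list(prok))
```

The toy references differ only in base composition, which makes the separation trivially easy; real eukaryotic and prokaryotic scaffolds overlap far more in composition, which is why a trained classifier rather than a fixed rule is needed in practice.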
A EukRep-based approach was used to investigate the effect of organic carbon addition on a geyser-associated microbial community. Crystal Geyser, a CO2-driven geyser in Utah (USA), provides large volumes of deeply sourced fluids and is thus well suited for studying microbial communities in high-CO2 environments. Organic carbon addition substantially changed community metabolism, selecting against almost all candidate phyla bacteria and archaea and in favor of eukaryotes. Near-complete genomes were reconstructed for three fungi placed within the Eurotiomycetes and for an arthropod. While carbon fixation and sulfur oxidation were important functions in the geyser community prior to carbon addition, the organic carbon-impacted community was enriched in secreted proteases, secreted lipases, cellulose-targeting CAZymes, and methanol oxidation. The results demonstrate the broader utility of EukRep for reconstructing and evaluating relatively high-quality eukaryotic genomes from complex environmental samples. This approach opens the way for cultivation-independent analyses of whole microbial communities.
Fungi are common members of the human microbiome but are often excluded from metagenomic studies due to the large size and complexity of eukaryotic genomes. Here, targeted eukaryotic genome recovery was performed on over a thousand metagenomes from premature infant fecal samples and twenty-eight metagenomes from the neonatal intensive care unit (NICU) housing the infants. Samples were screened for the presence of eukaryotes using a machine learning classifier, and de novo genome assembly, curation, and annotation were performed on samples in which eukaryotes were detected. Seventeen distinct eukaryotic genomes were recovered (median completeness 91%; median size 15.6 Mbp), including genomes from four strains of Candida albicans, seven genera of fungi, and two organisms, a fly (Diptera) and a rhabditid nematode, from families with no previously sequenced genomes. Seven percent of infants were colonized by a eukaryote during the first months of life, and prevalence was significantly associated with maternal antibiotic administration and with particular bacterial taxa. All NICU samples had detectable fungal communities (median relative abundance 2%; range 0.3 to 24.1%), and different locations in the NICU had distinct eukaryotic microbiomes. Near-identical genomes of Purpureocillium lilacinum were recovered from both infant and NICU samples (99.999% average nucleotide identity), highlighting the potential for environmental NICU fungi to colonize premature infants. Zygosity and potential aneuploidy were determined for all assembled genomes, and regions with loss of heterozygosity (indicative of recent genome evolution) were detected in some C. albicans genomes. This study resolves eukaryote dynamics in the NICU and in premature infant gut samples and reveals potential reservoirs of unexpected eukaryotic diversity within the hospital environment.
Candida parapsilosis is the third most common cause of invasive candidiasis. C. parapsilosis infections have increased steadily in prevalence over the past two decades, and prevalence is significantly higher in neonates than in other at-risk populations, marking its importance as an emerging pathogen. Despite this, C. parapsilosis is understudied. The C. parapsilosis genomes recovered in this work contain small genomic regions with highly elevated levels of single nucleotide variants (SNVs), which we refer to as SNV hotspots. SNV hotspots are shared between strains, and some are unique to C. parapsilosis strains from a single hospital. Four of the C. parapsilosis genomes carry the RTA3 gene at high copy number (4 to 16 copies); RTA3 encodes a lipid translocase previously implicated in antifungal resistance, so this expansion may indicate adaptation to antifungal treatment. Additionally, time-course metatranscriptomics and metaproteomics were performed on samples from a premature infant with a documented C. parapsilosis bloodstream infection, offering a rare look at the in vivo gene expression and protein landscape of a Candida species. In situ expression of C. parapsilosis is highly distinct from expression in culture, but also highly variable, demonstrating the importance of studying Candida in situ in addition to in culture.
Mono Lake, CA, is a highly alkaline, hypersaline lake with an unusually productive ecosystem largely supported by benthic and planktonic algae. Mono Lake harbors a species of choanoflagellate that forms multicellular, hollow rosettes filled with bacteria, but little was known about this choanoflagellate and its associated microbial community. This association is of interest given that choanoflagellates are the closest living relatives of animals, and given the analogy between rosette-enclosed consortia and animal gut microbiomes. Metagenomic shotgun sequencing was performed to reconstruct genomes for the choanoflagellate and its associated community. EukRep was used to identify eukaryotic sequences and enabled genome recovery, completeness evaluation, and prediction of metabolic potential for both the nuclear and mitochondrial genomes of the choanoflagellate. The draft nuclear genome is 49 Mbp long, contains 11,052 predicted genes, and appears to be near-complete. Interestingly, its extracellular proteins have higher isoelectric points than those of marine choanoflagellates, likely an adaptation to the saline, high-pH environment. To distinguish bacteria inside rosettes from those outside, bacterial communities were characterized in samples enriched for and depleted of choanoflagellate rosettes. Across all samples, 23 near-complete bacterial genomes were recovered, primarily belonging to Gammaproteobacteria, Bacteroidetes, and Spirochaeta. Of these, seven were found only in the rosette-enriched samples, suggesting that these bacteria are partitioned into the rosette interior. Overall, this research provides insight into the composition of, and metabolic interactions between, an ordered assemblage of single-celled eukaryotes and its enclosed microbiome.
In this work, genome-resolved and culture-independent methods are employed to study microbial eukaryotes in a variety of natural community contexts, ranging from animal-associated microbiomes and the hospital environment to environmental communities. The development of EukRep and its subsequent incorporation into metagenomic pipelines represent an important methodological advance for the comprehensive study of the structure and ecology of natural microbial communities and provide new insights into community functioning.