ABSTRACT
The DOE JGI Metagenome Workflow performs metagenome data processing, including assembly; structural, functional, and taxonomic annotation; and binning of metagenomic datasets that are subsequently included in the Integrated Microbial Genomes and Microbiomes (IMG/M) comparative analysis system (I. Chen, K. Chu, K. Palaniappan, M. Pillay, A. Ratner, J. Huang, M. Huntemann, N. Varghese, J. White, R. Seshadri, et al., Nucleic Acids Research, 2019) and provided for download via the Joint Genome Institute (JGI) Data Portal (https://genome.jgi.doe.gov/portal/). This workflow scales to run on thousands of metagenome samples per year, which can vary in the complexity of their microbial communities and in sequencing depth. Here we describe the tools, databases, and parameters used at each step of the workflow, to help with interpretation of metagenome data available in IMG and to enable researchers to apply this workflow to their own data. We use 20 publicly available sediment metagenomes to illustrate the computing requirements for the different steps and to highlight the typical results of data processing. The workflow modules for read filtering and metagenome assembly are available as a Workflow Description Language (WDL) file (https://code.jgi.doe.gov/BFoster/jgi_meta_wdl.git). The workflow modules for annotation and binning are provided as a service to the user community at https://img.jgi.doe.gov/submit and require filling out the project and associated metadata descriptions in the Genomes OnLine Database (GOLD) (S. Mukherjee, D. Stamatis, J. Bertsch, G. Ovchinnikova, H. Katta, A. Mojica, I. Chen, N. Kyrpides, and T. Reddy, Nucleic Acids Research, 2018).
IMPORTANCE
The DOE JGI Metagenome Workflow is designed for processing metagenomic datasets starting from Illumina fastq files. It performs data preprocessing, error correction, assembly, structural and functional annotation, and binning. The results of processing are provided in several standard formats, such as fasta and gff, and can be used for subsequent integration into the Integrated Microbial Genomes (IMG) system, where they can be compared to a comprehensive set of publicly available metagenomes. As of 30 July 2020, 7,155 JGI metagenomes have been processed by the JGI Metagenome Workflow.