- Kim, Hyun Woo;
- Wang, Mingxun;
- Leber, Christopher A;
- Nothias, Louis-Félix;
- Reher, Raphael;
- Kang, Kyo Bin;
- van der Hooft, Justin JJ;
- Dorrestein, Pieter C;
- Gerwick, William H;
- Cottrell, Garrison W
Computational approaches such as genome and metabolome mining are becoming essential to natural products (NPs) research. Consequently, a need exists for an automated structure-type classification system that can handle the massive amounts of NP structural data now appearing. An ideal semantic ontology for the classification of NPs should not only go beyond the simple presence/absence of chemical substructures but also include the taxonomy of the producing organism, the nature of the biosynthetic pathway, and/or the biological properties of the compounds. A holistic and automatic NP classification framework could therefore be of considerable value for comprehensively navigating the relatedness of NPs, especially when large numbers of NPs are analyzed. Here, we introduce NPClassifier, a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints. NPClassifier is expected to accelerate and enhance NP discovery by linking NP structures to their underlying properties.
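To make the term "counted Morgan fingerprint" concrete, the following is a minimal, pure-Python sketch of the general idea: each atom's circular environment is hashed at increasing radii, and the fingerprint records how many times each environment occurs rather than only whether it occurs. This is a simplified illustration, not the NPClassifier implementation and not a cheminformatics toolkit's algorithm. The function name `counted_morgan_sketch`, the atom/bond input format, the atom invariants (element and degree only), and the default `radius` and `n_bits` values are all assumptions made for this example. Unlike a full Morgan implementation, it ignores bond orders, charges, and duplicate-environment removal.

```python
import hashlib
from collections import Counter


def _stable_hash(value: str) -> int:
    """Deterministic hash; Python's built-in hash() of str is salted per process."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


def counted_morgan_sketch(atoms, bonds, radius=2, n_bits=2048):
    """Toy count-based circular fingerprint (hypothetical helper, illustration only).

    atoms: list of element symbols for heavy atoms, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs into `atoms`
    Returns a list of n_bits integer counts.
    """
    # Build an adjacency list from the bond pairs.
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: identify each atom by element and degree (a simplifying assumption).
    ids = [_stable_hash(f"{sym}|{len(neighbors[i])}") for i, sym in enumerate(atoms)]
    counts = Counter(ids)

    # Each iteration grows every atom's environment by one bond.
    for _ in range(radius):
        ids = [
            _stable_hash(
                f"{ids[i]}|" + ",".join(str(x) for x in sorted(ids[j] for j in neighbors[i]))
            )
            for i in range(len(atoms))
        ]
        counts.update(ids)

    # Fold identifiers into a fixed-length vector, keeping occurrence counts.
    vector = [0] * n_bits
    for identifier, count in counts.items():
        vector[identifier % n_bits] += count
    return vector


# Heavy-atom graphs of ethanol (C-C-O) and propane (C-C-C).
ethanol_fp = counted_morgan_sketch(["C", "C", "O"], [(0, 1), (1, 2)])
propane_fp = counted_morgan_sketch(["C", "C", "C"], [(0, 1), (1, 2)])
```

In propane the two terminal carbons share the same environment, so at least one entry of `propane_fp` is 2 or more; a binary fingerprint would lose that multiplicity. A vector of this kind is the sort of fixed-length numeric input a neural-network classifier can consume. In practice a cheminformatics toolkit would be used to compute the fingerprint from a structure file or SMILES string.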