Article
Peer Reviewed

A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data

LBL Publications (2018)

New synthetic biology capabilities hold the promise of dramatically improving our ability to engineer biological systems. However, a fundamental hurdle in realizing this potential is our inability to accurately predict biological behavior after modifying the corresponding genotype. Kinetic models have traditionally been used to predict pathway dynamics in bioengineered systems, but they take significant time to develop, and rely heavily on domain expertise. Here, we show that the combination of machine learning and abundant multiomics data (proteomics and metabolomics) can be used to effectively predict pathway dynamics in an automated fashion. The new method outperforms a classical kinetic model, and produces qualitative and quantitative predictions that can be used to productively guide bioengineering efforts. This method systematically leverages arbitrary amounts of new data to improve predictions, and does not assume any particular interactions, but rather implicitly chooses the most predictive ones.

Article
Peer Reviewed

Workflow Automation in Liquid Chromatography Mass Spectrometry

UC Davis Previously Published Works (2019)

We describe a fully automated workflow developed for the ingestion and analysis of liquid chromatography mass spectrometry (LCMS) data. With this computational workflow, we were able to replace two human workdays of data analysis with two hours of unsupervised computation time. The tool can also compute confidence intervals for all of its results, based on the noise level present in the data. The workflow uses only open source tools and libraries.

Article
Peer Reviewed

Flux analysis of central metabolic pathways in the Fe (III)-reducing organism Geobacter metallireducens via 13C isotopic labeling

LBL Publications (2007)

We analyzed the carbon fluxes in the central metabolism of Geobacter metallireducens strain GS-15 using 13C isotopomer modeling. Acetate labeled in the 1st or 2nd position was the sole carbon source, and Fe-NTA was the sole terminal electron acceptor. The measured labeled acetate uptake rate was 21 mmol/gdw/h in the exponential growth phase. The resulting isotope labeling pattern of amino acids allowed an accurate determination of the in vivo global metabolic reaction rates (fluxes) through the central metabolic pathways using a computational isotopomer model. The model indicated that over 90 percent of the acetate was completely oxidized to CO2 via a complete tricarboxylic acid (TCA) cycle while reducing iron. Pyruvate carboxylase and phosphoenolpyruvate carboxykinase were present under these conditions, but enzymes in the glyoxylate shunt and malic enzyme were absent. Gluconeogenesis and the pentose phosphate pathway were mainly employed for biosynthesis and accounted for less than 3 percent of total carbon consumption. The model also indicated surprisingly high reversibility in the reaction between oxoglutarate and succinate. This step operates close to the thermodynamic equilibrium possibly because succinate is synthesized via a transferase reaction, and its product, acetyl-CoA, inhibits the conversion of oxoglutarate to succinate. These findings enable a better understanding of the relationship between genome annotation and extant metabolic pathways in G. metallireducens.

Article
Peer Reviewed

Machine learning framework for assessment of microbial factory performance

LBL Publications (2019)

Metabolic models can estimate intrinsic product yields for microbial factories, but such frameworks struggle to predict cell performance (including product titer or rate) under suboptimal metabolism and complex bioprocess conditions. Machine learning, on the other hand, is complementary to metabolic modeling but requires large amounts of data. Building such a database for metabolic engineering designs requires significant manpower and is prone to human error and bias. We propose an approach that integrates data-driven methods with a genome-scale metabolic model to assess microbial bio-production (yield, titer, and rate). Using engineered E. coli as an example, we manually extracted and curated a data set comprising about 1200 experimentally realized cell factories from ~100 papers. We then augmented the key design features (e.g., genetic modifications and bioprocess variables) extracted from the literature with additional features derived from simulations of the genome-scale metabolic model iML1515, run with constraints that match the experimental data. Data augmentation and ensemble learning (e.g., support vector machines, gradient boosted trees, and neural networks in a stacked regressor model) are then employed to alleviate the challenges of sparse, non-standardized, and incomplete data sets, while multiple correspondence analysis/principal component analysis are used to rank the factors that influence bio-production. The hybrid framework demonstrates reasonably high cross-validation accuracy for predicting E. coli factory performance metrics under presumed bioprocess and pathway conditions (Pearson correlation coefficients between 0.8 and 0.93 on new data not seen by the model).

Article

Metagenomic Bacterial Finishing at JGI

LBL Publications (2008)

Article

Increasing Mevalonate Production by Engineering the Metabolism of Escherichia coli

LBL Publications (2009)

Article
Peer Reviewed

MACAW: An Accessible Tool for Molecular Embedding and Inverse Molecular Design

LBL Publications (2022)

The growing capabilities of synthetic biology and organic chemistry demand tools to guide syntheses toward useful molecules. Here, we present Molecular AutoenCoding Auto-Workaround (MACAW), a tool that uses a novel approach to generate molecules predicted to meet a desired property specification (e.g., a binding affinity of 50 nM or an octane number of 90). MACAW describes molecules by embedding them into a smooth multidimensional numerical space, avoiding uninformative dimensions that previous methods often introduce. The coordinates in this embedding provide a natural choice of features for accurately predicting molecular properties, which we demonstrate with examples for cetane and octane numbers, flash points, and histamine H1 receptor binding affinity. The approach is computationally efficient and well-suited to the small- and medium-size datasets commonly used in biosciences. We showcase the utility of MACAW for virtual screening by identifying molecules with high predicted binding affinity to the histamine H1 receptor and limited affinity to the muscarinic M2 receptor, both of which are targets of medicinal relevance. Combining these predictive capabilities with a novel generative algorithm for molecules allows us to recommend molecules with a desired property value (i.e., inverse molecular design). We demonstrate this capability by recommending molecules with predicted octane numbers of 40, 80, and 120, octane number being an important characteristic of biofuels. Thus, MACAW augments classical retrosynthesis tools by providing recommendations for molecules on specification.

Article

Metagenomics study of Enhanced Biological Phosphorus Removal (EBPR)

LBL Publications (2005)

Article

Flux Analysis of Central Metabolic Pathways in the Fe (III)-Reducing Organism Geobacter Metallireducens Via 13C Isotopic Labeling

LBL Publications (2007)

Article

Metagenomic Finishing at the JGI

LBL Publications (2008)