Velasquez, Erick Francisco

Computational Methods for Studying Protein-Protein Interaction and Association Experiments

2021

Advisor(s): Torres, Jorge

Abstract

The elucidation of a protein’s interaction/association network is important for defining its biological function. Mass spectrometry-based proteomic approaches have emerged as powerful tools for identifying protein-protein interactions (PPIs) and protein-protein associations (PPAs). However, interactome/association experiments are difficult to interpret because of the complexity and volume of the data they generate. Although tools have been developed to quantitatively identify protein interactions/associations, there is still a pressing need for easy-to-use tools that allow users to contextualize their results.

To address this, we developed CANVS, a computational pipeline that cleans, analyzes, and visualizes mass spectrometry-based interactome/association data. CANVS is wrapped as an interactive Shiny dashboard, allowing users to easily interface with the pipeline. With simple input requirements, users can analyze complex experimental data and create PPI/A networks. The application integrates systems biology databases such as BioGRID and CORUM to contextualize the results. Furthermore, CANVS features a Gene Ontology (GO) tool that allows users to identify relevant GO terms in their results and create visual networks of the proteins associated with those terms. As examples, we recently used the analytical framework included in CANVS to study the PPI/A networks of DUSP7, which helped to define its regulation of ERK2 during mitosis, and to analyze the PPA networks of core spindle assembly checkpoint proteins. Overall, CANVS is an easy-to-use application that benefits all researchers interested in studying interactome/association data, especially those who lack an established bioinformatic pipeline.

Additionally, we describe a supervised machine learning method that incorporates annotated data from the contaminant repository for affinity purification data (CRAPome) to predict contaminants in affinity and proximity purification data. The method first calculates amino acid content, sequence order, hydrophobicity, and hydrophilicity from the protein sequence. It then balances the data using data augmentation methods. Finally, it measures precision and accuracy using protein-protein interaction/association data. The results suggest that our supervised method can predict contaminants in protein-protein interaction/association data with 90% accuracy.
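
To make the feature and evaluation steps concrete, the following is a minimal Python sketch and not the implementation used in this work. It assumes amino acid content is expressed as the fraction of each of the 20 standard residues and that hydrophobicity is summarized as a mean Kyte-Doolittle hydropathy score; the abstract does not say which scales were used. The function names (composition, mean_hydropathy, precision_and_accuracy) are hypothetical, and the sequence-order and data augmentation steps are omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the fraction of each standard amino acid in seq."""
    counts = Counter(seq.upper())
    # Only standard residues count toward the total; others are ignored.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def mean_hydropathy(seq):
    """Return the mean Kyte-Doolittle score over standard residues."""
    values = [KYTE_DOOLITTLE[a] for a in seq.upper() if a in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


def precision_and_accuracy(y_true, y_pred):
    """Compute precision and accuracy for binary labels (1 = contaminant)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)
    # Precision is undefined when nothing is predicted positive; report 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = correct / len(pairs)
    return precision, accuracy
```

In a sketch like this, each protein would be turned into a feature vector by combining the composition fractions with the hydropathy score, and the two metrics would be computed on held-out proteins whose contaminant labels are known.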


UCLA
