As genomic repositories grow with data from a multitude of organisms, extracting and interpreting that data becomes increasingly difficult. Protein annotation and structure prediction have advanced considerably; however, the variety and sheer volume of data require approaches drawn from multiple disciplines. Bioinformatics provides functional sequence information and classification. Molecular dynamics (MD) simulation enables the interrogation of biochemical systems at atomistic resolution. Combined with machine learning, these disciplines can be used to investigate the complex functions and relationships of proteins within today's abundant genomic landscape.
The objective of this dissertation is to outline complementary methodologies from three fields (bioinformatics, molecular dynamics simulation, and machine learning) that, together, can be used to investigate vast genomic repositories and extract functional protein data.
Aim 1: Development of a bioinformatics and \textit{in silico} maturation pipeline. The pipeline consists of gene annotation, MD simulation to equilibrate predicted protein structures, and statistical methods adapted from graph theory, developed in collaboration with the Butts lab. Representing proteins in graph-theoretic terms allows diverse protein structural features to be explored.
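As a minimal illustration of representing a protein in graph-theoretic terms, the sketch below builds a simple contact-based protein structure network: each residue is a node, and two residues share an edge when their C-alpha atoms lie within a distance cutoff. The 7.0 Angstrom cutoff, the choice of C-alpha atoms, and the helper names \texttt{build\_psn}, \texttt{degree\_sequence}, and \texttt{density} are hypothetical choices made for illustration, not the construction rules of the pipeline described here; the coordinates are toy values, not real protein data.

```python
import math
from itertools import combinations


def build_psn(ca_coords, cutoff=7.0):
    """Build a contact-based protein structure network (hypothetical rule).

    Nodes are residue indices; an undirected edge joins residues i and j when
    the Euclidean distance between their C-alpha atoms is <= cutoff (angstroms).
    Returns an adjacency dict mapping residue index -> set of neighbor indices.
    """
    n = len(ca_coords)
    adjacency = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def degree_sequence(adjacency):
    """Number of contacts per residue, in residue order."""
    return [len(adjacency[i]) for i in sorted(adjacency)]


def density(adjacency):
    """Fraction of all possible residue pairs that are in contact."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return edges / (n * (n - 1) / 2)


# Toy example: four hypothetical C-alpha positions along the x-axis.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
psn = build_psn(coords)
print(degree_sequence(psn))  # prints [1, 2, 1, 0]
print(density(psn))          # 2 edges out of 6 possible pairs
```

Graph-level summaries such as the degree sequence and density are the kind of features that network statistical methods operate on; other PSN definitions might instead use side-chain atom contacts or weight edges by the number of atomic contacts.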
Aim 2: Molecular dynamics simulation provides atomic-level detail of complex systems. A variety of the protein systems explored, including HIV Rev, short intrinsically disordered peptides, STXBP4, and the YAP-1 WW domain, are intrinsically disordered. MD simulations were used to characterize these conformationally complex and challenging proteins, as well as plant metabolic proteins.
Aim 3: Building on the bioinformatics pipeline and MD-based \textit{in silico} maturation of predicted proteins, methods were developed to extract useful atomistic information from coarse protein structure networks (PSNs). A multi-layer perceptron (MLP) was used to upscale coarse PSNs into atomistic models. This technique permits coarse PSNs to be simulated and enables the exploration of complex protein structural conformations.