In this dissertation, I present several strategies for leveraging experimental data toward a quantitative understanding of small-molecule bioactivity that can inform the discovery of drug candidates. I review two cheminformatics tools I co-developed, report a comprehensive cross-target analysis of large-scale public small-molecule bioactivity data, and describe a mathematical model of leptin transport across the mammalian blood-brain barrier.
First, I present ChemMine Tools, a web service that provides both programmable and interactive online interfaces to a diverse set of tools for analyzing small-molecule structural data. ChemMine Tools allows users to import a set of small-molecule structures, compute pairwise compound similarities, search for similar compounds, cluster compounds by structure or physical properties, and compute physicochemical properties.
The second software tool is bioassayR, a software package for large-scale cross-target analysis of small-molecule bioactivity profiles. bioassayR systematically analyzes data from thousands of screening experiments to identify target-selective drug candidates and druggable protein targets. By simultaneously leveraging data from both custom small-molecule screening efforts and public databases, bioassayR helps identify regions of the genome and proteome accessible to small-molecule probes, elucidate novel mechanisms of action for bioactive molecules, and predict off-target effects, which currently lead to a high attrition rate in drug discovery efforts.
The systematic cross-target analysis of public bioactivity data uses the bioassayR tool to analyze data from PubChem BioAssay. I divide small molecules into three groups by increasing number of active targets in these data (highly selective, family selective, and promiscuous) and find that FDA-approved drugs are strongly represented in the family-selective group. I also show that the compound-target space can be organized into biclusters, where shared activity tends to occur across protein targets sharing common Molecular Function Gene Ontology (MF GO) terms.
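As a rough illustration of the three-way grouping by number of active targets, the following Python sketch assigns each compound to a group from a list of (compound, target, active) records. It is not the bioassayR implementation. The function name, the record layout, and both numeric cutoffs are hypothetical choices made only for this example, since the actual thresholds are not stated here.

```python
from collections import defaultdict

# Hypothetical cutoffs, chosen only for illustration; the actual
# thresholds used in the analysis are not given in this text.
HIGHLY_SELECTIVE_MAX = 1   # at most this many active targets
FAMILY_SELECTIVE_MAX = 5   # at most this many active targets


def group_by_active_targets(activity_records):
    """Assign each compound a selectivity group.

    activity_records: iterable of (compound_id, target_id, is_active).
    Compounds with no active targets are omitted from the result.
    """
    # Collect the distinct targets each compound is active against.
    active_targets = defaultdict(set)
    for compound, target, is_active in activity_records:
        if is_active:
            active_targets[compound].add(target)

    groups = {}
    for compound, targets in active_targets.items():
        n = len(targets)
        if n <= HIGHLY_SELECTIVE_MAX:
            groups[compound] = "highly selective"
        elif n <= FAMILY_SELECTIVE_MAX:
            groups[compound] = "family selective"
        else:
            groups[compound] = "promiscuous"
    return groups


if __name__ == "__main__":
    # Toy records with made-up identifiers.
    records = [
        ("cpd_A", "target_1", True),
        ("cpd_A", "target_2", False),
        ("cpd_B", "target_1", True),
        ("cpd_B", "target_2", True),
        ("cpd_B", "target_3", True),
    ]
    print(group_by_active_targets(records))
```

Counting distinct targets, rather than assay records, keeps a compound tested many times against the same protein from being labeled promiscuous.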
The leptin transport kinetic model extends current mathematical models of receptor endocytosis to transcytosis and behaves similarly to the experimentally observed dynamics of this system. A computational model is provided that allows in silico perturbation to predict the potential effects of pathological states or therapeutic small molecules.