In-vivo phenotypic screening in larval zebrafish has shown much promise for neuroactive drug discovery, but realizing its full potential remains a challenge. How do we robustly quantify how drugs modulate zebrafish behavior, and, in parallel, how do we determine the targets or pathways through which they act? We develop a phenotypic screening and computational pipeline to begin meeting these challenges. Starting with the motion index (MI) as a readout of phenotype and correlation distance (CD) as a measure of phenotypic similarity, we extend the similarity ensemble approach to computationally predict targets for sets of phenotypic screening hits. Using this approach, we predict an “antipsychotic” target profile for previously uncharacterized hit compounds whose MI signatures match those of known antipsychotic compounds. For a novel phenotype associated with sedation and paradoxical excitation caused by anesthetics such as etomidate and propofol, we predict not only the canonical GABAergic pathway but also an entirely novel target, the serotonin-6 receptor, which we validate with both in-vitro and in-vivo experiments.
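As a concrete illustration of the similarity measure named above, the following minimal Python sketch computes a correlation distance between two MI traces. It assumes CD is the standard one minus the Pearson correlation coefficient, which the text does not spell out; the function name and the toy traces are hypothetical and are not taken from the screen.

```python
import numpy as np


def correlation_distance(trace_a, trace_b):
    """Return 1 - Pearson r between two equal-length MI traces.

    Assumption: "correlation distance" means the standard definition,
    so 0 is a perfectly correlated pair and 2 a perfectly anticorrelated one.
    """
    a = np.asarray(trace_a, dtype=float)
    b = np.asarray(trace_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("traces must have the same length")

    # Center each trace so that only the shape of the response matters.
    a_centered = a - a.mean()
    b_centered = b - b.mean()

    denom = np.linalg.norm(a_centered) * np.linalg.norm(b_centered)
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant trace")

    return 1.0 - float(a_centered @ b_centered / denom)


# Hypothetical toy traces: the second is a scaled copy of the first,
# the third is its mirror image.
baseline = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]
mirrored = [3.0, 2.0, 1.0]

print(correlation_distance(baseline, scaled))    # close to 0
print(correlation_distance(baseline, mirrored))  # close to 2
```

Because the measure is invariant to offset and scale, it compares the shape of two responses rather than their overall activity level.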
However, our initial attempts to extend this approach to other known drug classes, such as stimulants and convulsants, meet with unexpected challenges; we hypothesize that the MI signatures, and the CD used to compare them, might not be robust enough for these more subtle phenotypes. And so the Deepfish project is born. We train Siamese Neural Networks (SNNs) on a highly replicated screen of 650 known neuroactive drugs to learn a custom distance metric for comparing MI signatures. This new distance function scores higher than CD at the task of separating same-drug replicate pairs from different-drug pairs, while also generalizing to a quality control screen performed months earlier. In that setting, the new distance metric not only achieves higher classification accuracy on average but also strikingly outperforms CD for three drugs with more subtle phenotypes.
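The pair-separation task described above can be sketched as follows. This is only an illustration under stated assumptions: the scoring rule (the probability that a different-drug pair lies farther apart than a same-drug pair, i.e. an AUC) is one plausible choice and is not confirmed by the text, and the function names, labels, and toy traces are hypothetical. Any distance function, including a learned one, could be passed in place of the correlation distance used here.

```python
import itertools

import numpy as np


def corr_dist(a, b):
    # Assumed baseline: 1 - Pearson r (undefined for constant traces).
    return 1.0 - float(np.corrcoef(a, b)[0, 1])


def pair_separation_auc(traces, labels, distance):
    """Score how well `distance` separates same-drug from different-drug pairs.

    Returns the fraction of (different-drug pair, same-drug pair)
    combinations in which the different-drug pair is farther apart;
    ties count as one half. 1.0 is perfect separation, 0.5 is chance.
    """
    same, diff = [], []
    for i, j in itertools.combinations(range(len(traces)), 2):
        d = distance(traces[i], traces[j])
        (same if labels[i] == labels[j] else diff).append(d)

    same = np.asarray(same)
    diff = np.asarray(diff)
    if same.size == 0 or diff.size == 0:
        raise ValueError("need at least one same-drug and one different-drug pair")

    greater = (diff[:, None] > same[None, :]).sum()
    ties = (diff[:, None] == same[None, :]).sum()
    return float((greater + 0.5 * ties) / (same.size * diff.size))


# Hypothetical toy data: two replicates each of two made-up drugs.
traces = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.5],
]
labels = ["drug_A", "drug_A", "drug_B", "drug_B"]

print(pair_separation_auc(traces, labels, corr_dist))
```

Scoring two candidate distance functions on the same held-out replicates with a routine like this gives a direct comparison that does not depend on choosing a distance threshold.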
Armed with a way of training robust distance metrics, we make progress on using unsupervised deep-learning approaches to find more robust representations of behavior. We discover that training custom distance metrics is even more important when computing similarities between these high-dimensional embedded fingerprints. However, we see signs that the Siamese networks can overfit on our highly replicated dataset, with both the raw MI and the high-dimensional embedded representations, so we design and perform a version of the screen with fully randomized drug layouts, which we will use to benchmark our methods in the near future.