eScholarship
Open Access Publications from the University of California

UCSF

UC San Francisco Electronic Theses and Dissertations

Methods For Multiscale Biological Representation Learning

Abstract

As unstructured, high-content biological data becomes more accessible, developing methods to extract meaningful and computationally tractable representations is essential for engineering, optimizing, and understanding biological systems. This field of research has come to be known as biological representation learning, and it connects the two bodies of work presented in this dissertation. In Chapter 2, I focus on the development of experimental and computational methods to enable high-content phenotypic screening for genetic modifiers of neuronal activity dynamics. This involved developing a novel self-supervised model, termed Plexus, that efficiently learns patterns in neuronal activity dynamics to generate single-cell embeddings that capture both intrinsic excitability and network-level synaptic activity. We then showed that combining this model with a CRISPR interference (CRISPRi)-compatible iPSC-derived neuron system enables large-scale screening of genetic modifiers of neuronal activity. We applied this system to study the disease biology of MAPT mutant-driven aberrant neuronal activity and identified potential therapeutic candidates for follow-up study. In Chapter 3, I focus on leveraging protein language model (PLM) embeddings to learn sequence-to-function mappings in a low-data regime. More specifically, we used a dataset of 175 orthologs and their associated enzyme kinetic parameters to assess the use of PLM embeddings for predicting the catalytic turnover rate, or kcat. We then showed that a learnable Transformer-based aggregation architecture is the most performant sequence-to-kcat prediction method, outperforming specialized deep-learning models trained on larger datasets. Altogether, this work highlights methodological advances in machine learning for generating meaningful representations of biological systems.
