eScholarship
Open Access Publications from the University of California

Open Access Policy Deposits

This series is automatically populated with publications deposited by UC Irvine Donald Bren School of Information and Computer Sciences Department of Computer Science researchers in accordance with the University of California’s open access policies. For more information see Open Access Policy Deposits and the UC Publication Management System.

Refinement and curation of homologous groups facilitated by structure prediction.

(2025)

Domain classification of protein predictions released in the AlphaFold Database (AFDB) has been a recent focus of the Evolutionary Classification of protein Domains (ECOD). Although a primary focus of our recent work has been the partition and assignment of domains from these predictions, we here show how these diverse predictions can be used to examine the reference domain set more closely. Using results from DPAM, our AlphaFold-specific domain parsing algorithm, we examine hierarchical groupings that share significant levels of homologous links, both between groups that were not previously assessed to be definitively homologous and between groups that were not previously observed to share significant homologous links. Combined with manual analysis, these large datasets of structural and sequence similarities allow us to merge homologous groups in multiple cases, which we detail within. The merged domains tend to come from families that are small, previously had few experimental representatives, or had unknown function. The exception is the chromodomains, a large homologous group that was upgraded from possibly homologous to definitely homologous, based on its strong homologous links to the SH3 domains, to increase the consistency of ECOD.

Spatial profiling of the interplay between cell type- and vision-dependent transcriptomic programs in the visual cortex

(2025)

How early sensory experience during "critical periods" of postnatal life affects the organization of the mammalian neocortex at the resolution of neuronal cell types is poorly understood. We previously reported that the functional and molecular profiles of layer 2/3 (L2/3) cell types in the primary visual cortex (V1) are vision-dependent [S. Cheng et al., Cell 185, 311-327.e24 (2022)]. Here, we characterize the spatial organization of L2/3 cell types with and without visual experience. Spatial transcriptomic profiling based on 500 genes recapitulates the zonation of L2/3 cell types along the pial-ventricular axis in V1. By applying multitasking theory, we suggest that the spatial zonation of L2/3 cell types is linked to the continuous nature of their gene expression profiles, which can be represented as a 2D manifold bounded by three archetypal cell types. By comparing normally reared and dark reared L2/3 cells, we show that visual deprivation-induced transcriptomic changes comprise two independent gene programs. The first, induced specifically in the visual cortex, includes immediate-early genes and genes associated with metabolic processes. It manifests as a change in cell state that is orthogonal to cell-type-specific gene expression programs. By contrast, the second program impacts L2/3 cell-type identity, regulating a subset of cell-type-specific genes and shifting the distribution of cells within the L2/3 cell-type manifold. Through an integrated analysis of spatial transcriptomics with single-nucleus RNA-seq data, we describe how vision patterns cortical L2/3 cell types during the critical period.

Chemically Informed Deep Learning for Interpretable Radical Reaction Prediction.

(2025)

Organic radical reactions are crucial in many areas of chemistry, including synthetic, biological, and atmospheric chemistry. We develop a predictive framework based on the interaction of molecular orbitals that operates on mechanistic-level radical reactions. Given our chemistry-aware model, all predictions are provided with different levels of interpretability. Our models are trained and evaluated using the RMechDB database of radical reaction steps. Our model predicts the correct orbital interaction and products for 96% of the test reactions in RMechDB. By chaining these predictions, we perform a pathway search capable of identifying all intermediates and byproducts of a radical reaction. We test the pathway search on two classes of problems in atmospheric and polymerization chemistry. RMechRP is publicly available online at https://deeprxn.ics.uci.edu/rmechrp/.

Multimodal Pain Recognition in Postoperative Patients: Machine Learning Approach.

(2025)

BACKGROUND: Acute pain management is critical in postoperative care, especially in vulnerable patient populations that may be unable to self-report pain levels effectively. Current methods of pain assessment often rely on subjective patient reports or behavioral pain observation tools, which can lead to inconsistencies in pain management. Multimodal pain assessment, integrating physiological and behavioral data, presents an opportunity to create more objective and accurate pain measurement systems. However, most previous work has focused on healthy subjects in controlled environments, with limited attention to real-world postoperative pain scenarios. This gap necessitates the development of robust, multimodal approaches capable of addressing the unique challenges associated with assessing pain in clinical settings, where factors like motion artifacts, imbalanced label distribution, and sparse data further complicate pain monitoring. OBJECTIVE: This study aimed to develop and evaluate a multimodal machine learning-based framework for the objective assessment of pain in postoperative patients in real clinical settings using biosignals such as electrocardiogram, electromyogram, electrodermal activity, and respiration rate (RR) signals. METHODS: The iHurt study was conducted on 25 postoperative patients at the University of California, Irvine Medical Center. The study captured multimodal biosignals during light physical activities, with concurrent self-reported pain levels using the Numerical Rating Scale. Data preprocessing involved noise filtering, feature extraction, and combining handcrafted and automatic features through convolutional and long short-term memory autoencoders. Machine learning classifiers, including support vector machine, random forest, adaptive boosting, and k-nearest neighbors, were trained using weak supervision and minority oversampling to handle sparse and imbalanced pain labels. Pain levels were categorized into baseline and 3 levels of pain intensity (1-3). RESULTS: The multimodal pain recognition models achieved an average balanced accuracy of over 80% across the different pain levels. RR models consistently outperformed other single modalities, particularly for lower pain intensities, while facial muscle activity (electromyogram) was most effective for distinguishing higher pain intensities. Although single-modality models, especially RR, generally provided higher performance compared to multimodal approaches, our multimodal framework still delivered results that surpassed most previous works in terms of overall accuracy. CONCLUSIONS: This study presents a novel, multimodal machine learning framework for objective pain recognition in postoperative patients. The results highlight the potential of integrating multiple biosignal modalities for more accurate pain assessment, with particular value in real-world clinical settings.
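
The abstract above reports balanced accuracy, which matters when pain labels are imbalanced. As a minimal sketch of the standard definition (the mean of per-class recall), and not the study's own evaluation code, the metric can be computed as follows. The function name and the label arrays are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: 0 = baseline, 1-3 = pain intensity levels.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 2]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy example the plain accuracy is dominated by the frequent baseline class, while the balanced score is pulled down by the missed rare classes, which is why the balanced variant is the more informative number for sparse pain labels.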

Nicotinamide mononucleotide restores impaired metabolism, endothelial cell proliferation and angiogenesis in old sedentary male mice.

(2025)

Aging is accompanied by a decline in neovascularization potential and increased susceptibility to ischemic injury. Here, we confirm the age-related impaired neovascularization following ischemic leg injury and impaired angiogenesis. The age-related deficits in angiogenesis arose primarily from diminished EC proliferation capacity, but not migration or VEGF sensitivity. Aged EC harvested from the mouse skeletal muscle displayed a pro-angiogenic gene expression phenotype, along with considerable changes in metabolic genes. Metabolomics analysis and 13C glucose tracing revealed impaired ATP production and blockade in glycolysis and TCA cycle in late-passage HUVECs, which occurred at nicotinamide adenine dinucleotide (NAD⁺)-dependent steps, along with NAD⁺ depletion. Supplementation with nicotinamide mononucleotide (NMN), a precursor of NAD⁺, enhances late-passage EC proliferation and sprouting angiogenesis from aged mouse aortas. Taken together, our study illustrates the importance of NAD⁺-dependent metabolism in the maintenance of EC proliferation capacity with age, and the therapeutic potential of NAD⁺ precursors.

Evaluation of LLMs accuracy and consistency in the registered dietitian exam through prompt engineering and knowledge retrieval.

(2025)

Large language models (LLMs) are fundamentally transforming human-facing applications in the health and well-being domains: boosting patient engagement, accelerating clinical decision-making, and facilitating medical education. Although state-of-the-art LLMs have shown superior performance in several conversational applications, evaluations within nutrition and diet applications are still insufficient. In this paper, we propose to employ the Registered Dietitian (RD) exam to conduct a standard and comprehensive evaluation of state-of-the-art LLMs, GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, assessing both accuracy and consistency in nutrition queries. Our evaluation includes 1050 RD exam questions encompassing several nutrition topics and proficiency levels. In addition, for the first time, we examine the impact of Zero-Shot (ZS), Chain of Thought (CoT), Chain of Thought with Self Consistency (CoT-SC), and Retrieval Augmented Prompting (RAP) on both accuracy and consistency of the responses. Our findings revealed that while these LLMs obtained acceptable overall performance, their results varied considerably with different prompts and question domains. GPT-4o with CoT-SC prompting outperformed the other approaches, whereas Gemini 1.5 Pro with ZS recorded the highest consistency. For GPT-4o and Claude 3.5, CoT improved the accuracy, and CoT-SC improved both accuracy and consistency. RAP was particularly effective for GPT-4o to answer Expert level questions. Consequently, choosing the appropriate LLM and prompting technique, tailored to the proficiency level and specific domain, can mitigate errors and potential risks in diet and nutrition chatbots.
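
To illustrate the self-consistency idea behind CoT-SC mentioned above, the following minimal sketch takes several independently sampled answers to one multiple-choice question and returns the majority vote. It is an assumption-laden illustration, not the paper's pipeline: the function name is hypothetical, the sampled answers are made up, and the agreement fraction is only an illustrative proxy, not the consistency metric used in the study.

```python
from collections import Counter


def self_consistent_answer(sampled_answers):
    """Return the majority answer and the fraction of samples that agree with it."""
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)


# Hypothetical final answers extracted from five chain-of-thought samples.
samples = ["B", "B", "C", "B", "A"]
answer, agreement = self_consistent_answer(samples)
```

Voting over several reasoning paths trades extra model calls for a more stable final answer, which is consistent with the abstract's observation that CoT-SC improved both accuracy and consistency for some models.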

ECOD: integrating classifications of protein domains from experimental and predicted structures.

(2025)

The evolutionary classification of protein domains (ECOD) classifies protein domains using a combination of sequence and structural data (http://prodata.swmed.edu/ecod). Here we present the culmination of our previous efforts at classifying domains from predicted structures, principally from the AlphaFold Database (AFDB), by integrating these domains with our existing classification of PDB structures. This combined classification includes both domains from our previous, purely experimental, classification of domains as well as domains from our provisional classification of 48 proteomes in AFDB predicted from model organisms and organisms of concern to global health. ECOD classifies over 1.8 M domains from over 1,000,000 proteins collectively deposited in the PDB and AFDB. Additionally, we have changed the F-group classification reference used for ECOD, deprecating our original ECODf library and instead relying on direct collaboration with the Pfam sequence family database to inform our classification. Pfam provides similar coverage of ECOD with family classification while being more accurate and less redundant. By eliminating duplication of effort, we can improve both classifications. Finally, we discuss the initial deployment of DrugDomain, a database of domain-ligand interactions, on ECOD and discuss future plans.

Princ-wiki-a Mathematica: Wikipedia Editing and Mathematics

(2025)

This essay incorporates with permission material from our pseudonymous colleague XOR'easter, who also contributed many suggestions during the writing process. By the extent of XOR’easter’s contributions, they would normally be credited as an author. However it was not possible in time to find a way to strictly preserve anonymity and assign legal copyright. All four contributors disagree with this exclusion. I regret its necessity — Ed.

Benefit of Varying Navigation Strategies in Robot Teams

(2025)

Inspired by recent human studies, this paper investigates the benefits of employing varying navigation strategies in robot teams. We explore how mixed navigation strategies impact task completion time, environment exploration, and overall system effectiveness in multi-robot systems. Experiments were conducted in a simulated rectangular environment using Clearpath PR2 robots to evaluate navigation strategies observed in humans: 1) Route (RT) knowledge, where agents follow a predefined path; 2) Survey (SW) knowledge, where agents take the shortest path while avoiding obstacles; 3) mixed strategies with varying proportions, such as 40% RT and 60% SW (0.4RT 0.6SW) and 60% RT and 40% SW (0.6RT 0.4SW); and 4) an additional strategy in which agents switch from RT to SW 10% of the time (0.9RT 0.1SW). The last strategy is motivated by our observation that human participants in a similar study would start on a route and then switch to survey 10% of the time, so we investigate it for individual team members. While a pure SW strategy is the most efficient in time taken and a pure RT strategy covers more of the environment, a mixture of strategies appears to be a beneficial trade-off between the time taken to complete a mission and area coverage. These results highlight the advantages of population variability, suggesting potential benefits in both biological and robotic populations.
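
As a rough illustration of how a mixed team such as 0.4RT 0.6SW could be set up in a simulation, the sketch below assigns a strategy label to each robot. It assumes the proportions refer to the share of team members using each strategy, which the abstract does not state explicitly; the function name, team size, and seed are hypothetical and not taken from the paper.

```python
import random


def assign_strategies(team_size, rt_fraction, seed=0):
    """Give round(team_size * rt_fraction) robots the RT label and the rest SW."""
    n_rt = round(team_size * rt_fraction)
    team = ["RT"] * n_rt + ["SW"] * (team_size - n_rt)
    # Shuffle with a fixed seed so that robot order is mixed but reproducible.
    random.Random(seed).shuffle(team)
    return team


# Hypothetical 10-robot team with a 0.4RT 0.6SW mix.
team = assign_strategies(10, 0.4)
```

The 0.9RT 0.1SW condition describes switching within an individual agent rather than a team-level split, so it would need a per-agent switching rule instead of this assignment.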