Deep learning has revolutionized problem-solving by leveraging the power of deep neural networks. AlexNet and ImageNet marked a significant milestone, demonstrating the immense potential of scaling both data and computational resources to enhance model performance. This trend is particularly evident in natural language processing, where scaling Transformers has driven the development of large language models (LLMs). Ultimately, data and computational power form the core foundations of deep learning's success.
As models grow more complex, their demand for computational resources increases, driving up both cost and energy consumption. These rising expenses are progressively confining machine learning research to large industry labs. For instance, while many recent studies are open-sourced, the cost of reproducing them puts such work out of reach for most academic institutions. Developing affordable, efficient AI models that are accessible to academic labs is therefore a crucial step toward democratizing AI research. Achieving this will require optimizing models for both data efficiency and computational resource usage.
Moreover, accessibility, affordability, and trustworthiness are crucial factors in the development of AI models. However, many deep learning models are designed for high-end, expensive hardware, limiting their broader adoption. Additionally, reliance on centralized computing raises significant privacy concerns, as user data must be transferred to remote servers for processing, which diminishes trust in AI systems. Edge computing offers a promising alternative: by processing data locally on devices, it is more cost-effective and energy-efficient while enhancing both accessibility and trust. Ideally, AI models should be efficient and optimized for edge devices, reducing dependency on centralized systems.
These motivations inspire me to explore new approaches for enhancing the efficiency of deep learning models. My research focuses on several dimensions of efficiency: data efficiency, parameter efficiency, training compute efficiency, and inference efficiency. By prioritizing efficiency, I aim to bridge the gap between cutting-edge research and the deployment of these models in real-world applications, while also fostering diversity in AI development. Ultimately, my goal is to make AI inclusive and accessible to all. I believe that meaningful progress builds on the contributions of many past works; expanding access to AI for a broader range of researchers and developers is therefore crucial to accelerating advancements in the field.