
UC Santa Barbara Electronic Theses and Dissertations

Aligning Generative AI Models with Human-Defined Rewards

Abstract

Generative AI models are advancing at an unprecedented pace, reshaping human society and revolutionizing the way people work. By pretraining on web-scale datasets, these models encode human knowledge into neural networks and achieve human-level performance across a wide range of tasks. Given these powerful capabilities, aligning model outputs with human intent, safety considerations, and ethical values has become one of the most critical research areas. In this thesis, I focus on aligning diffusion-based visual generation models and robotic generative policies with human-defined rewards.

In Part I, I investigate how to develop text-to-image (T2I) and text-to-video (T2V) models that achieve both fast and human-preferred generation. To this end, I introduce the Reward-Guided Latent Consistency Distillation (RG-LCD) framework, which integrates reward feedback into the consistency distillation process. RG-LCD gives rise to a series of T2I and T2V models that achieve at least ten-fold inference acceleration while being preferred by humans over their teacher models. Notably, my T2V-Turbo and T2V-Turbo-v2 achieve state-of-the-art (SOTA) results on VBench, outperforming various proprietary video generation systems.

In Part II, I explore aligning robotic generative policies with human-defined rewards under the constraint of limited offline data. I introduce closed-form policy improvement (CFPI) operators for offline reinforcement learning (RL) tasks and develop MIDAS, a framework for learning robotic manipulation policies capable of understanding multimodal prompts that interleave images and text in context.

Finally, I summarize the contributions of this dissertation and discuss future research directions to pursue after completing my Ph.D.
