UCLA Electronic Theses and Dissertations

Challenges and Methods for Alignment of Large Language Models with Human Preferences

Abstract

While recent advances in methods for aligning Large Language Models (LLMs) have enabled remarkable gains in the capabilities of widely deployed AI systems, significant challenges remain in instilling important human values in AI systems, including, but not limited to, helpfulness, harmlessness, and fairness. We conduct a study on the impact of feedback acquisition protocols on the downstream alignment and evaluation of LLMs. We find that different feedback acquisition protocols often produce inconsistent feedback data, and that the feedback data used during alignment biases the evaluation of LLMs, depending on the feedback acquisition protocol used during evaluation. Finally, we introduce Group Preference Optimization (GPO), a novel method for few-shot alignment of LLMs to group preferences. We experimentally validate GPO, showing that our method enables efficient alignment of LLM responses to the preferences of various demographic groups, given a small amount of preference data from each group.
