Challenges and Methods for Alignment of Large Language Models with Human Preferences
- Dang, John Arthur MinhQuan
- Advisor(s): Grover, Aditya
Abstract
While recent advances in methods for aligning Large Language Models (LLMs) have dramatically improved the capabilities of widely deployed AI systems, significant challenges remain in instilling important human values in these systems, including, but not limited to, helpfulness, harmlessness, and fairness. We conduct a study on the impact of feedback acquisition protocols on downstream alignment and evaluation of LLMs. We find that different feedback acquisition protocols often produce inconsistent feedback data, and that the protocol used to collect alignment data biases LLM evaluation depending on the protocol used at evaluation time. Finally, we introduce Group Preference Optimization (GPO), a novel method for few-shot alignment of LLMs to the preferences of groups. We experimentally validate GPO, showing that it enables efficient alignment of LLM responses to the preferences of various demographic groups given only a small amount of preference data from each group.