Moral Foundational Characteristics of Large Language Models
- Simmons, Gabriel
- Advisor(s): Ghosal, Dipak
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in generating fluent text. LLMs have also shown a tendency to reproduce social biases, such as stereotypical associations between gender and occupation. Like race and gender, morality is an important social variable. This work investigates whether LLMs reproduce the moral biases associated with political groups in the United States, an instance of a broader capability I refer to as "moral mimicry". I explore this hypothesis in the GPT-3/3.5 and OPT families of Transformer-based LLMs. Using tools from Moral Foundations Theory, I show that these LLMs are indeed "moral mimics": when prompted with a "liberal" or "conservative" political identity, the models generate text reflecting the moral biases associated with these groups. I also investigate how moral mimicry relates to model scale. I hope that this work encourages further investigation of the moral mimicry capability, including how to leverage it for social good and how to minimize its risks.