Moral Foundational Characteristics of Large Language Models
- Simmons, Gabriel
- Advisor(s): Ghosal, Dipak
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in generating fluent text. LLMs have also shown a tendency to reproduce social biases, such as stereotypical associations between gender and occupation. Like race and gender, morality is an important social variable. This work investigates whether LLMs reproduce the moral biases associated with political groups in the United States, an instance of a broader capability I refer to as "moral mimicry". I explore this hypothesis in the GPT-3/3.5 and OPT families of Transformer-based LLMs. Using tools from Moral Foundations Theory, I show that these LLMs are indeed "moral mimics": when prompted with a "liberal" or "conservative" political identity, the models generate text reflecting the moral biases associated with these groups. I also investigate how moral mimicry relates to model scale. I hope that this work encourages further investigation of the moral mimicry capability, including how to leverage it for social good and how to minimize its risks.