
Passing the Moral Turing Test

Creative Commons Attribution (CC BY) 4.0 license
Abstract

The translation problem in moral AI asks how insights into human norms and values can be translated into a form suitable for implementation in artificial systems. I argue that if my answer to a question about the human mind is right, then the translation problem is more tractable than previously thought. Specifically, I argue that we can use principles from reinforcement learning to study human moral cognition, and that we can use principles from the resulting evaluative moral psychology to design artificial systems capable of passing the Moral Turing Test (Allen, 2000). I illustrate the core features of my proposal by describing a reinforcement learning environment, or gridworld, in which an agent learns to trade off between monetary profit and fair dealing, as characterized in behavioral economic paradigms. I conclude by highlighting the core technical and philosophical advantages of such an approach for modeling moral cognition more broadly construed.
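To make the kind of environment the abstract gestures at more concrete, here is a minimal, illustrative sketch: a tabular Q-learning agent in a small gridworld whose reward trades off a monetary payoff against a fairness bonus for sharing, loosely inspired by behavioral-economics splitting games. This is not the paper's actual environment or implementation; the grid layout, action set, and parameters such as FAIRNESS_WEIGHT are hypothetical choices made purely for illustration.

```python
"""Illustrative sketch only: a Q-learning agent that learns to trade off
monetary profit against fair dealing in a toy gridworld. All names and
parameters here are hypothetical, not the paper's actual environment."""

import random

GRID_SIZE = 5                                  # hypothetical 1-D gridworld of 5 cells
ACTIONS = ["left", "right", "keep", "share"]   # movement plus two end-of-episode splits
COIN_CELL = 2                                  # cell containing the monetary payoff
GOAL_CELL = 4                                  # cell where the split decision is made
FAIRNESS_WEIGHT = 0.8                          # hypothetical weight on the fairness bonus


def step(state, action):
    """Return (next_state, reward, done). A state is (position, has_coin)."""
    pos, has_coin = state
    if action == "left":
        pos = max(0, pos - 1)
    elif action == "right":
        pos = min(GRID_SIZE - 1, pos + 1)
    if pos == COIN_CELL:
        has_coin = True
    if pos == GOAL_CELL and action in ("keep", "share"):
        money = 1.0 if has_coin else 0.0
        if action == "keep":
            reward = money                      # full payoff, unfair split
        else:
            reward = 0.5 * money + FAIRNESS_WEIGHT * money  # half payoff plus fairness bonus
        return (pos, has_coin), reward, True
    return (pos, has_coin), -0.01, False        # small step cost otherwise


def train(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = {}  # Q-table: (state, action) -> estimated value
    for _ in range(episodes):
        state, done = (0, False), False
        while not done:
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (
                reward + gamma * (0.0 if done else best_next) - old
            )
            state = nxt
    return q


if __name__ == "__main__":
    q = train()
    # With FAIRNESS_WEIGHT > 0.5, the learned policy prefers "share" at the goal.
    goal_state = (GOAL_CELL, True)
    print({a: round(q.get((goal_state, a), 0.0), 3) for a in ACTIONS})
```

In this sketch the trade-off is encoded directly in the reward: "keep" yields the full monetary payoff, while "share" yields half the payoff plus a fairness bonus, so whether the agent learns to share depends on how the fairness term is weighted against profit.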
