
Passing the Moral Turing Test

Creative Commons Attribution (CC BY) 4.0 license
Abstract

The translation problem in moral AI asks how insights into human norms and values can be translated into a form suitable for implementation in artificial systems. I argue that if my answer to a question about the human mind is right, then the translation problem is more tractable than previously thought. Specifically, I argue that we can use principles from reinforcement learning to study human moral cognition, and that we can use principles from the resulting evaluative moral psychology to design artificial systems capable of passing the Moral Turing Test (Allen, 2000). I illustrate the core features of my proposal by describing a reinforcement learning environment, or gridworld, in which an agent learns to trade off between monetary profit and fair dealing, as characterized in behavioral economic paradigms. I conclude by highlighting the core technical and philosophical advantages of such an approach for modeling moral cognition more broadly construed.
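To make the kind of environment the abstract gestures at more concrete, here is a minimal, illustrative sketch: a tabular Q-learning agent in a small gridworld whose reward trades off a monetary payoff against a fairness bonus for sharing, loosely inspired by behavioral-economics splitting games. This is not the paper's actual environment or implementation; the grid layout, action set, and parameters such as FAIRNESS_WEIGHT are hypothetical choices made purely for illustration.

```python
"""Illustrative sketch only: a Q-learning agent that learns to trade off
monetary profit against fair dealing in a toy gridworld. All names and
parameters here are hypothetical, not the paper's actual environment."""

import random

GRID_SIZE = 5                                  # hypothetical 1-D gridworld of 5 cells
ACTIONS = ["left", "right", "keep", "share"]   # movement plus two end-of-episode splits
COIN_CELL = 2                                  # cell containing the monetary payoff
GOAL_CELL = 4                                  # cell where the split decision is made
FAIRNESS_WEIGHT = 0.8                          # hypothetical weight on the fairness bonus


def step(state, action):
    """Return (next_state, reward, done). A state is (position, has_coin)."""
    pos, has_coin = state
    if action == "left":
        pos = max(0, pos - 1)
    elif action == "right":
        pos = min(GRID_SIZE - 1, pos + 1)
    if pos == COIN_CELL:
        has_coin = True
    if pos == GOAL_CELL and action in ("keep", "share"):
        money = 1.0 if has_coin else 0.0
        if action == "keep":
            reward = money                      # full payoff, unfair split
        else:
            reward = 0.5 * money + FAIRNESS_WEIGHT * money  # half payoff plus fairness bonus
        return (pos, has_coin), reward, True
    return (pos, has_coin), -0.01, False        # small step cost otherwise


def train(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = {}  # Q-table: (state, action) -> estimated value
    for _ in range(episodes):
        state, done = (0, False), False
        while not done:
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (
                reward + gamma * (0.0 if done else best_next) - old
            )
            state = nxt
    return q


if __name__ == "__main__":
    q = train()
    # With FAIRNESS_WEIGHT > 0.5, the learned policy prefers "share" at the goal.
    goal_state = (GOAL_CELL, True)
    print({a: round(q.get((goal_state, a), 0.0), 3) for a in ACTIONS})
```

In this sketch the trade-off is encoded directly in the reward: "keep" yields the full monetary payoff, while "share" yields half the payoff plus a fairness bonus, so whether the agent learns to share depends on how the fairness term is weighted against profit.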
