“Empathy, evidently, existed only within the human community, whereas intelligence to some degree could be found throughout every phylum.”
― Philip K. Dick
Moral dilemmas are part of human nature. Though the actions themselves can be condensed into simple yes-or-no choices, the intentions behind them fall into an intangible gray area that is hard to decipher. Now, if machines, which are innately binary (0s and 1s), are tasked with similar decision-making at ethical crossroads, the chances of erratic behaviour are high, and in this case blame alone won't improve the machines. Reinforcement learning, which loosely mirrors how humans learn from reward and consequence, is a natural place to investigate how machines behave under moral uncertainty.
Researchers Adrien Ecoffet and Joel Lehman, formerly of Uber AI, take moral uncertainty as their starting point and propose a formalism that translates philosophical insights into the field of reinforcement learning.
In a philosophical framework, “saving a life” can be immediately associated with good will. However, translating the same into the world of machines might require “rotating the 5th joint of the robot arm 3 degrees clockwise.” One of the key contributions of this work is to provide concrete implementations of moral uncertainty, bridging the gap between the high-level options philosophers reason about and the low-level actions an agent executes.
Philosophy Through The Lens Of RL
“A fundamental uncertainty remains: which ethical theory should an intelligent agent follow?”
One benefit of translating moral uncertainty from a philosophical framework into a sequential decision-making setting is that more of the gritty complications of real-world moral decision-making come into focus, which may suggest concrete research problems or connections to existing research in RL.
To investigate which methods can be cast in a quantitative decision-making framework, the researchers examine the following (a minimal sketch of the first of these follows the list):
- Maximising Expected Choice Worthiness
- Stochastic Voting
- Nash Voting
- Variance Voting
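To make the first of these concrete, here is a minimal, hypothetical single-step sketch of Maximising Expected Choiceworthiness: each ethical theory scores the available actions, and the agent weights those scores by its credence in each theory. The theories, scores and credences below are illustrative, not values from the paper.

```python
import numpy as np

# Hypothetical single-step illustration of Maximising Expected Choiceworthiness.
# Rows are ethical theories, columns are candidate actions (e.g. pull the
# switch vs. do nothing); all numbers are made up for illustration.
choice_worthiness = np.array([
    [ 1.0, -5.0],   # a utilitarian-style theory: diverting saves more lives
    [-1.0,  0.0],   # a deontological-style theory: diverting is a violation
])

# Credence: the agent's degree of belief in each theory (sums to 1).
credence = np.array([0.7, 0.3])

# Expected choice-worthiness of each action under moral uncertainty.
expected_cw = credence @ choice_worthiness   # -> [ 0.4, -3.5]
best_action = int(np.argmax(expected_cw))    # -> 0, i.e. pull the switch
```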
All the experiments in this work are based on four related gridworld environments that tease out differences between the various voting systems. These environments are all based on the famous trolley problem, used to highlight moral intuitions and conflicts between ethical theories. The trolley problem is an ethical dilemma in which a runaway trolley is heading towards five people, and an observer can divert it onto a side track where it will kill one person instead. Diverting saves more lives, but it also raises the question of what gives the observer the right to decide that the one person should die.
In the experiments, as illustrated above, if the agent (A) is standing on the switch (S) tile by the time the trolley (T) reaches the fork in the tracks (+), the trolley is redirected and crashes into the bystander(s). The agent may also push a large man (L) onto the tracks to stop the trolley; otherwise, the trolley crashes into the people standing on the tracks, whose number is represented by the random variable X. A guard (G) may protect the large man, and to push him the agent first needs to lie to the guard. Finally, in one variant the agent is able to trigger a “doomsday” event (D) in which a large number of people are harmed.
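For intuition, the tiles described above can be written down explicitly. The mapping below simply restates those symbols in code, and the tiny layout is an illustrative mock-up, not one of the paper's actual four environments.

```python
# Tile vocabulary of the trolley gridworlds, as described above.
TILES = {
    "A": "agent",
    "S": "switch tile: standing here redirects the trolley at the fork",
    "+": "fork in the tracks",
    "T": "trolley",
    "X": "people on the tracks (their number is a random variable)",
    "L": "large man the agent can push onto the tracks to stop the trolley",
    "G": "guard who protects the large man unless the agent lies to him",
    "D": "doomsday event tile (present in one variant only)",
}

# A made-up layout purely for intuition; the paper's maps differ.
layout = [
    list("TTT+XXXX"),
    list("...S...."),
    list(".A..LG.D"),
]
```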
The aforementioned methods, such as Nash voting or variance voting, are then tested in this trolley-problem setting and assessed against traditional ethical theories such as utilitarianism and deontology.
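As a rough indication of how such a vote can be aggregated when theories report values on incomparable scales, here is a simplified sketch in the spirit of variance voting: each theory's action values are standardised before a credence-weighted vote. This is an assumption-laden simplification, not the exact algorithm from the paper.

```python
import numpy as np

def variance_vote(q_values: np.ndarray, credence: np.ndarray) -> int:
    """Pick an action under moral uncertainty.

    q_values: shape (num_theories, num_actions); each row is one theory's
              action values, possibly on a wildly different scale.
    credence: shape (num_theories,); the agent's belief in each theory.
    """
    # Standardise each theory's values so no single theory dominates the
    # vote just because its reward scale happens to be large.
    mean = q_values.mean(axis=1, keepdims=True)
    std = q_values.std(axis=1, keepdims=True) + 1e-8
    normalised = (q_values - mean) / std

    # Credence-weighted aggregation over theories, then pick the best action.
    scores = credence @ normalised
    return int(np.argmax(scores))

# Toy example: utilitarianism on a huge scale vs. deontology on a tiny one.
q = np.array([[100.0, -500.0],   # utilitarian values for [divert, do nothing]
              [ -1.0,    0.0]])  # deontological values for the same actions
print(variance_vote(q, np.array([0.6, 0.4])))  # -> 0: diverting wins here
```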
This paper can be seen as fitting into the field of machine ethics, which seeks to give moral capabilities to computational agents.
The overall aim of this work, as stated by the authors, is to bridge the fields of moral philosophy and machine ethics with machine learning. The work highlights a spectrum of machine learning research questions relevant to training ethically capable reinforcement learning agents.
There are a few key concepts, such as credence updating, that haven't been discussed here as they are beyond the scope of this article. In short, a credence is the degree of belief the agent places in each ethical theory, and it enters the equations that decide an RL agent's actions as a weighting term. Read more here.
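For orientation, in the maximise-expected-choice-worthiness formulation this weighting boils down to W(a) = Σᵢ C(Tᵢ) · CWᵢ(a), where C(Tᵢ) is the agent's credence in ethical theory Tᵢ and CWᵢ(a) is the choice-worthiness that theory assigns to action a; the agent then selects the action with the highest W(a).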
Going Forward
Throughout this work, the researchers have tried to introduce algorithms that can balance optimising reward functions with incomparable scales, and to show their behaviour on sequential-decision versions of moral dilemmas. However, a couple of challenges remain. One is to create implementations of ethical reward functions that can be applied in the real world, or at least in complex simulations (e.g. self-driving cars).
Another challenge is to create more complicated machine-ethics benchmark tasks. Popular RL benchmarks, such as Atari or Go, are designed to fit the standard paradigm, i.e. a single metric of success or progress. The authors suggest retrofitting a few of these benchmarks to include various conceptions of morality, such as accounting for the utilities of “enemies” or the violation of deontological considerations.
Towards that end, the researchers are hopeful that their work will lead to future work in which algorithms can be tested for machine ethics in more complex domains such as a high-dimensional deep RL setting.