How To Build Human Friendly AI – Can AI Be Aligned With Human Values?

There is no bigger concern in daily AI operations than algorithmic bias. From trading on Wall Street to social security payouts and loan approvals, the use of AI has grown, with applications ranging from self-driving vehicles to AI-powered diagnostic tools. Yet training datasets are highly susceptible to containing traces of discrimination, which can lead to biased decisions. So what happens when bias rules these systems? According to L.R. Varshney, Assistant Professor at the University of Illinois at Urbana-Champaign, the discrimination and unfairness present in algorithmic decision making has become an even bigger concern than discrimination by people.

It is a view seconded by Nate Soares of the Machine Intelligence Research Institute. With AI algorithms rivalling humans in scientific inference and planning, more and more heavy computational jobs will be delegated to algorithms every day. And on this path to greater intelligence, much of the work may be done by smarter-than-human systems.

Can AI unshackle its source code?

Recent AI research has taken on a new slant: ensuring that highly advanced AI systems are aligned with human interests and goals. The problem of “aligning” a superintelligence so that it retains our values is the mainstay of forward-thinking research on AI. One of the most common threads in this argument is the fear of advanced AI systems unshackling their source code and going rogue. Nate Soares, who heads the Machine Intelligence Research Institute, dismissed such dystopian fears by emphasizing that an AI system is its own source code, and its actions can only follow from the execution of the instructions that we initiate.

The more serious questions in this debate, he noted in his talk at Google, are how we can ensure that the objectives outlined for smarter-than-human AI are correct, and how we can minimize costly accidents and unintended consequences when those objectives are misspecified.

Famous computer scientist Stuart Russell takes the same view in his book Artificial Intelligence: A Modern Approach, stating that the primary concern in AI is not the emergence of consciousness but simply the ability to make high-quality decisions.

Let’s have a look at new developments in the field of AI Safety

There’s a lot going on in the sphere of AI safety research and in the search for a full solution to the problem of aligning an artificial superintelligence with human values. In a bid to generate interest amongst AI researchers and scientists, senior industry members are putting up prize money for research papers on building friendly AI with human values.

MIRI outlined four areas of research which have been studied extensively and are relevant to alignment with human goals:

a) Building realistic world-models, the study of agents learning and pursuing goals while embedded within a physical world

b) Decision theory, the study of idealized decision-making procedures

c) Logical uncertainty, the study of reliable reasoning with bounded deductive capabilities

d) Vingean reflection, the study of reliable methods for reasoning about agents that are more intelligent than the reasoner (a self-modifying agent, or any agent that constructs new agents more intelligent than itself, must reason about the behavior of a system smarter than itself)


Snapshot of recent research in AI Safety R&D projects

AI Alignment Prize: Last November, Zvi Mowshowitz and Vladimir Slepnev announced an AI Alignment Prize, funded by Paul Christiano, for publicly posted work advancing the understanding of AI alignment. The duo received more than 40 entries within a month and announced six winners, who are all set to receive $15,000 in total, an increase from the originally planned $5,000. The team has kicked off a second round of research, with entries accepted until March 31.

Optimizing the goal-alignment problem: According to the Boston-headquartered Future of Life Institute, researchers are working on goal-alignment theory: what sub-goals should we expect a superintelligent AI to have? The basic argument is that an AI system should not only strive to improve its capability of achieving its ultimate goals, but also ensure that it will retain these goals even after it has become more advanced.

American AI researcher and theorist Eliezer Yudkowsky, who popularized the idea of building self-improving AI systems, has said that if we manage to get our self-improving AI to become friendly by learning and adopting our goals, then we’re all set, because we’re guaranteed that it will try its best to remain friendly forever. In his paper Complex Value Systems are Required to Realize Valuable Futures, Yudkowsky posits that if one builds an AGI with a known utility function, and that AGI is sufficiently competent at self-modification, it should keep that utility function even as it improves its own intelligence, as proposed in Jürgen Schmidhuber’s Gödel machine, an approach to AGI that uses a recursive self-improvement architecture.


Hutter’s approach based on AIXI, pegged as a gold standard for AGI: German computer scientist Marcus Hutter applied a mathematical top-down approach to AI and came up with AIXI, a universal agent model that behaves optimally in any computable environment. The AIXI agent tries to maximize the reward signal delivered along a sensory reward channel, assuming only that the environment’s behaviour follows some unknown but computable probability distribution. According to Hutter, AIXI can serve as a gold standard for achieving AGI and is a universally rational agent, but it is computationally intractable. Hutter therefore also constructed an algorithm, AIXItl, which is superior to any other agent bounded by computation time t and program length l; however, the computation time of AIXItl is of the order t·2^l, a factor still far too large to allow a direct implementation.
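The core decision rule can be illustrated with a toy sketch (a drastic simplification, not Hutter's formulation): weight each candidate environment model by a 2^-length Solomonoff-style prior and choose the action with the highest expected reward under the resulting mixture. The models and feature values below are invented for illustration.

```python
# Toy AIXI-style action selection over a finite hypothesis class.
# Each model is (description_length_bits, reward_fn), weighted by 2^-length.

def aixi_style_action(actions, models):
    """Return the action maximizing expected reward under the
    normalized 2^-length mixture over candidate environment models."""
    total = sum(2.0 ** -length for length, _ in models)
    best_action, best_value = None, float("-inf")
    for a in actions:
        # Expected reward of action a under the mixture prior.
        value = sum((2.0 ** -length / total) * reward(a)
                    for length, reward in models)
        if value > best_value:
            best_action, best_value = a, value
    return best_action

# Two toy hypotheses that disagree about which action pays off:
# a short (more probable) model and a longer (less probable) one.
models = [
    (3, lambda a: 1.0 if a == "left" else 0.0),   # 3-bit model
    (8, lambda a: 1.0 if a == "right" else 0.0),  # 8-bit model
]
print(aixi_style_action(["left", "right"], models))  # → left
```

The real AIXI mixes over all computable environments, which is exactly why it is incomputable; restricting to a finite, time-bounded hypothesis class is the intuition behind AIXItl.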

NIPS 2017 highlighted how AI safety is gaining traction: Max Tegmark, founder of the Future of Life Institute, shared that there was a slew of presentations on the long-term side, including oral and spotlight presentations and the Aligned AI workshop. A bunch of value-alignment papers were also discussed at the conference; to mention a few: Inverse Reward Design, which focuses on ensuring that a hand-designed reward function still leads to the right behavior in scenarios the designer did not anticipate, and Deep RL from Human Preferences, whose authors show that their approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of the agent’s interactions with the environment.
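The core idea behind learning from human preferences can be sketched in a few lines (a minimal, assumed illustration, not the paper's deep-network implementation): fit reward weights so that the segment the human preferred gets higher predicted reward, under a Bradley-Terry (logistic) preference model. The feature vectors and data here are invented.

```python
import math

def train_reward(preferences, n_features, lr=0.5, epochs=200):
    """preferences: list of (features_a, features_b, a_preferred) tuples,
    where features_* summarise a trajectory segment shown to the human."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for fa, fb, a_pref in preferences:
            ra = sum(wi * x for wi, x in zip(w, fa))
            rb = sum(wi * x for wi, x in zip(w, fb))
            # P(a preferred) = sigmoid(ra - rb); ascend the log-likelihood.
            p_a = 1.0 / (1.0 + math.exp(rb - ra))
            target = 1.0 if a_pref else 0.0
            for i in range(n_features):
                w[i] += lr * (target - p_a) * (fa[i] - fb[i])
    return w

# Toy data: the human consistently prefers segments with more of feature 0.
prefs = [((1.0, 0.0), (0.0, 1.0), True),
         ((0.2, 0.9), (0.8, 0.1), False)]
w = train_reward(prefs, n_features=2)
print(w[0] > w[1])  # the learned reward weights feature 0 more heavily
```

The learned reward function can then stand in for the missing environment reward when training an ordinary RL agent, which is how the paper gets by on so little human feedback.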


As Nate Soares outlines in his paper, an intelligent agent must be designed to learn and act according to the preferences of its operators. This is known as the value learning problem: intelligent agents should be constructed to inductively learn values from training data. The goal for AI researchers should be to develop advanced systems that can classify potential outcomes by their value. But what sort of training data allows this classification?

However, the biggest technical problem with this inductive learning approach is how to provide a training dataset that enables the agent to learn the complexities of value. According to Soares, for the inductive value learning approach to succeed, it is a must to construct a system that identifies ambiguities in the training set, that is, dimensions along which the training set gives no information, and queries the operators accordingly.
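A hedged sketch of that ambiguity check: before trusting an inductively learned value classifier, flag any dimension on which every training example agrees, since the learner cannot tell whether outcomes are valued because of that dimension or merely alongside it. The feature names and data below are illustrative, not drawn from Soares's paper.

```python
def find_ambiguous_dimensions(training_set, feature_names):
    """training_set: list of feature dicts describing labeled outcomes.
    A dimension is 'ambiguous' if it never varies across the training
    set, so it should be flagged for an operator query."""
    ambiguous = []
    for name in feature_names:
        values = {example[name] for example in training_set}
        if len(values) <= 1:
            ambiguous.append(name)
    return ambiguous

# Every outcome shown to the learner happened to involve a smiling human,
# so "human_is_smiling" carries no information about what is valued.
data = [
    {"human_is_smiling": True, "humans_present": 3},
    {"human_is_smiling": True, "humans_present": 1},
]
queries = find_ambiguous_dimensions(data, ["human_is_smiling", "humans_present"])
print(queries)  # → ['human_is_smiling']: ask the operators about this dimension
```

Real systems would need a far richer notion of ambiguity than constant features, but the pattern is the same: detect what the data underdetermines, then ask rather than extrapolate.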


Richa Bhatia
Richa Bhatia is a seasoned journalist with six years' experience in reportage and news coverage and has had stints at Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old and loves writing about the next-gen technology that is shaping our world.
