Recently, a subfield of computer science known as Artificial General Intelligence (AGI) safety, or AI alignment, has emerged. It aims to develop techniques that ensure powerful AI systems act in line with human needs. The field has grown as AGI is widely touted to one day behave and work like humans and to match human intelligence.
AGI safety research is motivated by the prospect of building AI agents that are smarter than human beings and pursue objectives that conflict with our own. Human intelligence lets us coordinate complex societies and use technology to control the world more than any other species. In the future, AI may become more capable than us at these tasks and wield that control instead. In that case AI, not humanity, would be the most powerful force on Earth, and we could lose the capacity to build a valuable future.
Large-scale research and development in AGI could lead to a technological singularity, which might pose existential risks. Even if a singularity is not reached, rapid growth in AGI could bring problems such as revolutionised warfare, social manipulation, and shifts in power dynamics. These concerns are what gave rise to AGI safety.
What is AGI safety?
As discussed earlier, systems more intellectually capable than humans, i.e. superintelligent systems, may become a reality, and they are likely to be autonomous. Hence AGI safety, or “alignment”, has emerged to keep AGI in line with human ambitions. For example, Berkeley professor Stuart Russell proposes in his book Human Compatible that AI systems be designed with the objective of maximising the realisation of human preferences. These “preferences” cover what we care about in the future. Similarly, Iason Gabriel, an AI researcher at DeepMind, argues that AI should be aligned with principles supported by a global consensus and affirmed via democratic processes.
Another prominent figure in AGI circles is Eliezer Yudkowsky, founder of the Machine Intelligence Research Institute (MIRI), a non-profit research institute that identifies and manages existential risks from AGI. He has proposed that the goal should be to fulfil humanity’s coherent extrapolated volition (CEV), meaning the values humans would share at a reflective equilibrium, reached through an idealised process of refinement. Today’s experimental aligned AI is pragmatic: it can carry out tasks according to stated preferences, albeit without fully understanding long-term goals. This kind of alignment applies to AI with both general and specialised abilities.
AGI safety is concerned with systems that can perform “novel” tasks and pursue objective functions using cognition comparable to our own. Replication, cultural learning, and recursive improvement between AGI systems could help them progress from human-level AGI to superintelligence. However, existing frameworks for goal-directed agency cannot predict what goals an AGI might have, since its cognitive capacities may let it pursue goals that emerge during its own development. Inner misalignment can arise when a system acquires convergent subgoals during training that differ from the outer objective it was trained on. We must lay out a plan for building “intent”-aligned AGIs that outperform humans at safety and governance research. Until then, retaining control through coordination will be critical to putting transparent systems in place.
In 2021, the Secretary-General of the United Nations (UN) advised regulating AI so that it is “aligned with shared global values”. In the same year, the People’s Republic of China (PRC) rolled out ethical guidelines for the use of AI in the country. Under these guidelines, AI systems must abide by shared human values, remain under human control, and not endanger public safety. The UK followed with a 10-year National AI Strategy, which indicated that the British government takes the long-term risk of non-aligned Artificial General Intelligence seriously.
AGI safety protocols and issues
Specification in AGI systems means defining a system’s goal and making sure it aligns with the human developer’s intentions and motives. These systems follow a pre-specified algorithm that allows them to learn from data in order to achieve a specific goal. Both the learning algorithm and the goal are supplied by the human designer, for example a goal such as minimising a prediction error or maximising a reward. During training, the system will try to satisfy the objective, irrespective of whether doing so reflects the designer’s intent. Hence, designers should take special care to specify an objective that will lead to the desired behaviour.
If the goal is a poor proxy for the intended behaviour, the system will learn the wrong behaviour, and the goal is considered “misspecified.” This is the likely outcome whenever the specified goal does not align with the desired behaviour.
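A minimal sketch of a misspecified goal, under invented assumptions: a hypothetical cleaning agent is rewarded for the amount of dirt it collects, while the designer actually wants a clean room. The behaviours, the numbers, and the names `proxy_reward` and `intended_score` are illustrative only and do not come from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviour:
    name: str
    dirt_collected: int  # what the specified (proxy) reward counts
    dirt_remaining: int  # what the designer actually cares about


# Hypothetical outcomes of three strategies (made-up numbers).
CANDIDATES = [
    Behaviour("clean the room once", dirt_collected=10, dirt_remaining=0),
    Behaviour("do nothing", dirt_collected=0, dirt_remaining=10),
    Behaviour("dump dirt and re-collect it", dirt_collected=50, dirt_remaining=10),
]


def proxy_reward(b: Behaviour) -> int:
    """The goal as written down by the designer: more dirt collected is better."""
    return b.dirt_collected


def intended_score(b: Behaviour) -> int:
    """The designer's real intent: less dirt left in the room is better."""
    return -b.dirt_remaining


def best(candidates, objective):
    """Stand-in for training: pick the behaviour that maximises the objective."""
    return max(candidates, key=objective)


learned = best(CANDIDATES, proxy_reward)
intended = best(CANDIDATES, intended_score)

print("Behaviour that maximises the proxy:", learned.name)
print("Behaviour the designer wanted:     ", intended.name)
print("Goal is misspecified:", learned != intended)
```

In this toy setting the optimiser does exactly what it was asked to do, yet the highest-scoring behaviour leaves the room dirty. The gap between `proxy_reward` and `intended_score` is the misspecification.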
To uphold AGI safety, the system designer must understand why the system behaves the way it does and whether its behaviour will ever align with the designer’s intent. A robust set of assurance techniques already exists for older generations of systems, but these are poorly suited to modern machine learning systems such as deep neural networks. Interpretability (sometimes called explainability) can help us understand a machine’s decision-making and build systems that are easier to understand. In this way, human operators can verify that a system works as intended and obtain an explanation when it behaves unexpectedly.
Other issues such as reward corruption, reward gaming, and negative side effects appear as sub-categories of AGI safety problems in the research agendas of DeepMind and OpenAI. The biggest question is how to create an agent that pursues the goals we have designed. MIRI describes one route, called value specification, which involves decision theory and logical omniscience.
As we move closer to creating human-level intelligence, the corrigibility, or indeed self-corrigibility, of the agent must be taken into account. Any mishap during its construction might lead to technical glitches and malfunctions, and it remains to be seen whether the agent would work with humans to fix the problem (error-tolerant design). Another relevant issue is whether humans can relate to the agent’s choices. Even if we succeed in creating the world’s first human-level AGI, there is no guarantee that we humans will relate to it.
Apart from the technical and scientific concerns, one of the biggest hindrances to human-level intelligence is its societal consequences. AGI must be able to handle sizeable legal, economic, political, and military scenarios. DeepMind is also working towards developing a human-conscience agent with respect to AGI. AGI safety procedures must precede deployment in high-stakes settings. Meanwhile, robustness, assurance, and specification are important parts of AI safety that help create and operate reliable and safe AGI systems.