Google DeepMind announced an early warning system for novel AI risks

A recent study proposed a framework for evaluating general-purpose AI models for their potential risks and threats. The project was a collaborative effort with contributors from the University of Cambridge, the University of Oxford, the University of Toronto, Université de Montréal, OpenAI, Anthropic, the Alignment Research Center, the Centre for Long-Term Resilience, and the Centre for the Governance of AI. It focused on expanding the scope of AI evaluation to cover the severe hazards that general-purpose AI models could pose.

These models might harbour capabilities such as manipulation, deceit, cyber-aggression, and other harmful capacities. Therefore, assessments of these risks are integral to the safe creation and implementation of AI systems.

An overview of their proposed approach: to evaluate the potential high risks posed by new general-purpose AI systems, developers need to assess their dangerous capabilities and alignment. Identifying these risks early makes it possible to train and deploy AI systems more responsibly, to communicate their risks transparently, and to apply suitable cybersecurity standards.

The focus was on the possible extreme risks associated with general-purpose models, which usually acquire their capabilities and behaviours during training. However, the current methods for steering this learning process are imperfect. Previous research, including work at Google DeepMind, shows that even when rewarded appropriately for correct behaviour, AI systems can still adopt undesired objectives.

It is essential for AI developers to stay proactive, foreseeing future advancements and potential dangers. In the future, universally applicable models may inherently learn various hazardous capabilities. While uncertain, it is plausible that AI systems of the future might possess the ability to engage in offensive cyber operations, deceive humans convincingly, manipulate individuals into harmful actions, develop or acquire weapons, operate other high-risk AI systems, or assist humans in any of these tasks.

Access to such models by those with harmful intentions could lead to misuse, and misalignment could result in harmful actions even without direct malicious intent. This is where the framework comes into play, enabling these risks to be identified in advance. The proposed evaluation structure aims to reveal the degree to which a model possesses ‘dangerous capabilities’ that could threaten security, exert undue influence, or evade oversight. It also assesses the model’s propensity to apply those capabilities to cause harm, i.e. the model’s alignment. Alignment evaluations should confirm that the model behaves as intended across a wide range of scenarios and, where feasible, examine the model’s internal workings.
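The two-part test described above (does the model have a dangerous capability, and is it inclined to misuse it?) can be sketched in code. The class, score names, and thresholds below are illustrative assumptions for this sketch, not part of the proposed framework:

```python
from dataclasses import dataclass

@dataclass
class EvaluationResult:
    """Hypothetical scores in [0, 1]; higher means more concerning."""
    dangerous_capability: float  # e.g. results of cyber-offense or manipulation probes
    misalignment: float          # propensity to apply those capabilities harmfully

def extreme_risk_flag(result: EvaluationResult,
                      capability_threshold: float = 0.5,
                      alignment_threshold: float = 0.5) -> bool:
    # A model is flagged only when it both possesses a dangerous capability
    # and shows a propensity to misuse it; either factor alone is lower-risk.
    return (result.dangerous_capability >= capability_threshold
            and result.misalignment >= alignment_threshold)
```

A capable but well-aligned model, or a misaligned but weak one, would not trip this particular flag; as the article notes, the most hazardous scenarios combine several dangerous capabilities, which a real evaluation suite would score separately.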

The ingredients for high-risk scenarios can also arise when certain capabilities are delegated to external entities, such as humans (e.g., users or crowd workers) or other AI systems. These delegated capabilities can cause harm through intentional misuse, through misalignment, or through a combination of both.

The outcomes of these evaluations will provide AI developers with a clear understanding of whether the components necessary for severe risk exist. The most hazardous scenarios will typically involve a combination of various dangerous capabilities. The role of model evaluations becomes essential in governing these risks.

With superior tools to identify potentially dangerous models, businesses and regulatory bodies can enhance their procedures in several areas:

– Training Responsibly: Informed decisions can be made on whether and how to train a new model that exhibits early signs of risk.

– Deploying Responsibly: Informed decisions can be made on whether, when, and how to roll out potentially dangerous models.

– Transparency: Pertinent and useful information can be shared with stakeholders to aid in the preparation or mitigation of possible risks.

– Appropriate Security: Robust information security protocols and systems can be implemented for models that may pose severe risks.

They have created a comprehensive plan for how extreme-risk model evaluations should feed into key decisions around training and deploying a highly capable, general-purpose model. Developers will conduct evaluations throughout, and structured model access will be provided to external safety researchers and model auditors for additional evaluations. These evaluations will then inform risk assessments prior to model training and deployment.
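The workflow in that plan (repeated evaluations gating each stage, with a halt-and-escalate path whenever a risk is flagged) could be sketched as follows; the stage names and return strings are illustrative assumptions:

```python
def gated_pipeline(run_evaluation,
                   stages=("pre-training", "mid-training", "pre-deployment")):
    """Run an extreme-risk evaluation at each stage of the pipeline;
    halt and escalate as soon as any evaluation raises a flag."""
    for stage in stages:
        if run_evaluation(stage):  # True means an extreme-risk signal was found
            return f"halted at {stage}: escalate to risk assessment"
    return "deployed with ongoing monitoring"
```

In this sketch, external auditors with structured model access would contribute additional `run_evaluation` checks alongside the developer's own.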

A framework for integrating model evaluations related to high-risk scenarios into critical decision-making processes at every stage of model training and deployment.

What Next?

Initial efforts in model evaluations for extreme risks have already begun, notably at Google DeepMind, among others. However, further progress – both technically and institutionally – is needed to create an evaluation process that identifies all potential risks and provides protection against emerging challenges.

While model evaluations are crucial, they aren’t a cure-all solution. Certain risks could potentially be overlooked, particularly if they’re heavily reliant on external factors such as complex societal, political, and economic dynamics. Model evaluations must be integrated with other risk assessment tools and an overall commitment to safety across industries, governments, and civil society.

As per Google’s recent blog on responsible AI, they emphasise that “individual practices, shared industry standards, and robust government policies are critical to successful AI implementation”. The hope is that many others in the AI field and those impacted by it will collaborate to develop methods and standards for safely creating and deploying AI for everyone’s benefit.

Understanding how to identify emerging risky properties in models, and how to respond adequately to concerning results, is a vital part of responsible AI development at the frontier of AI capabilities.

K L Krithika
K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering impossible technologies while trying not to confuse them with the strides technology achieves in real life.
