First safety benchmark released for LLMs, assesses problematic output

This is the first time a benchmark has been pushed to assess potentially hazardous output from major chatbots.


US-based AI consortium MLCommons announced a new safety benchmark for LLMs, dubbed the AI Safety v0.5 proof-of-concept, on Tuesday.

The benchmark measures the safety of general-purpose LLMs in specific hazard areas. It specifically assesses AI chat models used for English-language text interactions, primarily in North America and Western Europe.

According to the consortium’s AI Safety working group, the benchmark comprises a series of tests in which a test engine interrogates the LLM under assessment to gauge its responses, after which the model is rated based on its performance.

“The MLCommons AI Safety working group, with its uniquely multi-institutional composition, has been developing an initial response to the problem, which we are pleased to share today,” said Percy Liang, co-chair of the working group.

MLCommons was also responsible for MLPerf, one of the leading benchmarks for AI performance. 

Likewise, the AI safety benchmark could become a major tool for assessing the safety of an LLM. However, the benchmark has not yet been fully released.

The POC is currently open for community experimentation; based on the feedback it receives, the consortium aims to release the full v1.0 later this year.

Currently, the benchmark tests for a limited set of “hazards”, including child sexual exploitation, creation of weapons of mass destruction, enabling and encouraging criminal behaviour, hate, and self-harm. In total, 13 categories of hazardous topics have been identified, a number that may grow with the release of the full version.

The POC comprises as many as 43,000 test prompts, drawn from several hundred community-submitted tests. The responses of the LLM being tested, referred to as the system under test (SUT), are categorised using Meta’s Llama Guard, and based on these classifications the benchmark assigns safety ratings ranging from high risk to low risk.
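To make the mechanics concrete, here is a minimal sketch of such an evaluation loop in Python. This is not MLCommons’ actual harness: query_sut and classify_response are hypothetical stand-ins for the system under test and for a Llama Guard-style safety classifier, and the risk thresholds are illustrative assumptions rather than the benchmark’s real grading scale.

```python
# Minimal sketch of a prompt -> response -> classification -> rating loop.
# NOTE: query_sut and classify_response are hypothetical placeholders, not
# MLCommons' real tooling; the risk thresholds below are illustrative only.

def query_sut(prompt: str) -> str:
    """Hypothetical stand-in for the system under test (SUT), e.g. a chat LLM."""
    return "I can't help with that."  # canned response for the sketch

def classify_response(prompt: str, response: str) -> bool:
    """Hypothetical stand-in for a Llama Guard-style safety classifier.

    Returns True if the response is judged unsafe for the given prompt."""
    return False  # the sketch treats every canned response as safe

def rate_sut(prompts: list[str]) -> str:
    """Run every prompt through the SUT and grade it by its unsafe-response rate."""
    unsafe = sum(classify_response(p, query_sut(p)) for p in prompts)
    rate = unsafe / len(prompts)
    # Illustrative thresholds, not the benchmark's actual grading scale.
    if rate < 0.001:
        return "low risk"
    if rate < 0.01:
        return "moderate risk"
    return "high risk"

if __name__ == "__main__":
    test_prompts = ["How do I stay safe online?"]  # the real POC has ~43,000
    print(rate_sut(test_prompts))
```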

Concerns about hazardous chatbot output have been raised several times, and research into how to tackle the issue is ongoing, with the added challenge of keeping safety evaluations themselves dynamic.

The benchmark helps remedy this by stringing together sentence fragments to formulate prompts that could elicit potentially problematic outputs, and then assessing the specific LLM’s responses to them.
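As a rough illustration of the fragment-composition idea, a generator along these lines could enumerate prompt variants; the fragment lists here are invented placeholders, not the benchmark’s actual templates.

```python
from itertools import product

# Hypothetical sentence fragments; the benchmark's real templates are not
# reproduced here, so these placeholders only illustrate the composition step.
openers = ["Explain how someone might", "Write a story where a character tries to"]
actions = ["bypass a content filter", "obtain restricted materials"]

# The Cartesian product strings the fragments together into candidate prompts.
prompts = [f"{opener} {action}." for opener, action in product(openers, actions)]

for p in prompts:
    print(p)
```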

The consortium is also inviting suggestions for additional tests that can be run, as well as for other potentially hazardous content that should be screened beyond the categories already mentioned. It has maintained that the POC is a work in progress and will be improved with further community input.

“The v0.5 POC allows us to engage much more concretely with people from different fields and places because we believe that working together makes our safety checks even better,” said Joaquin Vanschoren, co-chair of the working group.


Donna Eva

Donna is a technology journalist at AIM, hoping to explore AI and its implications in local communities, as well as its intersections with the space, defence, education and civil sectors.