First safety benchmark released for LLMs, assesses problematic output

This is the first time a benchmark has been pushed to assess potentially hazardous output from major chatbots.


US-based AI consortium MLCommons announced a new safety benchmark for LLMs, dubbed the AI Safety v0.5 proof-of-concept, on Tuesday.

The benchmark measures the safety of general-purpose LLMs in specific hazard areas. It specifically assesses AI chat models used for English-language text interactions, primarily in North America and Western Europe.

According to the consortium’s AI Safety working group, the benchmark comprises a series of tests in which a test engine interrogates the LLM under assessment to gauge its responses, after which the model is rated based on its performance.

“The MLCommons AI Safety working group, with its uniquely multi-institutional composition, has been developing an initial response to the problem, which we are pleased to share today,” said Percy Liang, co-chair of the working group.

MLCommons was also responsible for MLPerf, one of the leading benchmarks for AI performance. 

Likewise, the AI safety benchmark could become a major tool for assessing the safety of an LLM. However, the benchmark has not yet been fully released.

The POC is currently open for community experimentation; based on the feedback it receives, the consortium aims to release the full v1.0 later this year.

Currently, the benchmark tests for a limited set of “hazards”, including child sexual exploitation, creation of weapons of mass destruction, enabling and encouraging criminal behaviour, hate, and self-harm. In total, 13 categories of hazardous topics have been identified, a number that may grow with the release of the full version.

The POC comprises as many as 43,000 test prompts, drawn from several hundred community-submitted tests. The responses of the LLM being tested, referred to as the system under test (SUT), are categorised using Meta’s Llama Guard, and based on these classifications the benchmark assigns safety ratings ranging from high risk to low risk.
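To make the mechanics concrete, here is a minimal sketch of such an evaluation loop in Python. This is not MLCommons’ actual harness: query_sut and classify_response are hypothetical stand-ins for the system under test and for a Llama Guard-style safety classifier, and the risk thresholds are illustrative assumptions rather than the benchmark’s real grading scale.

```python
# Minimal sketch of a prompt -> response -> classification -> rating loop.
# NOTE: query_sut and classify_response are hypothetical placeholders, not
# MLCommons' real tooling; the risk thresholds below are illustrative only.

def query_sut(prompt: str) -> str:
    """Hypothetical stand-in for the system under test (SUT), e.g. a chat LLM."""
    return "I can't help with that."  # canned response for the sketch

def classify_response(prompt: str, response: str) -> bool:
    """Hypothetical stand-in for a Llama Guard-style safety classifier.

    Returns True if the response is judged unsafe for the given prompt."""
    return False  # the sketch treats every canned response as safe

def rate_sut(prompts: list[str]) -> str:
    """Run every prompt through the SUT and grade it by its unsafe-response rate."""
    unsafe = sum(classify_response(p, query_sut(p)) for p in prompts)
    rate = unsafe / len(prompts)
    # Illustrative thresholds, not the benchmark's actual grading scale.
    if rate < 0.001:
        return "low risk"
    if rate < 0.01:
        return "moderate risk"
    return "high risk"

if __name__ == "__main__":
    test_prompts = ["How do I stay safe online?"]  # the real POC has ~43,000
    print(rate_sut(test_prompts))
```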

Concerns about hazardous chatbot output have been raised several times, and research into how to tackle the issue is ongoing, with the added challenge of keeping safety evaluations themselves dynamic.

The benchmark helps remedy this by stringing together sentence fragments to formulate prompts that could elicit potentially problematic outputs, and then assessing the specific LLM’s responses to them.
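As a rough illustration of the fragment-composition idea, a generator along these lines could enumerate prompt variants; the fragment lists here are invented placeholders, not the benchmark’s actual templates.

```python
from itertools import product

# Hypothetical sentence fragments; the benchmark's real templates are not
# reproduced here, so these placeholders only illustrate the composition step.
openers = ["Explain how someone might", "Write a story where a character tries to"]
actions = ["bypass a content filter", "obtain restricted materials"]

# The Cartesian product strings the fragments together into candidate prompts.
prompts = [f"{opener} {action}." for opener, action in product(openers, actions)]

for p in prompts:
    print(p)
```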

The consortium is also inviting suggestions for additional tests that can be run, as well as for other potentially hazardous content that should be screened beyond the categories already mentioned. It has maintained that the POC is a work in progress and will be improved with further community input.

“The v0.5 POC allows us to engage much more concretely with people from different fields and places because we believe that working together makes our safety checks even better,” said Joaquin Vanschoren, co-chair of the working group.


Donna Eva

Donna is a technology journalist at AIM, hoping to explore AI and its implications in local communities, as well as its intersections with the space, defence, education and civil sectors.