GPT Can Do Better Toxicity Detection; At Least That's What OpenAI Thinks

The tool will provide OpenAI API developers with free access to GPT-based classifiers that can detect harmful content, OpenAI states in a blog post.

AI research firm OpenAI has revealed an “improved” content moderation tool, the Moderation endpoint, which aims to help developers protect their applications against possible misuse. The tool provides OpenAI API developers with free access to GPT-based classifiers that can detect harmful content, OpenAI states in a blog post.

In the same post, OpenAI explains that the Moderation endpoint assesses text inputs for content that is sexual, hateful, violent, or promotes self-harm. “The endpoint has been trained to be quick, accurate, and perform robustly across a range of applications,” it adds.
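The endpoint itself is a simple HTTP POST that returns per-category flags and scores. Below is a minimal sketch of calling it with only the standard library; the URL and the `{"input": ...}` / `results[].flagged` request and response shapes match OpenAI's public API reference, but the helper names (`build_request`, `is_flagged`) are illustrative, not from the post.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"

def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for the Moderation endpoint."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

def is_flagged(response_body: dict) -> bool:
    """Each result carries category scores plus an overall 'flagged' boolean."""
    return any(r["flagged"] for r in response_body["results"])

# Actually sending the request (needs a valid API key):
#   with urllib.request.urlopen(build_request("some user text", api_key)) as resp:
#       print(is_flagged(json.load(resp)))
```

A developer would typically gate user-generated text through such a check before passing it on to a downstream model or displaying it to other users.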

LLMs and risks

In a paper titled A Holistic Approach to Undesired Content Detection in the Real World, OpenAI details the tool. All the major tech firms are heavily invested in large language models (LLMs) and have been releasing them frequently of late. Though LLMs come with their own set of benefits, research is ongoing to identify the risks that can accompany them in the real world and to address them.

OpenAI says that existing work on content detection either focuses mainly on a limited set of categories or on a targeted use case.


According to OpenAI, detecting undesired content is difficult for a variety of reasons:

  • There is a lack of clearly defined categorisation of undesired content.
  • This system has to have the ability to process real-world traffic.
  • It is uncommon to encounter certain categories of undesired content in real-world situations.

Image: A Holistic Approach to Undesired Content Detection in the Real World

What makes a successful content moderation system?

Based on its experimentation, OpenAI lists certain attributes needed to build a successful moderation system for the real world.

  • Imprecise labeling instructions force annotators to rely on their subjective judgment, which creates inconsistently labeled data. “Regular calibration sessions are necessary to refine these instructions and ensure annotators are aligned with them,” adds OpenAI.
  • Active learning is important: it can capture a larger number of undesired samples from rare categories.
  • Publicly available data might not lead to high quality performance for a problem but can be used to construct a “noisy cold start dataset at the early stage”.
  • Deep learning models can overfit common phrases. OpenAI addresses this by identifying overfitted phrases and by red-teaming through human trials; the training distribution is then altered by incorporating model-generated or human-curated synthetic data.
  • Even with precautions, mislabeling can happen. OpenAI tries to catch these cases through cross-validation and by looking for common phrases that cause the model to overfit.
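The active-learning point above is about selection strategy: rather than labeling random traffic, pick the examples the current classifier is least sure about. The paper does not publish its sampling code, so the following is a toy uncertainty-sampling sketch under that assumption; `predict_proba` stands in for any classifier that returns a probability of a text being undesired.

```python
def uncertainty_sample(pool, predict_proba, k):
    """Return the k pool items whose predicted probability of being
    undesired is closest to 0.5, i.e. where the model is least certain.
    Those are the items most worth sending to human annotators."""
    return sorted(pool, key=lambda text: abs(predict_proba(text) - 0.5))[:k]

# Toy stand-in scorer: longer texts score higher (a real system would
# use the current moderation model's probability output).
def toy_scorer(text):
    return min(len(text) / 10, 1.0)

candidates = ["a", "abcde", "abcdefghij"]
to_label = uncertainty_sample(candidates, toy_scorer, 2)
```

Confident items (scores near 0 or 1) are skipped, so annotation budget concentrates on the boundary cases, including rare categories the model has seen too few examples of.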

Not perfect

The system is, of course, not flawless. OpenAI also discusses the model’s current limitations and planned improvements.

  • Bias and fairness: The model exhibits bias with respect to certain demographic attributes.
  • Data Augmentation: OpenAI plans to conduct more data augmentation methods to boost the training dataset.
  • Support for non-English text: In the future, it plans to optimise performance on non-English text too. At the moment, only 5% of the samples in its training set are non-English.
  • Red-teaming at scale: At the moment, OpenAI does internal red-teaming with each new model version. This is not a scalable solution and it wants to change this aspect in the future.
  • More active learning experiments: The firm wants to run more “rigorous experiments comparing the performance of different active learning strategies”.


Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good.
