GPT Can Do Better Toxicity Detection; At Least That's What OpenAI Thinks

The tool will provide OpenAI API developers with free access to GPT-based classifiers that can detect harmful content, OpenAI states in a blog post.

AI research firm OpenAI has revealed an "improved" content moderation tool, the Moderation endpoint, which aims to help developers protect their applications against possible misuse. The tool gives OpenAI API developers free access to GPT-based classifiers that can detect harmful content, OpenAI states in a blog post.

In the same post, OpenAI notes that the Moderation endpoint assesses text inputs for content that is sexual, hateful, violent, or promotes self-harm. "The endpoint has been trained to be quick, accurate, and perform robustly across a range of applications," it adds.
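To make the description above concrete, here is a minimal sketch of how a developer might inspect a response of the shape the Moderation endpoint returns. The field names follow OpenAI's documented response format, but the values below are invented for illustration; a real call would POST the input text to the `/v1/moderations` endpoint with an API key.

```python
# Hedged sketch: response schema follows OpenAI's documented Moderation
# endpoint format; all values here are invented for illustration.

def flagged_categories(result: dict) -> list[str]:
    """Return the names of categories the endpoint flagged as violated."""
    return [name for name, hit in result["categories"].items() if hit]

# A sample response of the shape the endpoint returns. In real usage,
# this dict would come from POSTing to https://api.openai.com/v1/moderations.
sample_response = {
    "id": "modr-example",
    "model": "text-moderation-001",
    "results": [
        {
            "flagged": True,
            "categories": {
                "hate": False,
                "hate/threatening": False,
                "self-harm": False,
                "sexual": False,
                "sexual/minors": False,
                "violence": True,
                "violence/graphic": False,
            },
            "category_scores": {
                "hate": 0.01,
                "violence": 0.97,
                # remaining scores omitted for brevity
            },
        }
    ],
}

result = sample_response["results"][0]
print(result["flagged"])           # True
print(flagged_categories(result))  # ['violence']
```

An application would typically check the top-level `flagged` boolean first and only drill into per-category scores when it needs finer-grained handling.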


LLMs and risks

In a paper titled A Holistic Approach to Undesired Content Detection in the Real World, OpenAI details the tool. All the major tech firms are heavily involved in large language models (LLMs) and have been releasing them frequently of late. Though LLMs bring clear benefits, research continues into identifying and addressing the risks they can pose in the real world.

OpenAI says that existing work on content detection focuses mainly either on a limited set of categories or on a targeted use case.

Detecting undesired content is difficult for a variety of reasons, OpenAI notes:

  • There is no clearly defined categorisation of undesired content.
  • The system must be able to handle real-world traffic.
  • Certain categories of undesired content are rarely encountered in real-world situations.
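The paper tackles the first point with a detailed content taxonomy. As a rough illustration, the top-level categories and sub-categories might be organised as follows; the labels here are assumptions based on the category names the Moderation endpoint exposes, not the paper's exact taxonomy.

```python
# Hedged sketch of a content taxonomy; sub-category names are assumptions
# based on the Moderation endpoint's documented output categories.
TAXONOMY = {
    "sexual": ["sexual", "sexual/minors"],
    "hate": ["hate", "hate/threatening"],
    "violence": ["violence", "violence/graphic"],
    "self-harm": ["self-harm"],
}

def parent_category(label: str) -> str:
    """Map a fine-grained label back to its top-level category."""
    for parent, children in TAXONOMY.items():
        if label in children:
            return parent
    raise KeyError(f"unknown label: {label}")

print(parent_category("hate/threatening"))  # hate
```

A fixed, explicit mapping like this is what lets annotators and the classifier agree on what each label means, which is exactly the consistency problem the paper highlights.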

Image: A Holistic Approach to Undesired Content Detection in the Real World

What makes a successful content moderation system?

Based on its experimentation, OpenAI lists certain attributes needed to build a successful moderation system for the real world.

  • Imprecise labeling instructions can make annotators rely on their subjective judgment, producing inconsistently labeled data. "Regular calibration sessions are necessary to refine these instructions and ensure annotators are aligned with them," adds OpenAI.
  • Active learning is important: it can capture a larger number of undesired samples in the case of rare events.
  • Publicly available data might not lead to high-quality performance for a problem, but can be used to construct a "noisy cold start dataset at the early stage".
  • Deep learning models can overfit to common phrases. OpenAI addresses this by identifying overfitted phrases and by red-teaming through human trials, then altering the training distribution by incorporating model-generated or human-curated synthetic data.
  • Even with precautions, mislabeling can happen. OpenAI tries to catch these cases through cross-validation and by looking for common phrases that cause the model to overfit.
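The active-learning attribute above can be sketched as plain uncertainty sampling: given classifier scores for a pool of unlabeled texts, send the ones the model is least sure about to human annotators, since labels there add the most information. This is a generic sketch of the technique, not OpenAI's actual pipeline; the function name and toy scores are invented.

```python
def select_for_labeling(scores: dict[str, float], k: int = 2) -> list[str]:
    """Pick the k texts whose toxicity scores sit closest to the 0.5
    decision boundary -- the model is least certain about these, so
    human labels there are most informative (uncertainty sampling)."""
    return sorted(scores, key=lambda t: abs(scores[t] - 0.5))[:k]

# Toy scores from a hypothetical toxicity classifier.
scores = {
    "clearly benign": 0.02,
    "clearly toxic": 0.98,
    "ambiguous sarcasm": 0.48,
    "borderline insult": 0.55,
}
print(select_for_labeling(scores))  # ['ambiguous sarcasm', 'borderline insult']
```

For rare categories, a production system would typically combine this with targeted sampling of high-score candidates so that scarce positives are not drowned out by the benign majority.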

Not perfect

The system is, of course, not flawless. OpenAI also discusses the model's current limitations and the improvements it has planned.

  • Bias and fairness: The model shows bias towards certain demographic attributes.
  • Data augmentation: OpenAI plans to apply more data augmentation methods to expand the training dataset.
  • Support for non-English text: OpenAI plans to optimise performance on non-English text in the future. At the moment, only 5% of the samples in its training set are non-English.
  • Red-teaming at scale: At the moment, OpenAI red-teams each new model version internally. This is not a scalable solution, and it wants to change this aspect in the future.
  • More active learning experiments: The firm wants to run more "rigorous experiments comparing the performance of different active learning strategies".

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good.