AI research firm OpenAI has revealed an “improved” content moderation tool, the Moderation endpoint, that aims to help developers protect their applications against possible misuse. The tool gives OpenAI API developers free access to GPT-based classifiers that can detect harmful content, OpenAI states in a blog post.
In the same post, OpenAI explains that the Moderation endpoint assesses text inputs for content that is sexual, hateful, violent, or promotes self-harm. “The endpoint has been trained to be quick, accurate, and perform robustly across a range of applications,” it adds.
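In practice, a developer sends text to the endpoint and receives per-category verdicts back. The sketch below, using only the Python standard library, shows roughly what such a call and response handling could look like; the exact response fields are assumptions based on OpenAI's published documentation, not taken from this article.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def moderate(text: str, api_key: str) -> dict:
    """Send text to the Moderation endpoint and return the parsed JSON reply."""
    req = urllib.request.Request(
        MODERATION_URL,
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flagged_categories(result: dict) -> list:
    """Given one moderation result, list the category names that were flagged.

    Assumes a result shaped like {"flagged": bool, "categories": {name: bool}}.
    """
    return [name for name, hit in result["categories"].items() if hit]
```

An application could then refuse or log any input where `result["flagged"]` is true, using `flagged_categories` to decide which policy applies.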
LLMs and risks
In a paper titled A Holistic Approach to Undesired Content Detection in the Real World, OpenAI details the tool. All the major tech firms are heavily involved in large language models (LLMs) and have been releasing them frequently of late. Though LLMs come with their own set of benefits, research is ongoing to identify and address the risks that can accompany them in the real world.
OpenAI says that existing work on content detection focuses mainly on either a limited set of categories or a targeted use case.
Some notable examples include:
- Toxicity: Toxicity Detection: Does Context Really Matter? and RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
- Hate speech: Locate the Hate: Detecting Tweets against Blacks
- Abusive content: Challenges and frontiers in abusive content detection
Detecting undesired content is difficult for a variety of reasons, OpenAI notes.
- There is a lack of clearly defined categorisation of undesired content.
- The system must be able to process real-world traffic.
- Certain categories of undesired content are rare in real-world situations.
What makes a successful content moderation system?
Based on its experimentation, OpenAI lists certain attributes needed to build a successful moderation system in the real world.
- Imprecise labeling instructions force annotators to rely on their subjective judgment, which produces inconsistently labeled data. “Regular calibration sessions are necessary to refine these instructions and ensure annotators are aligned with them,” adds OpenAI.
- Active learning is important: it can capture a larger number of undesired samples for rare categories.
- Publicly available data might not yield high-quality performance on a given problem, but it can be used to construct a “noisy cold start dataset at the early stage”.
- Deep learning models can overfit common phrases. OpenAI addresses this by identifying overfitted phrases and by red-teaming the model through human trials, then altering the training distribution with model-generated or human-curated synthetic data.
- Even with precautions, mislabeling can happen. OpenAI addresses this by identifying such cases through cross-validation and by looking for common phrases that cause the model to overfit.
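The active learning point above is about spending scarce annotation budget on the most informative examples. The article does not specify OpenAI's strategy, so the sketch below shows one generic approach, uncertainty sampling: given texts already scored by some classifier, label the ones whose predicted probability sits closest to the decision boundary. The thresholds and budget are illustrative assumptions.

```python
def select_for_labeling(scored, low=0.3, high=0.7, budget=2):
    """Pick up to `budget` texts whose predicted probability of being
    undesired content is most uncertain (closest to 0.5).

    `scored` is a list of (text, probability) pairs from some existing
    classifier; `low`/`high` bound the "uncertain" band. All parameters
    here are illustrative, not from OpenAI's pipeline.
    """
    # Keep only texts inside the uncertainty band, keyed by distance to 0.5.
    uncertain = [(abs(p - 0.5), text) for text, p in scored if low <= p <= high]
    # Most uncertain (smallest distance) first.
    uncertain.sort(key=lambda pair: pair[0])
    return [text for _, text in uncertain[:budget]]
```

Confidently classified texts (very low or very high scores) are skipped, so human labels go where the model is least sure and where rare undesired categories are most likely to be surfaced.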
The system is not flawless, however. OpenAI also discusses the model's current limitations and the improvements it plans to make.
- Bias and fairness: The model exhibits bias towards certain demographic attributes.
- Data augmentation: OpenAI plans to apply more data augmentation methods to enlarge the training dataset.
- Support for non-English text: OpenAI plans to optimise performance on non-English text in the future; at the moment, only 5% of the samples in its training set are non-English.
- Red-teaming at scale: At the moment, OpenAI red-teams each new model version internally. This is not scalable, and the firm wants to change that in the future.
- More active learning experiments: The firm wants to run more “rigorous experiments comparing the performance of different active learning strategies”.