
Did OpenAI Purposely Discontinue its AI Classifier?

Research shows that the current state-of-the-art detectors are not reliable in practical scenarios



On any given day, OpenAI's AI classifier had the indiscretion of claiming that anything and everything was written by AI. But now, OpenAI seems to have put a lid on this.

Last week, it discontinued its AI classifier, a tool designed to estimate the likelihood that a given text passage was written by an AI language model rather than a human. The tool was launched in January this year and quietly discontinued on July 20. The timing of the shutdown is curious, however, because OpenAI, along with other companies, had just made a voluntary commitment to develop AI ethically and transparently under the guidance of the White House.

One aspect of this commitment is the development of robust watermarking and detection methods to address the AI-generated junk that is filling up the web at an alarming pace. However, despite the companies' promises, there have been no reliable watermarking or detection methods to date. For example, Google announced that it is experimenting with embedding metadata in images generated by its AI models to watermark them, but it has not put out anything comparable for text.

Why Did OpenAI Discontinue its AI Classifier?

The decision to retire the tool was influenced by widespread criticism of its “low rate of accuracy.” While many users relied on the classifier to catch instances of low-effort cheating, it failed to deliver satisfactory results.

However, many have pointed out the irony of being dedicated to identifying AI-generated content while simultaneously striving to create AI content that closely resembles human behaviour. And it seems OpenAI has shed the veil and is now completely focused on the latter.

Some argue that detecting AI-generated content would not be effective and, to be frank, should not even be pursued, given its seemingly futile nature. This is particularly true in the context of AI-generated images, where watermarks can be effortlessly removed, posing a challenge for detection methods. Others have humorously suggested that the ultimate goal of AI detection could be achieving world domination, akin to passing the Turing test flawlessly.

The idea that AI-generated text might have identifying features or patterns that could be reliably detected appeared intuitive when OpenAI released its classifier. However, in practice, this has proven challenging due to the rapid development of large language models. The differences between various language models have made it difficult to rely on specific identifying features.
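That original intuition is easy to prototype: early zero-shot detectors leaned on statistical signals such as perplexity under a reference model. Below is a minimal sketch of that idea, assuming GPT-2 as the reference model and an arbitrary threshold; it illustrates the general approach, not OpenAI's actual classifier.

```python
# Zero-shot detection sketch: the hypothesis is that LLM output is more
# predictable (lower perplexity) under a reference model than human prose.
# The model choice (gpt2) and the threshold are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    # With labels=input_ids the model returns the mean cross-entropy over
    # the tokens, so exp(loss) is the perplexity of the passage.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

def looks_ai_generated(text: str, threshold: float = 25.0) -> bool:
    # Only meaningful on reasonably long passages; short snippets are noisy.
    return perplexity(text) < threshold
```

The trouble described above follows directly: each new model family shifts these statistics, so a threshold tuned against one generation of LLMs quietly stops working against the next.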

Recent advancements in natural language processing have made large language models capable of generating human-like text for a wide variety of tasks. However, this progress also presents challenges, as these LLMs can be misused for plagiarism, spamming, and social engineering to manipulate public opinion. To address this, there is a demand for efficient detectors of LLM-generated text to mitigate the misuse of publicly available LLMs.

Can’t be Detected

Various AI text detectors using watermarking, zero-shot methods, retrieval-based methods, and trained neural network-based classifiers have been proposed. However, a research paper shows, both theoretically and empirically, that current state-of-the-art detectors are not reliable in practical scenarios. Paraphrasing the LLM outputs effectively evades these detectors, allowing attackers to generate and spread misinformation undetected. Even the best detectors can only marginally outperform a random classifier against sufficiently advanced language models.
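The paper makes the "marginally outperform a random classifier" claim precise: the performance of any detector $D$ is bounded by the total variation distance between the AI text distribution $\mathcal{M}$ and the human text distribution $\mathcal{H}$,

$$\mathrm{AUROC}(D) \;\le\; \frac{1}{2} + \mathrm{TV}(\mathcal{M}, \mathcal{H}) - \frac{\mathrm{TV}(\mathcal{M}, \mathcal{H})^2}{2}.$$

As language models get better at mimicking human text, $\mathrm{TV}(\mathcal{M}, \mathcal{H})$ approaches zero and the bound collapses to 1/2, the AUROC of a coin flip.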

The paper also demonstrates that watermarking and retrieval-based detectors can be spoofed to identify human-composed text as AI-generated, potentially harming the reputation of LLM detector developers. With the release of more advanced LLMs like GPT-4, the need for more secure methods to prevent misuse becomes crucial.
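To see how spoofing works, it helps to look at how a typical watermark detector decides. The sketch below loosely follows the "green list" scheme of Kirchenbauer et al., as a simplified assumption about the detector's design: it tests whether suspiciously many tokens fall into a pseudorandom "green" subset of the vocabulary.

```python
# "Green list" watermark detection sketch (simplified Kirchenbauer-style).
# The generator softly prefers green tokens; the detector runs a z-test on
# the green fraction. The hash-based partition here is an illustrative
# stand-in for the published keyed scheme.
import hashlib

GAMMA = 0.5  # assumed fraction of the vocabulary that is "green"

def is_green(prev_token: int, token: int) -> bool:
    # Pseudorandom vocabulary partition seeded by the previous token.
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return digest[0] < 256 * GAMMA

def watermark_z_score(tokens: list[int]) -> float:
    # Under the null (no watermark) each token is green with prob. GAMMA;
    # a large positive z-score is read as "watermarked", i.e. AI-generated.
    n = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - GAMMA * n) / (GAMMA * (1 - GAMMA) * n) ** 0.5
```

The spoofing attack inverts this: an adversary who infers enough of the green lists can deliberately compose human-written text out of green tokens, pushing the z-score high enough that the detector brands an innocent author's prose as AI-generated.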

In one test, OpenAI's classifier correctly identified only one out of seven AI-generated text snippets, and those were produced with a language model that was not even cutting-edge at the time of the test.

Despite the limitations and disclaimers provided by OpenAI with the classifier tool, some users took its claims of detection at face value. This led to the misuse of the tool, as people would test suspected AI-generated content without realising its unreliability.

There are also potential vulnerabilities that attackers might exploit in the future, such as improved paraphrasing models or smart prompting attacks. Current detectors should reliably flag AI-generated text while avoiding excessive false positives, both to prevent wrongful accusations of plagiarism and to protect the reputation of LLM developers.
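The paraphrasing attack itself needs very little machinery. Here is a minimal sketch, assuming a hypothetical `detector_score` API and an off-the-shelf paraphrase model; both functions are placeholders, not real endpoints.

```python
# Paraphrasing evasion sketch: reword the LLM output and keep the variant
# the detector is least suspicious of. Both functions below are hypothetical
# stand-ins for a real detector API and a real paraphrase model.
def detector_score(text: str) -> float:
    """Hypothetical: probability in [0, 1] that `text` is AI-generated."""
    raise NotImplementedError

def paraphrase(text: str, n_variants: int) -> list[str]:
    """Hypothetical: n_variants rewordings that preserve the meaning."""
    raise NotImplementedError

def evade(ai_text: str, n_variants: int = 8) -> str:
    # The attacker needs no access to detector internals, only its scores;
    # the paper's recursive paraphrasing variant works even without those.
    candidates = [ai_text] + paraphrase(ai_text, n_variants)
    return min(candidates, key=detector_score)
```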

A recent follow-up work by Souradip Chakraborty argued that AI-text detection is almost always possible, even with low total variation between human and AI-generated distributions, but this may not hold in real-world applications due to correlations in human-written text. Other works suggest that existing LLM outputs are very different from human-written text, but the authors maintain that as language models advance, adversaries’ ability to evade detection will likely improve.

In addition to reliability issues, the paper mentions the potential bias of detectors against non-native English writers. Having low average type I and II errors may not be sufficient for practical deployment if the detector performs poorly within specific sub-populations.
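That sub-population concern is straightforward to audit, as in the sketch below, which measures the type I error (human text wrongly flagged) separately for each writer group; the field names are illustrative assumptions.

```python
# Per-group false positive audit: a detector can look fine on average while
# flagging one sub-population (e.g. non-native English writers) far more
# often. Field names ("group", "human_written", "flagged") are assumptions.
from collections import defaultdict

def false_positive_rate_by_group(samples: list[dict]) -> dict[str, float]:
    flagged, total = defaultdict(int), defaultdict(int)
    for s in samples:
        if s["human_written"]:  # type I errors only concern human-written text
            total[s["group"]] += 1
            flagged[s["group"]] += s["flagged"]
    return {g: flagged[g] / total[g] for g in total}
```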

While it remains a challenging task, progress in this area is essential for ensuring the responsible and trustworthy use of AI-generated text. And whoever builds the first truly reliable watermarking or detection tool stands to gain, as such a tool would be invaluable in a variety of contexts.


Shyam Nandan Upadhyay

Shyam is a tech journalist with expertise in policy and politics, and exhibits a fervent interest in scrutinising the convergence of AI and analytics in society. In his leisure time, he indulges in anime binges and mountain hikes.