Watermarking: A Band-Aid Solution for LLMs

Researchers from the University of Maryland have suggested 'watermarking' as a fix for AI plagiarism.

LLMs are booming, but distinguishing a human interlocutor from a machine remains an unconquered quest. Technology that mimics humans has been at the centre of public debate since the Turing Test, which established "indistinguishability" as the benchmark for positive AI performance. Today, however, researchers from the University of Maryland suggest 'watermarking' as a way to make that distinction tractable.

Though still in its infancy, watermarking could give developers a way to make data created by generative models detectable. AI-generated text remains such a novel topic that there are many unresolved issues, both legal and ethical, along with aspects that demand a high degree of responsibility and knowledge from the author.

The community is desperately trying to find ways to differentiate between human- and AI-written text against the tide of potential technological exploitation, said Irene Solaiman, policy director at Hugging Face, who previously worked as an AI researcher at OpenAI and studied AI output detection for the release of GPT-2. 

OpenAI’s Watermarking Efforts

On the subject of how watermarking works, Scott Aaronson, professor of computer science at the University of Texas, wrote that his main project has been a tool for statistically watermarking the outputs of a text model like GPT. Whenever GPT generates a long text, the tool embeds an unnoticeable secret signal in its choice of words, which can later be used to prove that the content originated from GPT.
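The core idea behind such schemes can be sketched in a few lines: a secret key and the preceding token pseudo-randomly split the vocabulary into a "green" subset; the generator favours green tokens, and a detector tests whether green tokens appear more often than chance. The toy vocabulary, key, and 50/50 split below are illustrative assumptions, not details of OpenAI's or Maryland's actual implementations:

```python
import hashlib
import math
import random

# Toy vocabulary and shared secret key; both are illustrative assumptions.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "rug", "fast"]
SECRET_KEY = "demo-key"

def green_list(prev_token, fraction=0.5):
    """Derive a pseudo-random 'green' subset of the vocabulary from the
    previous token plus the secret key. A watermarking generator would
    bias its sampling toward these tokens."""
    seed = hashlib.sha256((SECRET_KEY + prev_token).encode()).hexdigest()
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))

def detect(tokens, fraction=0.5):
    """Return a z-score measuring how much more often than chance the
    text lands on green tokens. Large positive values suggest the text
    carries the watermark; unwatermarked text stays near zero."""
    n = len(tokens) - 1
    hits = sum(tok in green_list(prev, fraction)
               for prev, tok in zip(tokens, tokens[1:]))
    expected = n * fraction
    variance = n * fraction * (1 - fraction)
    return (hits - expected) / math.sqrt(variance)
```

A generator that always picks green tokens scores roughly the square root of the text length, while ordinary text hovers near zero. Aaronson's scheme is more sophisticated, biasing rather than restricting sampling so text quality is preserved, but the statistical detection principle is the same.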

OpenAI has been working to safeguard its billion-dollar IP, ChatGPT, but hasn't revealed much about its security features. In June 2022, OpenAI hired Aaronson, an influential computer scientist, to work on AI safety and alignment. AI safety entails a careful study of how AI might pose harm to humans and creates ways to prevent negative disruption. AI alignment is concerned with ensuring that AI systems pursue their intended goals.

There’s more 

The computer scientists Cabanac, Labbé, and Magazinov, who examined weird English phrases in research texts, conceptualised them as "tortured phrases": "unexpected weird phrases in lieu of established ones, such as counterfeit consciousness instead of artificial intelligence". To further facilitate their search, they developed the Problematic Paper Screener, a software package that tracks down academic papers containing tortured phrases.

As the pressure to publish mounts, such translation plagiarism is likely to be used even more widely. By January 2022, the scientists had found nearly 3,200 papers containing such phrases, including works in reputable, peer-reviewed journals.
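The screener's core idea can be illustrated with a simple lookup that flags known fingerprint substitutions. The phrase list below is a small illustrative sample ("counterfeit consciousness" is the example quoted above); the real Problematic Paper Screener relies on a much larger curated list and more sophisticated matching:

```python
# Illustrative fingerprint list: tortured phrase -> established term.
# Only "counterfeit consciousness" comes from the quote above; the
# other entries are hypothetical examples of the same pattern.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "profound learning": "deep learning",
    "colossal information": "big data",
}

def screen(text):
    """Return (tortured phrase, likely original term) pairs found in text."""
    lowered = text.lower()
    return [(bad, good) for bad, good in TORTURED_PHRASES.items()
            if bad in lowered]
```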

Other notable efforts to identify AI-generated content have come from the academic community: Edward Tian, a Princeton student, working alongside Sreejan Kumar, developed GPTZero, software that highlights when a piece of text was produced by generative AI engines, including ChatGPT.

Tian tested GPTZero on various posts published by companies on LinkedIn and Twitter to check its effectiveness. The app does not yet have enough data to measure accuracy, as it is still a work in progress, but it is a good start.

No silver bullet solution

OpenAI, the San Francisco-based AI titan, is not the only player: several other companies have already put forth capable LLMs, and the rest are catching up fast. It is therefore plausible that models without watermarks will be released in the near future, and people who want unwatermarked output will simply use those models. Even if every provider implemented a watermark, combining the outputs of different models would make it trivial to circumvent. And slight rewrites by a human would break most watermarks, the same way they break the current GPT detectors.

“Just because mitigation is possible does not mean it should be implemented. Those in a place to implement—be it those at technology companies, in government or researchers—should assess the desirability,” said Dr Josh A. Goldstein, research fellow with the CyberAI Project at CSET.

One common way researchers try to detect AI-generated text is to use software to analyse different text features: for example, fluency, the frequency at which certain words appear, punctuation usage, and sentence-length patterns.
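A feature-based detector of this kind starts by computing a few stylometric signals. The sketch below shows what such features might look like; the specific features and any classification thresholds are illustrative, not those of GPTZero or any particular detector:

```python
import re
import statistics

def text_features(text):
    """Compute simple stylometric features of the kind detectors examine:
    average sentence length, sentence-length variation (sometimes called
    'burstiness'), and punctuation density. A real detector would feed
    such features, or learned representations, into a trained classifier."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return {
        "avg_sentence_len": statistics.mean(lengths),
        "sentence_len_stdev": statistics.pstdev(lengths),
        "punct_per_char": sum(c in ",;:-()" for c in text) / len(text),
    }
```

The intuition, on this view, is that human writing tends to vary more from sentence to sentence, so low sentence-length variation can be one weak signal of machine generation; no single feature is decisive.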

Certain questions that need to be asked include: Is mitigation technically and socially feasible? What is the downside risk? “We need more research, analysis and testing to better address which mitigations are desirable and to highlight mitigations we overlooked,” Goldstein said. “We don’t have a silver bullet solution.”


Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.