Watermarking: A Band-Aid Solution for LLMs

Researchers from the University of Maryland have suggested 'watermarking' to combat AI plagiarism.

LLMs are booming, but telling a human interlocutor apart from a machine remains an unsolved problem. Technology that mimics humans has been at the centre of public debate since the Turing Test, which established “indistinguishability” as the benchmark for AI performance. Now, researchers from the University of Maryland have suggested ‘watermarking’ as a way to make machine-generated text identifiable.

Though still in its infancy, watermarking could let developers make content created by generative models detectable. AI-generated text remains such a novel topic that many issues, both legal and ethical, are unresolved, and working with it demands a high degree of responsibility and knowledge from the author.

The community is desperately trying to find ways to differentiate between human- and AI-written text against the tide of potential technological exploitation, said Irene Solaiman, policy director at Hugging Face, who previously worked as an AI researcher at OpenAI and studied AI output detection for the release of GPT-2. 


OpenAI’s Watermarking Efforts

On the subject of how watermarking works, Scott Aaronson, Professor of Computer Science at the University of Texas, wrote that his main project at OpenAI had been a tool for statistically watermarking the outputs of a text model like GPT. Whenever GPT generates a long text, the tool embeds an unnoticeable secret signal in its choice of words, which can later be used to prove that the content originated from GPT.
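To make the idea concrete, here is a minimal, illustrative Python sketch of a statistical watermark in the spirit of the ‘green-list’ scheme described by the Maryland researchers, not OpenAI’s actual method: a secret key and the previous word pseudo-randomly pick a ‘green’ subset of the vocabulary that the generator is nudged to prefer, and a detector holding the key counts green words and flags text where the count is improbably high. The toy vocabulary, key, and parameters below are assumptions for illustration.

```python
# Minimal sketch of a "green-list" statistical watermark, in the spirit of the
# Maryland proposal. The toy vocabulary, key, and parameters are illustrative
# assumptions, not OpenAI's or Aaronson's actual implementation.
import hashlib
import math
import random

VOCAB = ["the", "a", "model", "text", "signal", "word", "secret", "output"]
GREEN_FRACTION = 0.5          # share of the vocabulary favoured at each step
SECRET_KEY = "watermark-key"  # known only to the model provider


def green_list(prev_token: str) -> set[str]:
    """Pseudo-randomly pick a 'green' subset of the vocabulary,
    seeded by the previous token and the secret key."""
    seed = int(hashlib.sha256((SECRET_KEY + prev_token).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    k = int(len(VOCAB) * GREEN_FRACTION)
    return set(rng.sample(VOCAB, k))


def generate(prompt_token: str, length: int = 50) -> list[str]:
    """Toy 'language model' that prefers green-listed tokens.
    A real model would instead add a bias to green-token logits."""
    tokens, prev = [], prompt_token
    rng = random.Random(0)
    for _ in range(length):
        greens = green_list(prev)
        # With high probability pick from the green list, mimicking a logit boost.
        pool = list(greens) if rng.random() < 0.9 else VOCAB
        tok = rng.choice(pool)
        tokens.append(tok)
        prev = tok
    return tokens


def detect(tokens: list[str], prompt_token: str) -> float:
    """Return a z-score: how far the green-token count exceeds chance."""
    prev, hits = prompt_token, 0
    for tok in tokens:
        if tok in green_list(prev):
            hits += 1
        prev = tok
    n = len(tokens)
    expected = n * GREEN_FRACTION
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std


if __name__ == "__main__":
    text = generate("the")
    print("z-score:", round(detect(text, "the"), 2))  # large z => likely watermarked
```

Because the watermark only nudges word choice, it stays statistically invisible in short snippets but becomes detectable with high confidence over longer passages, which is why such schemes target long generated texts.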

OpenAI has been working on the safeguarding of their billion-dollar IP, ChatGPT but hasn’t revealed much about its security features. In June 2022, to work on AI Safety and Alignment, OpenAI hired Scott Aaronson, an influential computer scientist. AI Safety entails a careful study of how AI might pose harm to humans and creates ways to prevent negative disruption. AI Alignment is concerned with ensuring that the AI is aligned with the intended goals.


There’s more 

Computer scientists led by Cabanac, Labbé, and Magazinov, who examined strange English wording in research texts, coined the term “tortured phrases” for it: “unexpected weird phrases in lieu of established ones, such as counterfeit consciousness instead of artificial intelligence.” To further facilitate their search, they developed the Problematic Paper Screener, a software package that tracks down academic papers containing tortured phrases.
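The underlying screening idea is straightforward: compare text against a growing list of known tortured-phrase fingerprints. The sketch below is a toy illustration of that idea only; the phrase list and function names are assumptions, and the real Problematic Paper Screener operates at a far larger scale.

```python
# Illustrative sketch of tortured-phrase screening: scan text for known
# "fingerprint" phrases. The phrase list here is a small assumed sample.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "profound learning": "deep learning",
    "irregular woodland": "random forest",
}

def screen(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, established term) pairs found in the text."""
    lowered = text.lower()
    return [(bad, good) for bad, good in TORTURED_PHRASES.items() if bad in lowered]

if __name__ == "__main__":
    sample = "We apply counterfeit consciousness and profound learning to the data."
    for bad, good in screen(sample):
        print(f"'{bad}' looks like a tortured form of '{good}'")
```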

As the pressure to publish mounts, such plagiarism, disguised through automated translation or paraphrasing, is likely to be used even more widely. By January 2022, the scientists had found nearly 3,200 papers containing such phrases, including works appearing in reputable, peer-reviewed journals.

Other notable efforts to identify AI-generated content have come from the academic community: Edward Tian, a Princeton student, together with Sreejan Kumar, developed GPTZero, software that flags when a piece of text was produced by generative AI engines such as ChatGPT.

Tian tested GPTZero’s effectiveness on various posts published by companies on LinkedIn and Twitter. The app is still a work in progress and does not yet have enough data to measure its accuracy, but it is a good start.

No silver bullet solution

The San Francisco-based AI titan OpenAI is not alone: several other companies have released capable LLMs, and more will catch up soon. It is therefore plausible that models without watermarks will be released in the near future, and people who want unwatermarked output will simply use those models. Even if every provider implemented a watermark, combining the outputs of different models would make it trivial to circumvent. And slight rewrites by a human would break most watermarks, the same way they break current GPT detectors.

“Just because mitigation is possible does not mean it should be implemented. Those in a place to implement—be it those at technology companies, in government or researchers—should assess the desirability,” said Dr Josh A. Goldstein, research fellow with the CyberAI Project at CSET.

One common way researchers have tried to detect AI-generated text is to use software that analyses features of the text, such as fluency, how frequently certain words appear, punctuation, and sentence-length patterns.
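A hedged sketch of that feature-based approach follows; the specific features, names, and example are illustrative assumptions, not the internals of GPTZero or any particular detector.

```python
# Sketch of feature-based detection: compute a few simple statistics
# (sentence-length variation, top-word frequency) of the kind such
# detectors inspect. Feature choices here are illustrative assumptions.
import re
from collections import Counter
from statistics import mean, pstdev

def text_features(text: str) -> dict[str, float]:
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    counts = Counter(words)
    return {
        "avg_sentence_len": mean(lengths),
        "sentence_len_stdev": pstdev(lengths),  # low variation can hint at AI text
        "top_word_ratio": counts.most_common(1)[0][1] / len(words),
    }

if __name__ == "__main__":
    sample = ("The model writes fluent text. The sentences look similar. "
              "The rhythm rarely changes. The output reads smoothly.")
    print(text_features(sample))
```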

Certain questions that need to be asked include: Is mitigation technically and socially feasible? What is the downside risk? “We need more research, analysis and testing to better address which mitigations are desirable and to highlight mitigations we overlooked,” Goldstein said. “We don’t have a silver bullet solution.”

Tasmia Ansari
Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.
