
60% of GPT-3.5 Outputs Are Plagiarised: Report

Copyleaks used a proprietary scoring method considering identical text, minor alterations, paraphrasing, and more to assign a "similarity score."


A report from plagiarism detector Copyleaks has revealed that 60% of OpenAI’s GPT-3.5 outputs contain some form of plagiarism. The company used a proprietary scoring method considering identical text, minor alterations, paraphrasing, and more to assign a “similarity score.”

Copyleaks specializes in AI-based text analysis and offers plagiarism detection tools to businesses and schools. The company was in the game well before ChatGPT. Although GPT-3.5 was the model behind ChatGPT's debut, OpenAI has since upgraded to the more advanced GPT-4.

According to its latest findings, 45.7% of GPT-3.5's outputs contained identical text, 27.4% contained minor alterations, and 46.5% contained paraphrased text. A score of 0% implies complete originality, while 100% suggests no original content, as per the report.
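Copyleaks' actual scoring method is proprietary, so the sketch below is purely a hypothetical illustration of how a "similarity score" might combine the match types the report describes (identical text, minor alterations, paraphrasing) into a single 0-100 figure. The function name, weights, and example numbers are assumptions for illustration only and do not come from the report.

```python
# Hypothetical illustration only -- Copyleaks' real scoring method is proprietary.
# Combines the three match types named in the report into one 0-100 score.

def similarity_score(identical_words: int, minor_alteration_words: int,
                     paraphrased_words: int, total_words: int,
                     weights=(1.0, 0.8, 0.5)) -> float:
    """Return a 0-100 score: 0 means fully original, 100 means no original content.

    The weights are illustrative assumptions, not values from the report.
    """
    if total_words == 0:
        return 0.0
    w_identical, w_minor, w_paraphrase = weights
    weighted = (identical_words * w_identical
                + minor_alteration_words * w_minor
                + paraphrased_words * w_paraphrase)
    return min(100.0, 100.0 * weighted / total_words)

# Example: a 400-word output with 120 identical, 60 minorly altered,
# and 80 paraphrased words.
print(round(similarity_score(120, 60, 80, 400), 1))  # 52.0
```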

Copyleaks subjected GPT-3.5 to various tests, generating around a thousand outputs, each approximately 400 words, across 26 subjects. The results with the highest similarity score belonged to computer science (100%), followed by physics (92%) and psychology (88%). On the flip side, theatre (0.9%), humanities (2.8%), and English language (5.4%) registered the lowest similarity scores.

“Our models were designed and trained to learn concepts in order to help them solve new problems,” OpenAI spokesperson Lindsey Held told Axios. “We have measures in place to limit inadvertent memorization, and our terms of use prohibit the intentional use of our models to regurgitate content.”

Plagiarism goes beyond cutting and pasting entire sentences and paragraphs. The New York Times filed a lawsuit against OpenAI, stating that OpenAI’s AI systems’ “wide-scale copying” constitutes copyright infringement. OpenAI responded to the lawsuit, arguing that “regurgitation” is a “rare bug” and also accusing The New York Times of “manipulating prompts.”

Content creators more broadly, from authors to visual artists, have argued in court that generative AI is trained on their copyrighted work and therefore ends up spitting out exact copies. So far, though, the legal outcomes have largely favoured the companies rather than the creators. The NYT case offers a glimmer of hope, but the matter remains pending.


Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.