How Generative AI is Gobbling Up the Internet

If 2023 was the year of fearing generative AI, 2024 will be the year for some of those worries to come true. 

Last summer, Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, and Ross Anderson wrote ‘The Curse of Recursion: Training on Generated Data Makes Models Forget’, a paper warning that AI models could poison themselves in the near future. The warning was seen as farsighted but purely theoretical; evidence of the problem has since emerged. 

The problem is called “model collapse”: AI chatbots lose the information they initially learned and replace it with synthetic data produced by other AI models. This degenerative process is no longer just a theory. Last month, a Twitter user, Winterbourne, posted a screenshot showing that Grok, the large language model chatbot developed by Elon Musk’s AI company xAI, had plagiarised a response from OpenAI. 
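The degenerative loop is easy to demonstrate in miniature. The following is a minimal, hypothetical sketch in Python (an illustration, not code from the paper): a toy “model” fits a Gaussian to its training data, samples a fully synthetic dataset from its own output, and refits, generation after generation. With a finite sample at each step, the estimated spread of the data tends to shrink, a simple analogue of the forgetting Shumailov and colleagues describe for language models.

    import numpy as np

    # Toy illustration of model collapse: fit a Gaussian to data,
    # sample synthetic data from the fit, then refit on those samples.
    # Every generation after the first trains only on machine-made data.
    rng = np.random.default_rng(0)

    # Generation 0: "real" data drawn from a standard normal distribution.
    data = rng.normal(loc=0.0, scale=1.0, size=100)

    for generation in range(51):
        # "Train" the model: estimate the mean and standard deviation.
        mu, sigma = data.mean(), data.std()
        if generation % 10 == 0:
            print(f"gen {generation:2d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
        # "Deploy" the model: the next training set is purely synthetic,
        # sampled entirely from the previous generation's fitted model.
        data = rng.normal(loc=mu, scale=sigma, size=100)

Rare values in the tails are the first to disappear, and over enough generations the fitted variance drifts toward zero. The paper’s formal result for Gaussian models is this same collapse, and the authors argue that large language models trained on one another’s output suffer an analogous fate.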

When asked by Winterbourne to tinker with malware, Grok responded that it could not, “as it goes against OpenAI’s use case policy”.

“Grok is literally just ripping OpenAI’s code base,” the user wrote in the post. The claim was denied by a member of xAI’s technical staff who has previously worked for rivals OpenAI and Google DeepMind. 

“This was a huge surprise to us when we first noticed it,” he responded. The staff member might not have seen this coming, but the company’s chief, Musk, definitely did. 

The screenshot amassed plenty of reactions, and ChatGPT’s account shared it, writing, “We have a lot in common.” Musk took note and responded, “Well, son, since you scraped all the data from this platform for your training, you ought to know.” 

The technology has not only stoked competition among tech companies but also rehashed old rivalries, like the one between OpenAI and Musk, who was once a cheerleader of the GPT maker. 

Leaving the tech bros’ personal quarrels aside, AI-related error messages have also crept into online shopping. Users of the e-commerce platform Amazon have pointed out that OpenAI error messages appear in product listings, be it for lawn chairs or Chinese religious tracts. 

The original listings, titled “I’m sorry, but I cannot fulfil this request. It goes against OpenAI use policy”, were archived after media publications discovered them. Still, many such artificial posts can be found on Threads and LinkedIn. 

Delusions, Delusions, Delusions 

Many argued that the research by Shumailov and his team overlooked an essential point. One of them was Daniel Sack, managing director and partner at BCG X, Boston Consulting Group’s tech build and design unit. 

“Most of the data that will be used to train future models will not be mere reproductions of the source material but entirely novel and unprecedented,” he wrote on LinkedIn. 

His pushback is understandable: people in tech usually find it hard to call out the flaws of the products they are building, or helping someone else build. Time and again, Silicon Valley has hesitated to acknowledge the menace of its own technologies. 

The curious case of generative AI models is even harder to call out, since a lot of money is riding on the game. 

Even Sack’s firm, BCG X, has collaborated with OpenAI, a reminder that the technology’s boosters cannot simply be taken at their word, at least for now, while layers of ethical issues remain unsolved. All of the above suggests that boasting about the technology’s capacity to solve humanity’s grave problems should not be the primary response.

No Way Back

Generative AI programs rely on unfathomable amounts of data scraped from every nook and cranny of the internet, and the web has already become awash with AI-generated spam. No matter how much the VCs and developers behind these models deny it, the problem exists, and it will only get worse as hundreds of millions of people use these tools every day. 

“It really shows that these models are not going to be reliable in the long run if they learn from post-LLM age data—without being able to tell what data has been machine-generated, the quality of the outputs will continue to decline,” Catherine Flick, a professor of ethics and games technology at Staffordshire University, told Fast Company while speaking about the Grok incident.

Most importantly, humans have no reliable way to differentiate between AI-generated and human-generated content. Likewise, these language models have no way of telling whether the AI-generated text they ingest corresponds to reality, which could introduce even more misinformation than current models already do. 

For now, all one can do is sit back and watch the internet burn.  

Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.