The Internet Has Become an AI Dumping Ground, With No Solution in Sight

Realising the potential of generative AI models, people have started filling websites with AI-generated junk to get the attention of advertisers

Last month, Amazon’s Kindle Unlimited young adult romance bestseller list was flooded with dozens of nonsensical AI-generated books. Now, those producing such content have found another way to monetise it.

Having realised the potential of generative AI models like GPT, people have gone a step further and started filling websites with AI-generated junk designed to attract paying advertisers, according to a report from the media research organisation NewsGuard. The companies behind the models producing this content have been vocal about the measures they are taking to deal with the issue, but none has yet put a concrete plan into action.

According to the report, more than 140 major brands are currently paying for advertisements that end up on unreliable AI-written sites, likely without their knowledge. The report adds that these websites are presented in a way that leads readers to assume their content is written by humans: they use a generic layout and content typical of news websites, and they do not clearly disclose that their content is AI-produced.

It is high time authorities stepped in to monitor not just false content but also machine-generated content.

Google Search in the picture 

According to NewsGuard’s report, a staggering 90% of advertisements from well-known brands appearing on AI-generated news websites were served by Google, despite the company’s own policies prohibiting the placement of ads on pages containing “spammy automatically generated content”. This trend not only threatens to turn the internet into a spam-filled space dominated by AI-generated material but also raises questions about the massive amount of money spent on advertising.

Earlier this year, Google issued a statement affirming its commitment to safeguarding search results against spam, emphasising that using AI-generated content to manipulate search rankings violates its spam policies.

At its latest I/O conference, the Sundar Pichai-led firm announced significant steps to identify and contextualise AI-generated content in Search. While measures such as watermarking and metadata aim to ensure transparency and help users distinguish AI-generated images from authentic ones, they apply only to images, as there is no obvious way to watermark AI-generated text.

Mass Produced

The spread of false information has long been a major cause of concern, but its monetisation has now clearly skyrocketed. A few months ago, several media outlets fell for a hoax image of an explosion near the Pentagon, which also rattled the US stock market.

Since generative AI models became popular, many fake images have surfaced, such as former US President Donald Trump apparently being arrested, or Tesla CEO Elon Musk holding hands with GM CEO Mary Barra. And who can forget Pope Francis strolling around in a stylish white puffer jacket with a coffee in one hand? These examples highlight how difficult it will be to separate AI-generated content from fact.

Incoming model collapse 

Unlike Google, NewsGuard has figured out a clever way to identify unreliable AI-written content on the internet. Since many of these sites run with little or no human oversight, they often contain error messages typical of AI chatbots. For instance, some display messages such as “Sorry, I cannot fulfil this prompt as it goes against ethical and moral principles… As an AI language model, it is my responsibility to provide factual and trustworthy information.” NewsGuard’s AI tool scans sites for such messages, and a human analyst then reviews the flagged content.
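To make the idea concrete, here is a minimal sketch of that kind of first-pass filter, assuming the page text has already been scraped into a dictionary keyed by URL. The phrase list, the `flag_for_review` function and the example URLs are hypothetical illustrations, not NewsGuard’s actual tooling; only the quoted chatbot phrases come from the example above.

```python
import re

# Hypothetical tell-tale phrases, modelled on the chatbot error message quoted
# above. A real system would use a much larger, curated list.
TELLTALE_PATTERNS = [
    r"as an ai language model",
    r"i cannot fulfil+ this prompt",  # matches "fulfil" and "fulfill"
    r"sorry, i cannot",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in TELLTALE_PATTERNS]


def flag_for_review(pages: dict[str, str]) -> list[str]:
    """Return the URLs whose text contains at least one tell-tale phrase.

    Flagged pages are only candidates: as in the workflow described above,
    a human analyst still has to confirm the site is AI-generated.
    """
    flagged = []
    for url, text in pages.items():
        if any(pattern.search(text) for pattern in _COMPILED):
            flagged.append(url)
    return flagged


pages = {
    "https://example.com/a": "Sorry, I cannot fulfil this prompt as it goes "
                             "against ethical and moral principles.",
    "https://example.com/b": "Local council approves new budget.",
}
print(flag_for_review(pages))  # ['https://example.com/a']
```

Plain phrase matching is cheap and explains why leftover error messages are such a useful signal, but it misses any AI-written page whose operator bothered to delete them, which is why the human review step matters.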

The growing volume of spammy AI-generated content online could also become a problem for the companies behind these models. The large language models underlying chatbots such as ChatGPT and Bing are trained on publicly available data. As those datasets fill up with AI-produced content, researchers are concerned that models trained on them will become less useful, a phenomenon known as “model collapse”.
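The mechanism can be illustrated with a deliberately tiny analogy: treat each “model” as a one-dimensional Gaussian that is fitted only to samples drawn from the previous generation’s model, never to the original data. This is a toy sketch under that assumption, not the researchers’ actual experiment, and `simulate_collapse` with its parameters is hypothetical. Because each finite sample tends to under-represent the tails, the fitted spread tends to drift towards zero over many generations.

```python
import random
import statistics


def simulate_collapse(generations: int = 300, n_samples: int = 20,
                      seed: int = 0) -> list[float]:
    """Toy model collapse with a 1-D Gaussian.

    Generation 0 stands in for human-written data (mean 0, std 1). Every
    later generation is fitted by maximum likelihood to samples produced
    by the previous generation. Returns the fitted std per generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        # "Train" the next model only on output from the current model.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data, mu)  # maximum-likelihood std
        history.append(sigma)
    return history


history = simulate_collapse()
print(f"std after 0, 100, 200, 300 generations: "
      f"{history[0]:.3g}, {history[100]:.3g}, {history[200]:.3g}, {history[300]:.3g}")
```

Real language models are vastly more complex, but the intuition carries over: rare information that a model fails to reproduce is missing from the next model’s training data, so diversity erodes with each recursive round.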

Ilia Shumailov, a research fellow at Oxford University’s Applied and Theoretical Machine Learning Group and a co-author of “The Curse of Recursion: Training on Generated Data Makes Models Forget”, a paper on this phenomenon, believes the collapse is “inevitable” and might not be such a bad thing after all. “Maybe we’ll get rid of captchas, and it will become normal to be a computer on the internet,” he told the Wall Street Journal, referring to the picture puzzles that websites use to distinguish computers from humans.

Tasmia Ansari
Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.
