Data is Gold, Twitter the Goldmine to Train AI Models

Every tweet posted on the platform becomes the property of the social media giant and can be used by others who have access to its API

Meta-owned Instagram has ambitious plans to enter the microblogging space and challenge Elon Musk’s Twitter. According to a Bloomberg report, Instagram is developing a Twitter-like microblogging application set to debut before the end of June. Internally referred to as P92 or Barcelona, the new platform aims to combine the best features of Instagram and Twitter.

Instagram’s development of the app comes at a time when Jack Dorsey is already working on his decentralised social network Bluesky, which is similar to Twitter but open source. Mastodon, another decentralised, Twitter-like social network, is a comparable example.

As per recent reports, Instagram’s forthcoming app will serve as a text-based platform for engaging in conversations. Users will have the ability to communicate directly with their audience and peers.

The app will offer various creative tools for users to craft their messages, including the option to incorporate links, photos, and videos. 

Legal Disputes

Instagram’s foray into creating an alternative to Twitter coincides with an ongoing legal dispute between Twitter and Microsoft. Twitter has threatened legal action against Microsoft, accusing the company of using Twitter data for training purposes without authorisation, which it deems ‘illegal’.

The conflict emerged when Microsoft refused to pay for access to Twitter’s API, which had recently introduced new payment tiers. Previously, developers could use the Twitter API for free, but to boost the company’s revenue, Twitter’s CEO, Musk, announced the end of free access.

The confluence of events surrounding the legal dispute, Twitter’s decision to monetise its API, and Meta’s introduction of a Twitter-like application hints at a larger context. Twitter has traditionally been a unique platform, fostering text-heavy content and enabling individuals to express their opinions freely, leading to a more authentic and human experience than on other platforms, where insincerity is prevalent.

This distinctive nature of Twitter makes its data invaluable to researchers aiming to make the responses of language models such as GPT more human-like. While the specific motives behind Elon Musk’s dispute with Microsoft remain unknown, the possibility of Microsoft using Twitter data to train OpenAI’s GPT models cannot be dismissed.

Microsoft has ventured into training bots on Twitter data before. One notable example is Tay, a Twitter bot introduced in 2016 and positioned by the company as an experiment in “conversational understanding”.

Microsoft stated that Tay would become smarter the more users interacted with it, learning to engage people through casual and playful conversation. Unfortunately, this endeavour turned sour for the software giant.

Users began inundating the bot with misogynistic, racist, and Donald Trump-inspired remarks. As a result, Tay, essentially an internet-connected robot parrot, started echoing these sentiments back to users.

Training Data in Conflict with Users

Twitter’s privacy policy, which most users tend to ignore, clearly states that by publicly posting content, the users are directing the platform to disclose that information as broadly as possible, including through its APIs, and directing those accessing the information through its APIs to do the same.

This essentially means that every tweet posted on the platform becomes the property of the social media giant and can be used by others who have access to its API. 

However, since Twitter has put its API behind a paywall, many companies are coming up with their own Twitter-like platforms, claiming to provide users with a richer experience. Users may or may not have wanted that in the first place.

They use social media platforms to express their own opinions, not to provide a cache of data for AI companies to build LLMs from.

European Union Draft AI Bill

While it is currently permissible for platforms to utilise users’ opinions and posts as data for their models as long as they include it in their privacy policy, it is crucial to ensure that users are sufficiently informed about this. 

In the past, websites commonly employed cookies to enhance the browsing experience for visitors, yet they did not typically present the consent pop-ups we often encounter nowadays. Similarly, legislation such as the EU draft AI Bill calls for companies to disclose the datasets on which their models are trained, which could provide invaluable insight in scenarios like those described above.

To address this matter effectively, it would be beneficial to introduce a notification, such as a pop-up message, on social media platforms explicitly informing users that their opinions are being utilised to train AI models.

Lokesh Choudhary
Tech-savvy storyteller with a knack for uncovering AI's hidden gems and dodging its potential pitfalls. 'Navigating the world of tech', one story at a time.