Data is Gold, Twitter the Goldmine to Train AI Models

Every tweet posted on the platform becomes the property of the social media giant and can be used by others who have access to its API

Meta-owned Instagram plans to enter the microblogging space and challenge Elon Musk’s Twitter. According to a Bloomberg report, Instagram is developing a Twitter-like microblogging application set to debut before the end of June. Internally referred to as P92 or Barcelona, the new platform aims to combine the best features of Instagram and Twitter.

Instagram’s move comes at a time when Jack Dorsey is already working on Bluesky, a decentralised, open-source social network that resembles Twitter. Mastodon is another example of a decentralised, Twitter-like social network.

As per recent reports, Instagram’s forthcoming app will serve as a text-based platform for engaging in conversations. Users will have the ability to communicate directly with their audience and peers.

The app will offer various creative tools for users to craft their messages, including the option to incorporate links, photos, and videos. 

Legal Disputes

Instagram’s foray into creating an alternative to Twitter coincides with an ongoing legal dispute between Twitter and Microsoft. Twitter has filed a lawsuit against Microsoft, accusing the company of unauthorised use of Twitter data for training purposes, which it deems ‘illegal’.

The conflict emerged when Microsoft refused to pay for access to Twitter’s API, which had recently introduced new payment tiers. Previously, developers could use the Twitter API for free, but to boost the company’s revenue, Twitter’s CEO, Musk, announced the end of free access.

The confluence of events surrounding the lawsuit, Twitter’s decision to monetise its API, and Meta’s introduction of a Twitter-like application hints at a larger context. Twitter has traditionally been a unique platform, fostering text-heavy content and enabling individuals to express their opinions freely, leading to a more authentic and human experience compared to other platforms where insincerity is prevalent.

This distinctive nature of Twitter provides invaluable data for researchers aiming to enhance the human-like responses of language models like GPT. While the specific motives behind Elon Musk’s lawsuit against Microsoft remain unknown, the possibility of Microsoft utilising Twitter data to train OpenAI’s GPT models cannot be dismissed.

Microsoft has ventured into training bots on Twitter data before. One notable example is Tay, a Twitter bot introduced in 2016 and positioned by the company as an experiment in “conversational understanding”.

Microsoft stated that Tay would become smarter the more users interacted with it, learning to engage people through casual and playful conversation. Unfortunately, this endeavour turned sour for the software giant.

Users began inundating the bot with misogynistic, racist, and Donald Trump-inspired remarks. As a result, Tay, essentially an internet-connected robot parrot, started echoing these sentiments back to users.

Training Data in Conflict with Users

Twitter’s privacy policy, which most users tend to ignore, clearly states that by publicly posting content, the users are directing the platform to disclose that information as broadly as possible, including through its APIs, and directing those accessing the information through its APIs to do the same.

This essentially means that every tweet posted on the platform becomes the property of the social media giant and can be used by others who have access to its API. 

However, since Twitter has put its API behind a paywall, many companies are coming up with their own Twitter-like platforms, claiming to provide users with a richer experience. Users may or may not have wanted that in the first place.

They use social media platforms to express their own opinions, not to supply a trove of data for AI companies to build LLMs from.

European Union Draft AI Bill

While it is currently permissible for platforms to utilise users’ opinions and posts as data for their models as long as they include it in their privacy policy, it is crucial to ensure that users are sufficiently informed about this. 

In the past, websites commonly employed cookies to enhance the browsing experience for visitors, yet they did not typically present the consent pop-ups we often encounter nowadays. Similarly, legislation such as the EU draft AI Bill advocates for companies to disclose the datasets on which their models are trained, which could provide invaluable insight in scenarios like these.

To address this matter effectively, it would be beneficial to introduce a notification, such as a pop-up message, on social media platforms explicitly informing users that their opinions are being utilised to train AI models.

PS: The story was written using a keyboard.

Lokesh Choudhary

Tech-savvy storyteller with a knack for uncovering AI's hidden gems and dodging its potential pitfalls. 'Navigating the world of tech', one story at a time. You can reach me at: lokesh.choudhary@analyticsindiamag.com.