OpenAI, a Data Scavenging Company for Microsoft

While it might be true that the investment was for furthering AI research, this partnership is also providing Microsoft with one of the greatest assets of this digital age, data, and—perhaps to make it worse—that data might be yours.

Published on March 24, 2023

by Mohit Pandey

If you thought OpenAI exploiting Kenyan workers with a wage of less than $2 per hour was morally wrong, think about yourself for a moment. You have contributed just as much to improving ChatGPT, but for free, in the name of reinforcement learning from human feedback (RLHF).

Now, where is all of this data going and who is using it?

As soon as we talk about ‘OpenAI’ or ‘ChatGPT’, the conversation is bound to head towards Microsoft and how the tech giant has invested billions of dollars in the company. While it might be true that the investment was for furthering AI research, this partnership is also providing Microsoft with one of the greatest assets of this digital age: data. To make it worse, that data might be yours. And now, with internet access enabled, OpenAI has removed users from the picture as well.

Recently, OpenAI also integrated web browsing into ChatGPT through several new plugins, addressing what was previously touted as the AI model’s biggest limitation. This will most likely enable more accurate, up-to-date responses from the chatbot, so the 2021 training-data cut-off can no longer serve as an excuse. But as much as this would increase usability and lead to cool new things, there is a different, probably darker side to it.

Up until now, ChatGPT was limited to its training data. The ability to move beyond that data lets the chatbot retrieve far more information, extending the reach of the LLM behind it. But why is this an issue?

Sam Altman, CEO of OpenAI, recently expressed concerns about the capabilities of these generative models and suggested that progress in the field should slow down. But connecting ChatGPT to the internet suggests that the company’s motivation for developing this technology lies elsewhere.

Setting aside the security and privacy issues that these ChatGPT plugins might entail, internet connectivity will now provide more data to OpenAI, which would eventually be provided to (guess who?) Microsoft, which is not just the funder but also the cloud provider of the “non-profit” research organisation. All the data that OpenAI gathers is stored on Microsoft’s cloud. OpenAI’s privacy policy does not deny that it shares users’ personal information with its vendors and service providers, which clearly include Microsoft.

How would Microsoft benefit from this? Well, for starters, they would have access to all the data on the internet that can be used in building their own products. While OpenAI might be busy building AI models and collecting data, Microsoft is already heading its own way and utilising the collected data. If anyone wants to access that data, Microsoft can create its own walls and restrict everyone else. Pretty aggressive.

Users definitely have to be very careful before sharing their personal information on the free-to-access ChatGPT and GPT-4, which are now also available on almost all of Microsoft’s products.

On another note, Microsoft, OpenAI, and GitHub were recently hit with a class-action lawsuit for using developers’ code to build Copilot. This eventually led to OpenAI discontinuing support for Codex, the technology that was powering Copilot, and integrating GPT-4 into Azure OpenAI Service. In all likelihood, another lawsuit might be waiting for these companies when users realise that GPT-4 is also trained on similar, if not the same, data.

Satya Nadella, CEO of Microsoft, has already spoken about the future of generative AI and what the partnership means for both companies. “In future, the generative models will generate most of the data. But, on the other side, we should also think about how it can augment us in what we are doing today since it can have a huge impact on our future.”

As part of their investment, Microsoft gained exclusive access to the entire OpenAI codebase
— Elon Musk (@elonmusk) March 24, 2023

The AI race is fuelled by data for the most part, and we can clearly see that Microsoft has recognised that. There is a high probability that Microsoft regards OpenAI more as a ‘data scavenging company’ than as an AI company.

Why Does It Matter?

Interestingly, Microsoft has laid off its entire Responsible AI team. Data privacy, storage, or usage are probably just fluff talk for the company anyway.

When OpenAI announced ChatGPT Plus, it also said that it would no longer store users’ data for training the model, but for that to happen, users have to opt out. Even then, the data would be deleted only after a month. Notably, with internet access, the company does not even need to store user data to train the model.

Microsoft would, in turn, be able to take this data from websites across the internet, repackage it as its own, and present it to users. This means that people would essentially stop browsing the internet or visiting websites to access information, and rely only on ChatGPT or GPT-4 in the future. This would bring online traffic to an eventual halt and thus cut the revenue of these sources, inevitably leading them to shut down.

On the other hand, Google is no different. The tech giant has been monopolising the search engine market since its inception. No matter how much Microsoft tries to push Bing by integrating GPT into search, Google stays a step ahead.

Recently, Google released ‘Bard’ to the public. Though it is reportedly built on LaMDA, the company’s own LLM, the chatbot itself, when asked about its training data, says that the data includes Google Search and Gmail, among other apps. Though the company has tried to refute the claim by saying that the chatbot might be hallucinating, the suspicion that it is actually doing this has already been planted in many people’s minds.

This is good for Microsoft in its bid to get ahead of Google, a company about which Jeff Bezos said, “Treat it like a mountain: you can climb it, but not move it.” It’s hard to deny that OpenAI’s datasets are a valuable asset. ChatGPT’s massive success has given the company its biggest asset. With Microsoft’s deep pockets and extensive reach, it’s up to them to decide what to do with the data. So, while the partnership between Microsoft and OpenAI may not be as good for data privacy as some had hoped, it’s still a win-win situation for both companies.
