Do chatbots based on large language models (LLMs) need to improve? For sure. How exactly that would come about is what researchers have been pursuing all this while. Now, Mark Cuban, the American billionaire and AI enthusiast, has come up with a solution. The idea is to train models on significant intellectual property and data so that they become large knowledge models.
The only problem with this, as Cuban explained, is that this data would not be free. This would set off another race in the AI world: which big tech company, Google, Microsoft, or Meta, would be the first to pay for the data, and how much would it pay?
This is important because companies have been using GPT-based models and APIs to generate content, but for many of them the data that chatbots like ChatGPT are trained on is irrelevant. That is why OpenAI introduced plugins for ChatGPT, so that enterprises can let the chatbot access their proprietary data and respond to queries based on it.
This also comes with a lot of privacy issues. Many companies are hesitant to work with GPT or Bard and upload their private information, which would then be accessible to big tech.
What Would Win
Steve Jarrett, head of AI and data at Orange, described LLMs as an OS platform. What this means is that these chatbots, in their basic form, are built to provide only simple capabilities, much as Android or iOS lets you operate your phone. The plugins that OpenAI introduced are like the apps we install: designed for specific use cases, they extend the capabilities of the phone.
This means that the big tech companies building LLMs and providing them to every organisation possible need to integrate and embed more features into core offerings like ChatGPT or Bard. These features should include ways to link the LLMs to a company's proprietary data. This would help the models provide more knowledge-driven information, which is what companies want in the end.
If OpenAI or Google do not build in capabilities such as proprietary data input that work without plugins, the people building these plugins would start selling them directly to enterprises. This would make the LLMs irrelevant on their own.
For example, a company that builds a plugin to access financial data from a website would start selling that plugin to be linked to any LLM. This is why the companies building LLMs need to buy exclusive access to intellectual property data, with which they would be able to train knowledgeable chatbots that enterprises can use as they are.
Moreover, this would allow companies to combine use-case-specific generative models with LLMs to generate more focused responses. Chatbots built for specific use cases would be ideal for individual companies. These models, when fed proprietary data, would be able to generate the responses users intend, in the way current LLMs do.
Who Would Win
Elon Musk is bidding to build his own rival to ChatGPT. Though there hasn’t been much information lately about Musk’s intentions with generative AI, he has roped in many researchers to build something related to the trending technology. Interestingly, Musk has something that no other company has: Twitter data, which is a gold mine.
OpenAI had access to this data until Musk realised this upon becoming CEO of Twitter. He cut off OpenAI’s access and plans to file a lawsuit against the company for using the data to build ChatGPT. Now he is building an all-encompassing shell firm called X Corp.
There is a strong possibility that Musk’s “based” bot would be the one that is truly “knowledgeable”. Cuban predicts that Musk’s TruthGPT would be ahead of Google, Meta, and Microsoft’s offerings, and it is also expected to be open source. “He can weigh his own tweets and those of the sources he likes and end up with a consumer facing AI that can be a virtual Elon. Pretty cool. Pretty scary,” said Cuban.
The scary part about building an AI model based on Twitter is the amount of fine-tuning required to filter out people’s irrelevant and, in some cases, dangerous opinions.
On the other hand, if Musk does not build a chatbot on his “intellectual property data”, someone else will. When Bard was released, several people prompted it about its dataset, and it replied that it was trained on Gmail data. Google was quick to respond that Bard was just hallucinating, but this still raises questions about privacy around the chatbot. Moreover, is Gmail data actually Google’s intellectual property?
The same applies to Microsoft and OpenAI’s ChatGPT. Musk could avoid legal ramifications if he uses Twitter data to train his AI model, unlike OpenAI, which is being sued and blamed for training on private data.
Twitter’s privacy policy clearly states: “By publicly posting content, you are directing us to disclose that [shared] information as broadly as possible, including through our APIs, and directing those accessing the information through our APIs to do the same.”
This might also be the chance for either Google or OpenAI to build their moat with intellectual property data, instead of publicly available data.