
Forget LLMs, Large Knowledge Models are The Future of AI Chatbots

Chatbots need to improve and training them on intellectual property data is the next step



Do large language model (LLM)-based chatbots need to improve? For sure. How exactly that would come about is what researchers have been pursuing all this while. Now, Mark Cuban, the American billionaire and AI enthusiast, has proposed a solution: train models on significant intellectual property and data so that they become large knowledge models.

The only problem, as Cuban explained, is that this data would not be free. That sets up another race in the AI world: which big tech company, Google, Microsoft, or Meta, would be the first to pay for the data, and how much would it cost?

This matters because companies have been using GPT-based models and APIs to generate content, but for many of them the data that chatbots like ChatGPT are trained on is irrelevant to their business. That is why OpenAI introduced plugins for ChatGPT, so that enterprises can let the chatbot access their proprietary data and respond to queries based on it.
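Mechanically, a plugin of this sort boils down to retrieving relevant proprietary documents at query time and injecting them into the model's prompt. Below is a minimal sketch of that retrieval step; the function names and sample documents are hypothetical and do not reflect OpenAI's actual plugin API.

```python
# Hypothetical sketch: the chatbot stays generic, and a retrieval step
# injects a company's proprietary documents into the prompt at query time.
# All names and data here are made up for illustration.

PROPRIETARY_DOCS = [
    "Q3 revenue grew 12% on the back of enterprise subscriptions.",
    "The refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-6pm IST, Monday through Friday.",
]

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Compose the context-plus-question prompt an LLM would receive."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this company data:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the refund policy?", PROPRIETARY_DOCS)
```

A production plugin would use embeddings and a vector store rather than keyword overlap, but the flow is the same: the proprietary data never has to be part of the model's training set; it is looked up and supplied per query.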

This also raises serious privacy issues. Many companies are hesitant to work with GPT or Bard and upload their private information, which would then be accessible to the big tech companies.

What Would Win

Steve Jarrett, head of AI and data at Orange, described LLMs as an OS platform. In their basic structure, these chatbots provide only simple capabilities, much like how Android or iOS lets you operate your phone. The plugins that OpenAI introduced are like the apps we install: designed for specific use cases, and extending what the phone can do.

This means that the big tech companies building LLMs and offering them to every organisation possible need to integrate more features into their core offerings like ChatGPT or Bard. These should include ways for the LLMs to link to a company's proprietary data, helping the models provide more knowledge-driven answers, which is what companies ultimately want.

If OpenAI or Google does not build in capabilities such as ingesting proprietary data, even without plugins, the people building those plugins will start selling them directly to enterprises, making the LLMs irrelevant on their own.

For example, a company that builds a plugin to access financial data from a website could sell that plugin to be linked to any LLM. This is why the companies need to buy exclusive access to intellectual property data, which would let them train knowledgeable chatbots that enterprises can use as is.

Moreover, this would also allow companies to combine use-case-specific generative models with LLMs to produce better-focused responses. Chatbots built for a specific use case would be ideal for individual companies: fed with proprietary data, they would generate responses as their users intend, in the way current LLMs do for general queries.

Who Would Win 

Elon Musk is in a bid to build his own rival to ChatGPT. Though there has not been much information lately about Musk's intentions with generative AI, he has roped in several researchers to build something around the trending technology. Interestingly, Musk has something no other company has: Twitter data, which is a gold mine.

OpenAI had access to this data until Musk, upon taking over Twitter, realised its value. He pulled the data away from OpenAI and plans to file a lawsuit against the company for using it to build ChatGPT. He is now also building an all-encompassing shell firm called X Corp.

There is a high possibility that Musk's "based" bot would be the one that is truly "knowledgeable". Cuban predicts that Musk's TruthGPT would be ahead of Google, Meta, and Microsoft's offerings, and it is also expected to be open source. "He can weigh his own tweets and those of the sources he likes and end up with a consumer facing AI that can be a virtual Elon. Pretty cool. Pretty scary," said Cuban.

The scary part about building an AI model on Twitter data is the amount of fine-tuning required to filter out the irrelevant and, in some cases, dangerous opinions of people.

On the other hand, if Musk does not build a chatbot on his "intellectual property data", someone else will. When Bard was released, several people prompted it about its dataset, to which it replied that it was trained on Gmail data. Google was quick to respond that Bard was just hallucinating, but this still raises questions about privacy around the chatbot. Moreover, is Gmail data actually Google's intellectual property?

The same is the case with Microsoft and OpenAI's ChatGPT. Unlike OpenAI, which is being sued and blamed for training on private data, Musk would be able to avoid legal ramifications by training his AI model on Twitter data.

Twitter's privacy policies clearly state: "By publicly posting content, you are directing us to disclose that [shared] information as broadly as possible, including through our APIs, and directing those accessing the information through our APIs to do the same."

This might be concerning for people who do not want an AI trained on their data, but the truth is that by posting on Twitter, users have agreed to the privacy policy. It looks like Musk bought the $44-billion Twitter just for the data. Now he alone has the right to build a chatbot using that social media data.

This might also be the chance for either Google or OpenAI to build their moat with intellectual property data, instead of publicly available data.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.