
Behind ChatGPT’s Wisdom: 300 Bn Words, 570 GB Data

As ChatGPT continues to enthral the world, users share their experiences with the human-like chatbot whose responses have taken the internet by storm.


ChatGPT's responses cover a host of tasks, ranging from solving mathematical problems to generating code and writing essays. The chatbot has also donned the cap of a confidante, offering suggestions for improving relationships and health tips, and can even draft jokes for your next stand-up performance.

Ever wondered how it pulls this off so seamlessly? The answer lies in its speed and its grasp of complex topics, both of which come from how it was trained.

OpenAI recently explained on its website how ChatGPT works. It said that ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.

According to an article published in BBC Science Focus, the model was trained on text from the internet: a massive 570 GB of data sourced from books, Wikipedia, research articles, web texts, websites and other forms of writing on the net. Approximately 300 billion words were fed into the system.

Being a large language model, it works on probability, which is how it is able to predict the next word in a sentence. To reach that point, the model underwent a supervised testing phase.
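To make "predicting the next word from probability" concrete, here is a minimal sketch. It is a toy word-pair (bigram) counter, not ChatGPT's actual method, which relies on a large neural network; the corpus and the function names `train_bigram`, `next_word_probs` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, word):
    """Turn the raw follow-up counts for `word` into probabilities."""
    following = counts.get(word.lower(), Counter())
    total = sum(following.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in following.items()}

def predict_next(counts, word):
    """Pick the most probable next word, or None for an unseen word."""
    probs = next_word_probs(counts, word)
    return max(probs, key=probs.get) if probs else None

# Hypothetical toy corpus, a stand-in for the 570 GB of real text.
corpus = "the tomato is a fruit . the tomato is red . the carrot is a vegetable ."
model = train_bigram(corpus)

print(next_word_probs(model, "is"))  # "a" is twice as likely as "red"
print(predict_next(model, "tomato"))
```

In this toy corpus "is" is followed by "a" twice and by "red" once, so the sketch predicts "a". A real language model does the same kind of ranking over its whole vocabulary, conditioned on far more context than one preceding word.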

The model was fed inputs such as “Is tomato a fruit or a vegetable?” The team feeding the inputs had the correct answer, or output, in mind, and that was also fed into the system. This does not guarantee a correct answer, however, since the response depends on the prompt and the nature of the query. If the model got it wrong, the correct answer was fed back in, training it towards the right responses and helping it build its knowledge bank.

It then goes through a second stage in which it offers several different responses and a human annotator ranks them from most to least appropriate, training the system to compare answers.
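One way to picture how a ranking can teach a system to compare is to break it into pairs of "preferred" and "rejected" answers. The sketch below is an assumption about how such comparison data might be laid out, not a description of OpenAI's pipeline; the function name `ranking_to_pairs` and the sample responses are hypothetical.

```python
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first item of every
    pair is always the one the annotator ranked higher.
    """
    return list(combinations(ranked_responses, 2))

# Hypothetical annotator ranking, most appropriate first.
ranked = [
    "Botanically a tomato is a fruit, though it is cooked as a vegetable.",
    "A tomato is a vegetable.",
    "Tomatoes are a kind of fish.",
]

for preferred, rejected in ranking_to_pairs(ranked):
    print(f"PREFER: {preferred!r}  OVER: {rejected!r}")
```

A ranking of three responses yields three such comparisons, so a single annotation pass produces several training signals at once.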

The model is a step ahead of other existing models because ChatGPT continues to learn and build on its knowledge, understanding the nature of prompts and questions and responding accordingly, which enables it to answer a very wide range of questions.

Reinforcement learning to the rescue 

What sets this technology apart is that it continues to learn while guessing what the next word should be, constantly improving its understanding of prompts and questions to become the ultimate know-it-all. 

As it is trained using a reinforcement learning algorithm, the model is constantly learning and updating itself to give appropriate responses based on the nature of prompts. ChatGPT can also play the role of, say, a smarter version of autocomplete software: when you start typing a sentence, it predicts the next course of action.

Limitations 

The model, however, still fails on many fronts. In one example, its response to a prompt failed to explain how the topic relates to GANs, and it needs more layers of verification to source information better.

In addition, in its effort to be responsible, and aware that AI can be manipulated to produce biased or harmful content, OpenAI has ensured that the chatbot is trained to recognise such biases and restricts its responses to prompts that appear inappropriate.

As for the debate over whether ChatGPT has the potential to replace developers, one user explained in a Twitter thread that while the model is capable of producing human-like text, it is still limited in its ability to understand and manipulate complex systems the way a human developer can. In addition, a language model like ChatGPT is not capable of independent thought or creativity, which are important skills for a developer to have. In short, while large language models like ChatGPT may be able to assist developers in certain tasks, they will not be able to replace them completely.

The discussion around ChatGPT has not escaped the crypto community either, where it was among the most trending topics. The hype around the chatbot led crypto punters to buy AI-related tokens, pushing token prices up by as much as 77%, according to CoinGecko, a digital currency price and data platform.

Among the tokens that benefited most, DeepBrain Chain (DBC) posted the biggest gain, with a 76.7% jump in token price within a week of ChatGPT's launch. It was followed by Numeraire (NMR), the largest AI token by market capitalisation, whose price rose 54.5% in the same period, from $11.26 to $17.40.
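The Numeraire figure can be checked from the two prices quoted above. A quick sketch, where the helper name `pct_change` is ours and not CoinGecko's:

```python
def pct_change(old_price, new_price):
    """Percentage change from old_price to new_price."""
    return (new_price - old_price) / old_price * 100

# NMR prices quoted in the article, in US dollars.
nmr_gain = pct_change(11.26, 17.40)
print(f"NMR: {nmr_gain:.1f}%")  # NMR: 54.5%
```

A rise of $6.14 on a starting price of $11.26 works out to roughly 54.5%, matching the reported gain.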


Aparna Iyer

Aparna Iyer has covered various sectors spanning education, wildlife, culture and law for close to a decade. She now writes on technology and is keen to unearth its capability for public good.