GPT-3: Behind The Hype

GPT-3 (Generative Pre-trained Transformer 3), for the uninitiated, is a language model capable of generating strikingly human-like text on demand, and it has been the subject of much discussion of late. It was released in May 2020 by OpenAI, the artificial intelligence research company founded as a non-profit with backing from Peter Thiel and Elon Musk, among others, and it is the third generation of the model, as the moniker ‘3’ suggests. GPT-3 was trained on 570GB of data crawled from the internet, including all of Wikipedia.

It is the largest publicly known neural network built to date, and it delivers some remarkable results. Its basic capability is to generate text given limited context, and this ‘text’ can be anything with a language structure – essays, tweets, memos, translations and even computer code. GPT-3 is available commercially as an API and is currently generating a reported 4.5 billion words a day (per The Verge) through a multitude of apps that use its capability in very diverse ways. For a world at the peak of its artificial intelligence hype, GPT-3 has stirred up ample excitement, and seemingly enough anecdotal evidence to suggest that the singularity of a conscious AI is at hand and, of course, scarily ready to take over jobs and more from humanity.
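For a sense of what that commercial access looks like, the snippet below is a minimal sketch of a GPT-3 call via the original v0.x openai Python package (since superseded by newer client versions); the engine name, prompt and parameter values here are illustrative assumptions, and a real call needs your own API key.

```python
# Minimal sketch of a GPT-3 completion request, assuming the original
# v0.x "openai" Python package and a valid OpenAI API key.
import openai

openai.api_key = "sk-..."  # placeholder; supply your own key

response = openai.Completion.create(
    engine="davinci",   # the largest GPT-3 engine at launch
    prompt="Write a two-line memo announcing a team offsite:",
    max_tokens=60,      # cap on the length of the generated text
    temperature=0.7,    # > 0 introduces sampling randomness
)

print(response["choices"][0]["text"])
```

The same pattern, with a different prompt, drives everything from tweet drafting to code completion; the prompt is the ‘limited context’ the model conditions on.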

The hype, as ever, obscures reality, and going through the fundamental principles of the technology gives us a better sense of its capabilities as well as its limitations. GPT-3 is, at its core, a deep-learning neural network trained to ‘learn’ from existing language samples crawled from the web by bots. It is unique in its scale: its predecessor GPT-2 had 1.5 billion parameters, and the largest language model Microsoft had built before it, Turing-NLG, had 17 billion; both are dwarfed by GPT-3’s 175 billion parameters.

This scale gives it the ability to generate text, essentially by predicting the next word in succession, and its ‘training’ brings that output eerily close to human language given very little context. For those more inclined to the technical details: in statistics, there are two main approaches to classification, generative and discriminative. Discriminative algorithms learn the probability of the outcome given an observation, P(y | x), directly from the data, and use it to classify. Generative algorithms, on the other hand, learn the joint distribution of observation and outcome, P(x, y), which they then transform into a prediction of the outcome via Bayes’ rule.
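To make the distinction concrete, here is a toy sketch contrasting the two routes to P(y | x) on a tiny, invented dataset of a binary feature and a spam/ham label; none of this is GPT-3’s actual machinery, just the statistical idea.

```python
# Toy contrast of generative vs discriminative classification on an
# invented dataset of (binary feature x, label y) pairs.
from collections import Counter
import random

data = [(1, "spam"), (1, "spam"), (0, "spam"),
        (1, "ham"), (0, "ham"), (0, "ham"), (0, "ham")]

joint = Counter(data)   # empirical joint counts of (x, y)
total = len(data)

def generative_posterior(x, y):
    # Generative route: estimate P(x, y), then derive
    # P(y | x) = P(x, y) / P(x) by Bayes' rule.
    p_xy = joint[(x, y)] / total
    p_x = sum(joint[(x, label)] for label in ("spam", "ham")) / total
    return p_xy / p_x

def discriminative_posterior(x, y):
    # Discriminative route: estimate P(y | x) directly from the
    # rows whose observation equals x, skipping the joint entirely.
    labels = [label for xx, label in data if xx == x]
    return labels.count(y) / len(labels)

print(generative_posterior(1, "spam"))      # 0.666...
print(discriminative_posterior(1, "spam"))  # 0.666...

# Only the generative route retains P(x, y), so only it can be
# sampled to produce new synthetic (x, y) pairs.
pairs, weights = zip(*joint.items())
print(random.choices(pairs, weights=weights, k=3))
```

Both estimators agree on this toy data, but only the generative model keeps the joint distribution around, and that is precisely what makes generating new data possible.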

One obvious advantage of the generative approach is that it can be used to generate new data similar to the existing data. GPT-3 takes the generative approach to the scale that the general information on the internet allows, using whatever context it is given to predict the next word on the basis of this ‘learning’. Repeating the process carries that next word onwards into a sentence, a paragraph and beyond. It uses the same approach to generate code in languages like Python.
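The loop below is a toy illustration of that repeated next-word prediction, with a hand-written probability table standing in for GPT-3’s learned 175-billion-parameter model. The words and probabilities are invented, and unlike this last-word-only toy, GPT-3 conditions on a long window of preceding context.

```python
# Toy autoregressive text generation: sample the next word from a
# conditional distribution, append it, and repeat.
import random

next_word = {
    "the":  {"cat": 0.5, "dog": 0.5},
    "cat":  {"sat": 0.7, "ran": 0.3},
    "dog":  {"sat": 0.4, "ran": 0.6},
    "sat":  {"down": 1.0},
    "ran":  {"away": 1.0},
    "down": {".": 1.0},
    "away": {".": 1.0},
}

def generate(prompt, max_words=10):
    words = prompt.split()
    # Keep extending the text while the model has a distribution
    # for the current context and the length cap is not reached.
    while len(words) < max_words and words[-1] in next_word:
        dist = next_word[words[-1]]
        choice = random.choices(list(dist.keys()),
                                weights=list(dist.values()))[0]
        words.append(choice)
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat down ."
```

GPT-3 does, at scale, what this loop does in miniature: each generated word becomes part of the context for predicting the one after it.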

Arthur C. Clarke’s famous adage, “Any sufficiently advanced technology is indistinguishable from magic”, certainly seems to hold in the context of GPT-3. Given the step change from its predecessors, it seems miraculous, but a closer look reveals the cracks. Since GPT-3 predicts from published internet text that is rife with bias and inaccuracy, it is only natural that these issues creep into its output as well. Multiple instances of the system devolving into biased statements have been noted, and detoxifying the process, though much discussed, has proved far from easy to accomplish. Another criticism is that the model is exceptionally compute-heavy, putting it beyond the reach of smaller organizations, and that it cannot scale its effort to the task at hand, applying the same enormous machinery to simple and complex problems alike.

It is, additionally, a black-box system, which makes it less transparent for wider applications, and it has shown itself to be more effective with short texts, devolving into error as the text it generates grows longer. The strongest criticism, of course, is that while it spews out text, it has no model of the world to give it real understanding and context. This brings up the long-held view in AI circles that while advances in narrow AI through deep learning are impactful, they are mere tools of perceptual classification, and they draw attention away from the task of creating ‘general intelligence’, nature’s own approach to the problem and one that is hugely more versatile and elegant.

Be that as it may, GPT-3 is a definite step forward in advancing the cause of AI, and for some time to come it will be seen as a relevant step change in the way natural language, long seen as a human bastion, is coming under significant challenge. As ever, the key perspective is to guard ourselves against the hype, cut through to the real evolution in the field that GPT-3 represents, and keep solving the issues of narrow AI while keeping our eyes on the real prize of general intelligence. That lets us see GPT-3 for what it really is: a significant advance in the field, taking us a little closer to an ultimate goal of ‘general intelligence’ that is still some distance away.


Mohan Jayaraman

Mohan Jayaraman is the Managing Director for Southeast Asia and Regional Innovation at Experian. He leads the SEA business and heads up Experian’s innovation hub in the region, Experian X Labs. He holds additional responsibility for the Analytics and Business Information business lines and for technology across the APAC region. Mohan is a seasoned senior executive in the data and financial services space and has managed multi-market and vertical responsibilities in his 10-year tenure at Experian. He is a data science, machine learning and technology enthusiast with considerable experience in consumer banking as well as the B2B business space. He is contactable at mohan.jayaraman@experian.com.