GPT-3 (Generative Pre-trained Transformer 3), for the uninitiated, is a language model capable of generating remarkably human-like text on demand, and it has been the subject of much discussion recently. It was released in May 2020 by OpenAI, an artificial intelligence research company founded as a non-profit with backing from Peter Thiel and Elon Musk, among others, and it is the third generation of the model, as the ‘3’ suggests. GPT-3 was trained on 570GB worth of data crawled from the internet, including all of Wikipedia.
It is the largest known neural net created to date, and it is producing some striking results. Its basic capability is to generate text from limited context, and this ‘text’ can be anything that has a language structure: essays, tweets, memos, translations and even computer code. GPT-3 is available commercially as an API and is reportedly generating 4.5 billion words a day (per The Verge) through a multitude of applications that use it in very diverse ways. In a world at the peak of its hype about artificial intelligence, GPT-3 has generated ample excitement, and seemingly enough anecdotal evidence to suggest that the singularity of a conscious AI is at hand and, of course, scarily ready to take over jobs and more from humanity.
The hype, as ever, obscures reality, and going through the fundamental principles of the technology gives us a better sense of its capabilities as well as its limitations. GPT-3 is essentially a deep neural network that is trained to ‘learn’ from existing language samples crawled by bots. What makes it unique is its scale. Its predecessor, GPT-2, had 1.5 billion parameters, and the largest language model that Microsoft had built before it had 17 billion; both are dwarfed by GPT-3’s 175 billion parameters.
This scale, combined with its ‘training’, lets it produce text by predicting the next word in succession, with results eerily close to human language even when it is given very little context. For those a little more inclined to the technical details, statistics offers two main approaches to classification: generative and discriminative. Discriminative algorithms learn the probability of the outcome given an observation directly from the data, and then use it to classify. Generative algorithms, on the other hand, learn the joint probability of observation and outcome occurring together, which they then transform into a prediction of the outcome.
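To make the distinction between the two approaches concrete, here is a minimal sketch on a toy dataset. Everything in it is an assumption made for illustration: the data, the spam/ham labels and the function names are invented, and simple counting stands in for a real learning algorithm. It has nothing to do with how GPT-3 itself is implemented.

```python
import random
from collections import Counter

# Hypothetical toy data: (observation, outcome) pairs, invented for illustration.
DATA = [
    ("free", "spam"), ("free", "spam"), ("free", "ham"),
    ("hello", "ham"), ("hello", "ham"), ("hello", "spam"),
]

def learn_joint(pairs):
    """Generative step: estimate P(observation, outcome) from counts."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def predict_from_joint(joint, observation):
    """Turn the joint into P(outcome | observation) by normalising."""
    matching = {y: p for (x, y), p in joint.items() if x == observation}
    norm = sum(matching.values())
    return {y: p / norm for y, p in matching.items()}

def predict_directly(pairs, observation):
    """Discriminative step: estimate P(outcome | observation) directly."""
    outcomes = Counter(y for x, y in pairs if x == observation)
    total = sum(outcomes.values())
    return {y: n / total for y, n in outcomes.items()}

def sample_pair(joint, rng):
    """Only the generative model can draw a new (observation, outcome) pair."""
    pairs = list(joint)
    weights = [joint[p] for p in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]

if __name__ == "__main__":
    joint = learn_joint(DATA)
    print(predict_from_joint(joint, "free"))
    print(predict_directly(DATA, "free"))
    print(sample_pair(joint, random.Random(0)))
```

On a table this small both routes arrive at the same prediction. The difference is that the joint table can also be sampled to produce new observation and outcome pairs, which the directly learned conditional cannot do.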
One of the obvious advantages of the generative approach is that we can use it to generate new data similar to the existing data. GPT-3 takes the generative approach to the scale that the general information on the internet allows, and essentially uses the context provided to it to predict the next word based on this ‘learning’. The process then repeats, word after word, to build up a sentence, a paragraph and beyond. It uses the same approach to generate code in languages like Python.
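The predict-and-repeat loop can be sketched in a few lines. This is a deliberately tiny stand-in, not GPT-3's method: it uses a bigram table (counts of which word follows which) instead of a Transformer, and the corpus and function names are made up for the example.

```python
import random
from collections import Counter, defaultdict

# Hypothetical miniature "training corpus", invented for illustration.
CORPUS = "the cat sat on the mat and the cat ate the fish"

def train_bigram(text):
    """Count, for every word, which words follow it and how often."""
    words = text.split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table

def generate(table, start, length, rng):
    """Repeatedly predict a next word from the last one and append it."""
    output = [start]
    for _ in range(length):
        followers = table.get(output[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        candidates = list(followers)
        weights = [followers[word] for word in candidates]
        output.append(rng.choices(candidates, weights=weights, k=1)[0])
    return output

if __name__ == "__main__":
    model = train_bigram(CORPUS)
    print(" ".join(generate(model, "the", 8, random.Random(0))))
```

A bigram table looks only one word back, so its output drifts quickly. GPT-3 conditions on a far longer context with a far larger model, but the outer loop of predicting a word, appending it and predicting again has the same shape.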
Arthur C. Clarke’s famous adage, “Any sufficiently advanced technology is indistinguishable from magic”, certainly seems to hold in the context of GPT-3. Given the leap from its predecessors, it seems miraculous, but a closer look reveals the cracks. GPT-3 predicts based on published information on the internet, which is rife with bias and inaccuracy, so it is only natural that these issues creep into its output as well. Multiple instances of the system devolving into biased statements have been noted, and detoxifying the process, though much discussed, has proved far from easy. Another criticism of the model is that it is exceptionally compute-heavy, which puts it outside the reach of smaller organizations, and that it is unable to scale its effort to the task at hand.
It is additionally a black-box system, which makes it less transparent for wider applications, and it has shown itself to be more effective with short texts, devolving into error as the text it generates grows longer. The strongest criticism, of course, is that while it produces text output, it has no model of the world to give it real understanding and context. This brings up the long-held view in AI circles that while advances in narrow AI with deep learning are impactful, they are mere tools of perceptual classification and take attention away from the task of creating ‘general intelligence’, which has been nature’s approach to the solution and is hugely more versatile and elegant.
Be that as it may, GPT-3 is a definite step forward in advancing the cause of AI, and it will for some time be seen as a step change in how natural language, long seen as a human bastion, is coming within the reach of machines. As ever, the key perspective is to guard ourselves against the hype, cut through to the reality of the evolution in AI that GPT-3 represents, and work on solving the issues of narrow AI while keeping our eyes on the real prize of general intelligence. This will let us see GPT-3 for what it really is: a significant advance in the field that takes us a little closer to the ultimate goal of ‘general intelligence’, which is still some distance away.