Going by the release cycle of the GPT franchise, the launch of the fourth generation is imminent, if not overdue. Last year, OpenAI CEO Sam Altman spoke about the impending GPT-4 release in a Q&A session at the AC10 online meetup. The release is probably on tap for July-August this year. However, OpenAI has kept a tight lid on the release date, and no definitive information is available in the public domain. But one thing is for sure: GPT-4 will not have 100 trillion parameters.
GPT-3, released in May 2020, has 175 billion parameters. The third generation in the GPT-n series uses deep learning to produce human-like text. On September 22, 2020, Microsoft announced that it had exclusively licensed GPT-3. Based on the available information and Sam Altman’s statements at the Q&A session, we have compiled a list of improvements to expect in GPT-4.
Size doesn’t matter
Large language models like GPT-3 have achieved outstanding results on new tasks with little or no updating of model parameters. Though GPT-4 is most likely to be bigger than GPT-3 in terms of parameters, Sam Altman has clarified that size won’t be the differentiator for the next generation of OpenAI’s autoregressive language model. The parameter count is likely to fall between that of GPT-3 and Gopher, that is, between 175 billion and 280 billion.
NVIDIA and Microsoft’s love-child Megatron-Turing NLG held the title of the largest dense neural network at 530 billion parameters (roughly 3x GPT-3) until Google’s PaLM (540 billion parameters) took the cake. Interestingly, smaller models such as Gopher (280 billion parameters) and Chinchilla (70 billion parameters) have outperformed MT-NLG across several benchmarks.
In 2020, OpenAI’s Jared Kaplan and colleagues reported that performance improved with the number of parameters. Results from the PaLM model suggested that performance improvements from scale have not yet plateaued. However, Sam Altman has hinted that OpenAI is taking a different approach. He said OpenAI would no longer focus on making extremely large models but rather on getting the most out of smaller models. The AI research lab will look at other aspects, such as data, algorithms, parameterisation, or alignment, to bring significant improvements.
GPT-4 – a text-only model
Multimodal models are the deep learning models of the future. Because we live in a multimodal world, our brains are multisensory. Perceiving the world in only one mode at a time severely limits AI’s ability to navigate and comprehend it. Making GPT-4 a text-only model could be an attempt to push language models to their limits, adjusting parameters like model and dataset size before moving on to the next generation of multimodal AI.
Sparse models, such as mixture-of-experts (MoE) architectures, use conditional computation so that different parts of the model process different inputs, and they have been successful. Such models scale easily beyond the 1 trillion parameter mark without incurring high computing costs. However, the benefits of MoE approaches taper off on very large models. GPT-4, like GPT-2 and GPT-3, will be a dense model. In other words, all parameters will be used to process any given input.
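To make the dense-versus-sparse distinction concrete, here is a minimal toy sketch in Python. It is not how any GPT model or production MoE system is implemented: the "experts" are hypothetical one-parameter functions standing in for sub-networks, and the router logits are made-up numbers. It only illustrates that a dense layer runs every parameter for every input, while a sparse layer runs just the top-k experts chosen by a router.

```python
import math


def softmax(scores):
    """Turn raw router logits into positive gate values that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(weight):
    """Hypothetical 'expert': a one-parameter function standing in for a sub-network."""
    return lambda x: weight * x


def dense_forward(x, experts):
    """Dense computation: every expert (every parameter) processes the input."""
    outputs = [expert(x) for expert in experts]
    return sum(outputs), len(experts)


def sparse_forward(x, experts, router_logits, k=1):
    """Conditional computation: only the k experts with the highest gate run."""
    gates = softmax(router_logits)
    top = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:k]
    norm = sum(gates[i] for i in top)  # renormalise gates over the chosen experts
    output = sum(gates[i] / norm * experts[i](x) for i in top)
    return output, len(top)


# Illustrative values only (assumptions, not taken from any real model).
experts = [make_expert(w) for w in (0.5, 1.0, 2.0, 4.0)]
router_logits = [0.1, 2.0, 0.3, -1.0]

dense_out, dense_used = dense_forward(3.0, experts)
sparse_out, sparse_used = sparse_forward(3.0, experts, router_logits, k=1)
print(f"dense:  output={dense_out}, experts used={dense_used}")
print(f"sparse: output={sparse_out}, experts used={sparse_used}")
```

In this toy setup the dense path touches all four experts, while the sparse path touches only one, which is why sparse models can grow their parameter count without a matching growth in per-input compute.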
Assuming that GPT-4 could be larger than GPT-3, the number of training tokens required to be compute-optimal (according to DeepMind’s findings) could be around 5 trillion, an order of magnitude greater than current datasets. The number of FLOPs required to train the model to achieve minimal training loss would be 10–20x that of GPT-3. In the Q&A, Altman said GPT-4 would require more compute than GPT-3. OpenAI will focus on optimising other variables rather than on scaling the model.
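The arithmetic behind estimates like these can be sketched in a few lines of Python. Two assumptions are baked in and neither comes from this article: a commonly cited rule of thumb of roughly 20 training tokens per parameter for compute-optimal training, and the common approximation that training compute is about 6 x parameters x tokens. GPT-3's training set is taken as roughly 300 billion tokens. Treat the output as a back-of-the-envelope illustration, not a prediction.

```python
# Assumption: ~20 training tokens per parameter (a rule of thumb, not an exact law).
TOKENS_PER_PARAM = 20
# Assumption: training FLOPs ~ 6 * parameters * tokens (a standard approximation).
FLOPS_PER_PARAM_TOKEN = 6

GPT3_PARAMS = 175e9
GPT3_TOKENS = 300e9  # assumed: roughly 300 billion training tokens for GPT-3


def compute_optimal_tokens(n_params):
    """Estimated training tokens for a compute-optimal model of this size."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params, n_tokens):
    """Approximate total training compute in FLOPs."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_ratio_vs_gpt3(n_params):
    """Training compute relative to GPT-3 under the assumptions above."""
    tokens = compute_optimal_tokens(n_params)
    return training_flops(n_params, tokens) / training_flops(GPT3_PARAMS, GPT3_TOKENS)


for n_params in (175e9, 280e9):  # the parameter range discussed in this article
    tokens = compute_optimal_tokens(n_params)
    ratio = compute_ratio_vs_gpt3(n_params)
    print(f"{n_params / 1e9:.0f}B params: {tokens / 1e12:.1f}T tokens, "
          f"{ratio:.1f}x GPT-3 training compute")
```

Under these assumptions, a 280-billion-parameter model would want about 5.6 trillion tokens, in line with the "around 5 trillion" figure. The compute multiple is very sensitive to the assumed tokens-per-parameter ratio, so different assumptions will move it considerably.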
OpenAI’s north star is a beneficial AGI. OpenAI is likely to build on the learnings from the InstructGPT models, which are trained with humans in the loop. InstructGPT was deployed as the default language model on OpenAI’s API. It is much better than GPT-3 at following user intentions, and its outputs are more truthful and less toxic, thanks to techniques developed through OpenAI’s alignment research. However, the alignment was limited to the preferences of OpenAI employees and English-speaking labellers. GPT-4 is likely to be more aligned with humans than GPT-3.