Last year, OpenAI introduced GPT-3, then the largest neural network, in a paper titled “Language Models are Few-Shot Learners”. A state-of-the-art language model, GPT-3 comprises 175 billion parameters, against the 1.5 billion parameters of its predecessor, GPT-2. It overtook the Turing NLG model, which at 17 billion parameters had previously held the record for “largest-ever”. The language model has been marvelled at, criticised, and subjected to intense scrutiny; it has also found interesting new applications.
The three models were released a year apart: GPT-1 in 2018, GPT-2 in 2019, and GPT-3 in 2020. If we go by this pattern, the release of GPT-4 might just be around the corner. Industry watchers believe that GPT-4 may be launched in early 2023.
What to expect from GPT-4
Altman said in the interview that, contrary to popular belief, GPT-4 will not be any bigger than GPT-3 but will use more compute resources. This is an interesting announcement, considering the vocal criticism of large language models and of how they disproportionately affect both the environment and underrepresented communities. Altman said this would be achieved by working on all the different aspects of GPT, including data, algorithms, and fine-tuning.
He also said that the focus would be on getting the most out of smaller models. Conventional wisdom says that the more parameters a model has, the more complex the tasks it can handle. Researchers, however, have increasingly been pointing out that a model's effectiveness may not be as closely correlated with its size as previously believed. For instance, a group of researchers from Google recently published a study showing that a model smaller than GPT-3, the fine-tuned language net (FLAN), delivered better results than GPT-3 by a large margin on a number of challenging benchmarks.
That said, OpenAI is still figuring out how to train a smaller model to perform certain tasks and reason about very difficult problems.
Altman also said that GPT-4 would focus more on coding, that is, on Codex, a descendant of GPT-3. It is worth noting that OpenAI recently released Codex through its API in private beta. Codex is also the basis for GitHub Copilot. It understands more than a dozen programming languages and can interpret simple commands in natural language and execute them on users’ behalf, making it possible to build natural language interfaces to existing applications.
Coding seems to be another major application area for GPT models. One example is Microsoft’s recently announced GPT-3-based assistive feature for the company’s Power Apps software, which converts natural language into code snippets. Given Altman’s recent statements, OpenAI is expected to leverage this capability further in the next instalment of GPT.
Apart from GPT-4, Altman also gave a sneak peek into GPT-5. He said that GPT-5 might be able to pass the Turing test, though he added that it might not be worth the effort. For the uninitiated, the Turing test is a method for determining whether a machine can exhibit conversational behaviour indistinguishable from that of a human being. The remark aligns with OpenAI’s endeavour to ultimately achieve artificial general intelligence (AGI), about which the research lab has been quite vocal.
Public release of DALL·E
Apart from GPT-4, another major point of discussion was DALL·E. Altman said that DALL·E would be publicly released. It is a 12-billion-parameter version of GPT-3 trained to generate images from text captions. Released earlier this year, DALL·E uses a dataset of text-image pairs to perform diverse tasks such as creating anthropomorphised versions of animals and inanimate objects, rendering text, applying transformations to images, and combining even unrelated concepts.
Of course, what transpired during the brief session with Altman might be just a teaser of what could be coming in the future. That said, these small revelations are interesting and exciting, and they leave one guessing how the upcoming GPT model will turn out, for better or worse.