Rumour mills are abuzz over the much-anticipated GPT-4, which is set to launch this week; everyone has made predictions about what it will look like, except OpenAI. At last week’s ‘AI in focus’ digital event, Microsoft Germany’s CTO casually announced that GPT-4 would be released in the coming week and mentioned that “multimodality” would be part of it. AI Breakfast believes that the Microsoft event ‘Reinventing productivity with AI’, which is to be hosted by Satya Nadella and scheduled for March 16, will probably be the platform for GPT-4’s launch.
GPT-4 has always been a hush-hush affair. In an interview in January, when Sam Altman was asked about a GPT-4 launch in the first quarter, he replied, “It’ll come out at some point, when we are confident we can do it safely and responsibly”.
Parameters for GPT-4
GPT-4 has been rumoured to have 100 trillion parameters, against GPT-3’s 175 billion. Sam Altman has called this widely shared comparison “complete bullshit”. He also ruled out the possibility of artificial general intelligence, which was “sort of what’s expected of us”.
A Microsoft executive recently said that GPT-4 will allow text-to-video conversion, a feature that already exists in Google and Meta AI models. Unlike GPT-3, which works on text input only, GPT-4 is expected to be multimodal. Multimodality will allow the integration of audio, images and videos, giving users a wider range of input mechanisms. A user could ask for a description, analysis and more from a prompt that takes any of these forms.
The path towards multimodality has been evident in Microsoft’s ambition to integrate visual platforms into its natural-language-generation foundation. The recent paper on Visual ChatGPT explains how a ‘prompt manager’ is used to share information between various foundation models such as Stable Diffusion, ControlNet, BLIP and ChatGPT.
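At a high level, such a prompt manager routes each request to the right foundation model and passes intermediate results between them. The sketch below is only an illustration of that routing idea, assuming hypothetical stand-in handlers; it is not the actual implementation from the Visual ChatGPT paper.

```python
# Minimal sketch of a "prompt manager" dispatching tasks to
# foundation-model handlers. The handler functions and task names
# are illustrative assumptions, not the paper's real API.

def caption_image(prompt: str) -> str:
    # Stand-in for an image-captioning model such as BLIP.
    return f"[caption for: {prompt}]"

def generate_image(prompt: str) -> str:
    # Stand-in for a text-to-image model such as Stable Diffusion.
    return f"[image generated from: {prompt}]"

def chat(prompt: str) -> str:
    # Stand-in for a text-only model such as ChatGPT.
    return f"[chat reply to: {prompt}]"

# Registry mapping task names to model handlers.
HANDLERS = {
    "caption": caption_image,
    "generate": generate_image,
    "chat": chat,
}

def prompt_manager(task: str, prompt: str) -> str:
    """Dispatch a prompt to the handler registered for the task,
    falling back to plain chat for unknown tasks."""
    handler = HANDLERS.get(task, chat)
    return handler(prompt)

print(prompt_manager("generate", "a cat on a skateboard"))
```

In the real system, the manager also converts each model’s output into a form the next model can consume, which is what lets a chat model reason over images it cannot process directly.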
Microsoft also released a research paper on a multimodal large language model (MLLM) called Kosmos-1. The paper emphasises the integration of language, action and multimodal perception.
Source: Microsoft Research Paper
Multimodality allows broader and smoother AI interaction. As of now, image-generation platforms are being integrated into GPT-3-based platforms, but if GPT-4 also integrates other modalities such as video and voice, the usage of these platforms will become holistic and provide a more comprehensive understanding of any subject or problem.
Businessman and entrepreneur Dinis Guarda predicts that GPT-4 will see a lot of image and video generation alongside advanced language processing and greater versatility and adaptability. There would also be an uptick in GPT ‘chat personalities’ through GPT jailbreaks.
Layer-2 blockchain ‘CryptoGPT’, which launched its native crypto token ‘GPT’ on March 10, 2023, saw prices soar, probably in anticipation of the GPT-4 launch.
What do experts think?
Large language models like GPT-4 will not reduce the need for data scientists but rather make their roles “more realistic”. Data science influencer Vin Vashishta believes that GPT-4 will be good for data science and expand the scope of work for data scientists. Instead of businesses developing large datasets and training their own models, where costs scale faster than returns, partnering with startups or hyperscalers for large language models makes more sense. This is where the scope of data scientists expands: they can focus on practical, applied research and create small, highly reliable models that can be monetised effectively, concentrating on “applied research versus deep learning research”.
AI critic Gary Marcus, known for urging caution about the rise of GPT models, continues to be sceptical about GPT-4. He believes that GPT-4 will not be a “reliable model” and that its unpredictability will make it hard to rely on in downstream programs. Hallucinations, a persistent problem with large language models, will continue to escalate.
Marcus considers the tool good only for “brainstorming and first drafts”, never as trustworthy general intelligence. If AGI comes, it will come only from systems that are “structured with more built-in knowledge” and equipped with reasoning and planning capabilities, which GPT systems lack.
In spite of all the hype around GPT-4, OpenAI has been mellow on their take regarding the launch. In addition to Sam Altman’s response, OpenAI’s CTO Mira Murati, said in a recent interview that “less hype would be good”. With executives being on the defensive regarding GPT-4 launch and people building hype on the other, waiting for another few days will probably seal the story.