The cat’s finally out of the bag. GPT-4 is here, and it has the world talking. AIM published its GPT-4 predictions hours before OpenAI’s surprise launch. While OpenAI did hold a live GPT-4 demo for developers, it was not concrete in addressing some of the critical features everyone was anticipating.
The biggest offering of GPT-4, as predicted, is multimodality: the model can process image and text inputs to produce text output. The capability is expected to find use in dialogue systems, text summarisation and machine translation. However, OpenAI did not talk about the parameters or capacity of GPT-4.
Multimodality
The biggest prediction of multimodality was partially addressed with the integration of images. At a Microsoft Germany event last week, when CTO Andreas Braun announced that GPT-4 would be multimodal, the integration of image, video, audio and more seemed possible. However, the GPT-4 developer demo showcased only image integration.
Greg Brockman, President and Co-Founder of OpenAI, explained that the image feature in GPT-4 is in preview mode and merely a “sneak peek”. He added that it is not yet publicly available and that OpenAI is partnering with ‘Be My Eyes’, a startup that builds technology to help people who are blind or have low vision.
In the demo, GPT-4 was able to reason about an image in response to prompts such as “Why is this image funny?”, a capability proposed in Microsoft’s Kosmos-1, which uses multimodality to analyse images and generate output. GPT-4 can understand images and express logical ideas about them.
GPT-4 can also read handwritten notes containing specific instructions and convert them into the required output, as Brockman’s demo tweet below shows.
Hand-drawn pencil drawing -> website (https://t.co/4kexpvYAgV). Prompt: “Write brief HTML/JS to turn this mock-up into a colorful website, where the jokes are replaced by two real jokes.” (https://t.co/zQ4smwqGVo, pic.twitter.com/cunT74HO5l) — Greg Brockman (@gdb), March 15, 2023
Parameters for GPT-4
OpenAI did not talk about the parameters GPT-4 was trained on, leaving unresolved the big prediction of whether GPT-4 has 100 trillion parameters, as rumoured. Sam Altman had refuted the figure in an interview in January, but OpenAI has not confirmed the actual number. OpenAI also did not discuss the costs or the kind of technical infrastructure it used to build GPT-4.
OpenAI, however, spoke at length about GPT-4’s improved text capabilities, which suggests that more parameters were used to train the new model. GPT-4 can read, analyse and generate up to 25,000 words of text, which is “8 times more than ChatGPT”. In addition, it can write code in all major programming languages. The constant comparison with GPT-3 read almost like an affirmation of how much better this model is than ChatGPT.
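For scale: at roughly three-quarters of an English word per token, 25,000 words lines up with the 32,768-token context window OpenAI announced for the larger GPT-4 variant, versus roughly 4,096 tokens (around 3,000 words) for ChatGPT at launch. A quick way to check whether a document fits in such a window is to count tokens with the tiktoken library; the sketch below is illustrative, and the encoding name is an assumption for GPT-4-family models.

# Sketch: estimate whether a document fits in a model's context window.
# Assumes the `tiktoken` package is installed; "cl100k_base" is the
# encoding used by GPT-3.5/GPT-4-family models.
import tiktoken

CONTEXT_WINDOW = 32_768  # tokens; the larger GPT-4 variant announced at launch

def fits_in_context(text: str, budget: int = CONTEXT_WINDOW) -> bool:
    enc = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(enc.encode(text))
    print(f"{len(text.split())} words -> {n_tokens} tokens (budget {budget})")
    return n_tokens <= budget

# Example: a short passage repeated many times easily stays under the budget.
fits_in_context("GPT-4 can read, analyse and generate long documents. " * 100)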
Hallucinations
AI experts had predicted rising hallucinations in LLMs, with the risk notably higher for GPT-4. Gary Marcus had also noted that training on larger datasets would bring more hallucinations to the fore. Sam Altman, however, pushed back on the prediction: he said GPT-4 will hallucinate “significantly less” and will be “less biased”, though no clarity was offered on how that will materialise. With Brockman emphasising that OpenAI will continuously work to “make the system work faster”, the claim of fewer hallucinations can only be confirmed with time.
GPT-4 much larger than GPT-3
In November 2022, AIM had written about how GPT-4 would be far bigger than GPT-3 and perform tasks that GPT-3 can’t. In the developer demo video, Brockman details tasks that were previously not possible with GPT-3. He emphasises “how to work with the system to accomplish a task that none of us like to do but have to” and goes on to explain how GPT-4 can help with your “taxes”.
With GPT-4 offering much more than its predecessor, OpenAI seemed intent on acquiring new users, repeatedly mentioning that the new model had been tested for months to “make it suitable for society” and “add value to everyday life”. It was earlier predicted that there would be more platform integration with LLMs, and GPT-4’s announcement was indeed accompanied by collaboration announcements. With its focus on education and passing online exams, GPT-4 also aimed to reach the “teaching segment”, which was evident in the announcements from online education platforms like Khan Academy and Duolingo that came around the time of the GPT-4 launch event.
While the broad GPT-4 predictions did come true, the lack of clarity from OpenAI keeps us from concluding the exact magnitude of GPT-4. With time and further adoption, real-world use cases will be the only way to confirm how much of these claims stands true.