The introduction of image functionality in GPT-4 has piqued the interest of ChatGPT users over the last couple of weeks, with many already experimenting with the features of GPT-4V(ision). Be it reading and recognising images, answering specific queries, coding, or designing a website, the multimodality that GPT-4V brings is proving to be a game-changer.
The versatility that this feature brings is set to further revolutionise the way various industries work.
Multimodal Functionality
Microsoft researchers recently released a paper on preliminary explorations with GPT-4V. The paper analyses the model to better understand large multimodal models (LMMs), assessing and testing GPT-4V's capabilities across a wide range of structured tasks. It highlights GPT-4V's distinct ability to understand visual cues sketched or placed on input images, which opens up innovative human-computer interaction techniques such as visual referring prompting.
While only some basic functions were tested with GPT-4V, the paper also lists a larger range of functions with possible use cases across industries, with medical and insurance among the most prominent.
Crucial Medical Field
While there have already been discussions on GPT-4's capabilities in the medical field, the latest update has only cemented its future there. The model's ability to decipher and critically analyse images can help infer details from a scan or X-ray, with radiology standing to benefit the most.
In the below example, GPT-4V has been fed a tooth X-ray and prompted with various questions. Interestingly, the chatbot has been careful to put a disclaimer at the start and not give conclusive results.
Source: arxiv.org
Auto Insurance
The capabilities of GPT-4V were also tested to see how well it fits into auto insurance. With a focus on car accident reporting, two aspects were examined: vehicle damage evaluation and insurance reporting (recognising vehicle information such as the licence plate, model, etc.). While the model was able to give a detailed explanation of the type and severity of the damage, it could not conclusively estimate the cost of repairs, which is where its limitations begin to show.
Coding Game Stepped Up
While coding has been facilitated by ChatGPT's Code Interpreter, GPT-4V advances the chatbot's coding capabilities further. From a basic drawing or scribbles on a whiteboard, the model can generate the code for a website or app with ease. The low-code/no-code option has been simplified even further for end users, raising questions about the fate of dedicated coding platforms.
The CEO of HyperWriteAI, Matt Shumer, tweeted about a GPT-4V-powered frontend engineer agent. By simply uploading a design image, the model is able to write the code, correct the rendered output, and even refine the code to improve design quality.
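To make the image-to-code workflow concrete, the sketch below shows how an image-plus-prompt request can be assembled for OpenAI's chat completions API. The payload shape (a `content` list mixing `text` and `image_url` parts, with the image inlined as a base64 data URL) follows the publicly documented vision endpoint at the time of writing; the model name, prompt, and placeholder image bytes are illustrative assumptions, not taken from the paper.

```python
import base64


def build_vision_request(prompt, image_bytes, model="gpt-4-vision-preview"):
    """Construct the JSON body for a GPT-4V chat completion.

    The image is sent inline as a base64 data URL, one of the two
    forms (remote URL or base64) the vision endpoint accepts.
    """
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }


# Example: ask the model to turn a whiteboard sketch into a web page.
# In practice you would read real PNG bytes from disk and POST this body
# to the chat completions endpoint with your API key.
request = build_vision_request(
    "Write the HTML and CSS for the page sketched in this image.",
    b"\x89PNG placeholder bytes",
)
```

The same request shape applies to the medical and insurance examples above; only the prompt and the attached image change.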
All is Not Perfect
As impressive as the results have been, the model is still not 100% accurate. GPT-4V can make errors when reading minute details or counting objects that look too similar, making it necessary to cross-check its output before relying on it completely.
Source: arxiv.org
Surpassing Other LMMs
A few months ago, when ChatGPT was compared with its closest rival Bard, the latter surpassed OpenAI's chatbot on many counts. Multimodal features such as voice, image input, and web browsing were the key advantages Bard had to its credit. However, ChatGPT has now caught up on all of them.
While GPT-4V may be versatile, accuracy remains a concern, which is also a problem with Bard. A few users found incorrect responses from both GPT-4V and Bard when given a prompt that requires strategic thinking in a Pac-Man game.
Redemption of ChatGPT
The multimodal feature of ChatGPT arrives at a time when the chatbot has reportedly been losing users since July. As per recent reports, website and mobile visits to ChatGPT decreased by 3.2% to 1.43 billion in August, following roughly 10% drops in each of the preceding months. With a series of product feature launches over the last couple of weeks, including voice integration, OpenAI may be looking at a redemption.
Given the buzz created by people experimenting with GPT-4V, the chatbot might well see a spike in usage over the next few months. Furthermore, with OpenAI DevDay just around the corner and the company expected to make a few major announcements, the latest 'vision' function may be a teaser for what lies ahead for ChatGPT.
On a lighter note, in addition to enabling breakthroughs across a serious range of functions, GPT-4V can probably help you save face too, in case you don't understand a meme or joke: a probable win for all.
Source: arxiv.org