MITB Banner

ChatGPT’s Game-Changing ‘Vision’

With OpenAI finally integrating image features, GPT-4V(ision) opens doors for use cases that span across domains – putting ChatGPT ahead in the multimodal race

Share

Listen to this story

The introduction of image functionality on GPT-4 has piqued the interest of ChatGPT users in the last couple of weeks with most of them already experimenting with the incredible features of GPT-4 V(ision). Be it reading, recognising images, answering specific queries, coding or designing a website, the multimodality that GPT-4V has brought is becoming a game-changer.

The versatility that this feature brings is set to further revolutionise the way various industries work. 

Multimodal Functionality 

A recent paper on preliminary explorations with GPT-4V by Microsoft researchers was released a few days ago. The paper analyses the latest model to understand large multimodal models (LMMs) and revolves around assessing and testing GPT-4V’s capabilities through a wide range of structured tasks. The paper emphasised the distinct ability of GPT-4V to understand visual cues sketched or placed on input images that opens up innovative human-computer interaction techniques such as visual referencing prompts. 

While some of the basic functions were tested with GPT-4V, a larger range of functions that have possible use cases in various industries, were listed out in the paper — a few predominant ones being in medical and insurance. 

Crucial Medical Field 

While there have been discussions on GPT-4’s capabilities in the medical field, the latest update has only cemented its future. The ability of the model to decipher and critically analyse images can help infer details from a scan or X-ray. Radiology will have the maximum use case. 

In the below example, GPT-4V has been fed a tooth X-ray and prompted with various questions. Interestingly, the chatbot has been careful to put a disclaimer at the start and not give conclusive results. 

Source: arxiv.org

Auto Insurance

The capabilities of GPT-4V was also tested to check if it fits in auto insurance. With a focus on car accident reporting, two aspects, namely, vehicle damage evaluation and insurance reporting (recognising vehicle information such as licence plate, model, etc.) were tested. While the model has been able to give a detailed explanation of the type and severity of a damage, it is not able to conclusively estimate the cost of any damage. This is probably where the limitations of the model arise. 

Coding Game Stepped Up

While coding has been facilitated with ChatGPT’s Code Interpreter, GPT-4V has advanced the chatbot’s coding capabilities. From inputting a basic drawing or scribbles from a whiteboard, the model is able to code a website/app with ease. The low code/no code option has only been further simplified for an end user, which raises the question on the fate of coding platforms. 

The CEO of HyperWriteAI, Matt Shumer, tweeted about a GPT-4V-powered frontend engineer agent.  By simply uploading an image design, the model is able to code, correct the rendered form and even refine code to improve design quality. 

All is Not Perfect

As impressive as the results have been, the model is still not 100% accurate. GPT-4V can generate errors when it comes to reading minute details or counting variables that are too similar, thereby mandating the need to cross-check before relying on it completely. 

Source: arxiv.org

Surpassing Other LMMs

A few months ago, when ChatGPT was compared with its closest rival Bard, the latter surpassed OpenAI’s chatbot on many counts. The multimodal features such as voice/image and web browsing were the key features Bard had to its credit. However, all of them are now addressed through ChatGPT. 

While GPT-4V may be versatile, accuracy remains a concern, which is also a problem with Bard. A few users found incorrect responses from both GPT-4V and Bard when given a prompt that requires strategic thinking in a Pac-Man game. 

Redemption of ChatGPT 

The multimodal feature of ChatGPT comes at a time when the chatbot had reportedly been witnessing a decline in users since July. As per recent reports, the website and mobile visits to ChatGPT decreased by 3.2% to 1.43 billion in August — a 10% drop from each of the previous months. With a series of product feature launches over the last couple of weeks, including voice integration, OpenAI is possibly looking at a redemption. 

Given the buzz created with people experimenting with GPT-4V, the chatbot might possibly see a spike in the next few months. Furthermore, with OpenAI DevDay just around the corner and the company making a few major announcements, the latest ‘vision’ function might probably be a teaser for what lies ahead for ChatGPT. 

On a lighter note, in addition to causing breakthroughs with a serious range of functions, GPT4V can probably help you save face too, in case you don’t understand a meme or joke: a probable win for all. 

Source: arxiv.org

Share
Picture of Vandana Nair

Vandana Nair

As a rare blend of engineering, MBA, and journalism degree, Vandana Nair brings a unique combination of technical know-how, business acumen, and storytelling skills to the table. Her insatiable curiosity for all things startups, businesses, and AI technologies ensures that there's always a fresh and insightful perspective to her reporting.
Related Posts

CORPORATE TRAINING PROGRAMS ON GENERATIVE AI

Generative AI Skilling for Enterprises

Our customized corporate training program on Generative AI provides a unique opportunity to empower, retain, and advance your talent.

Upcoming Large format Conference

May 30 and 31, 2024 | 📍 Bangalore, India

Download the easiest way to
stay informed

Subscribe to The Belamy: Our Weekly Newsletter

Biggest AI stories, delivered to your inbox every week.

AI Forum for India

Our Discord Community for AI Ecosystem, In collaboration with NVIDIA. 

Flagship Events

Rising 2024 | DE&I in Tech Summit

April 4 and 5, 2024 | 📍 Hilton Convention Center, Manyata Tech Park, Bangalore

MachineCon GCC Summit 2024

June 28 2024 | 📍Bangalore, India

MachineCon USA 2024

26 July 2024 | 583 Park Avenue, New York

Cypher India 2024

September 25-27, 2024 | 📍Bangalore, India

Cypher USA 2024

Nov 21-22 2024 | 📍Santa Clara Convention Center, California, USA

Data Engineering Summit 2024

May 30 and 31, 2024 | 📍 Bangalore, India

Subscribe to Our Newsletter

The Belamy, our weekly Newsletter is a rage. Just enter your email below.