Can GPT-4 Be a Saviour in the Medical Field?

GPT-4's capabilities make it a promising candidate to assist in healthcare. But is it reliable enough?

While OpenAI's models have made their way into almost every domain, there is one field where LLMs, if used correctly, could have the greatest impact because they directly affect lives: medicine. Earlier this year, ChatGPT performed at or near the passing threshold on all three parts of the United States Medical Licensing Examination (USMLE), and we even saw ChatGPT help save a dog's life by suggesting an accurate diagnosis. However, we have not yet seen many practical applications in the medical field. Do GPT-4's capabilities make it a suitable player in medicine?

Massive Potential

In March this year, OpenAI and Microsoft released a paper titled "Capabilities of GPT-4 on Medical Challenge Problems". In this research, GPT-4 showed impressive language understanding and generation abilities in medicine. The study evaluates GPT-4's performance on medical competency exams and benchmark datasets, even though the model was not specialised for medicine.

The researchers assess GPT-4's performance on official USMLE practice materials and the MultiMedQA datasets. GPT-4 exceeds the USMLE passing score by over 20 points, outperforming earlier models (including GPT-3.5) and even models fine-tuned for medical knowledge. GPT-4 also shows improved calibration, meaning the confidence it assigns to its answers more closely tracks how likely those answers are to be correct. The study further explores how GPT-4 can explain medical reasoning, tailor explanations to different audiences and create hypothetical scenarios, showcasing its potential for medical education and practice. The findings highlight GPT-4's capabilities while acknowledging the challenges of accuracy and safety in real-world applications.

Compared with OpenAI's earlier models, GPT-4 performs much better on official medical exams such as the USMLE, improving by more than 30 percentage points over GPT-3.5. While GPT-3.5 came close to the passing score (roughly 60% of multiple-choice questions answered correctly), GPT-4 cleared it by a wide margin.

Alignment and Safety In Place 

When the researchers compared an earlier version of GPT-4, referred to as the base model, with the released GPT-4, the base model performed slightly better, by about 3-5%, on some of the tests. This suggests that making the model safer and better at following instructions may have cost it a little raw performance. The researchers suggested that future work could look for better ways to balance accuracy and safety, either by refining the training process or by using specialised medical data.

Where does Med-PaLM fit in? 

The research did not compare GPT-4 with models such as Med-PaLM and Flan-PaLM 540B, as those models were not publicly available at the time of the study.

Google recently unveiled Med-PaLM M, a large multimodal generative model for healthcare that encodes and interprets biomedical data. Its scope is broader than that of GPT-4's currently available features: it can handle many types of medical data, including clinical language, medical images and genomics, and can perform a wide range of tasks. According to Google, the model can generalise to new medical tasks and perform multimodal reasoning without task-specific training. It can also recognise and explain medical conditions in images using only instructions and prompts given in natural language.

Never Fool-Proof

GPT-4's medical applications, however, are not yet as diverse as those Med-PaLM M offers. Although GPT-4 was announced with multimodal features, these are not yet available to users. There have also been negative observations about GPT-4's capabilities in medical diagnosis. Evaluations have turned up problematic and biased outputs, raising concerns that GPT-4's tendency to embed societal biases may limit its suitability for aiding clinical decisions.

The widespread problem of hallucination also persists, with GPT-4 still producing incorrect information. The model has been found to generate incorrect medical citations: over 20% of the medical citations it produced contained errors.

While GPT-4's current performance may not make it reliable enough to assist with diagnosis, there are other tasks it can help with. Hospitals are looking to AI to help relieve doctor burnout. With applications that write notes for electronic health records and draft empathetic messages to patients, AI can help smooth the workflow. Transcribing doctor-patient conversations and then summarising them in a physician's format for electronic health records is one of the most promising use cases in the medical field. Given its current limitations, however, GPT-4 still has a long way to go before it can be fully adopted in medicine.

Vandana Nair

With a rare blend of engineering, MBA and journalism degrees, Vandana Nair brings a unique combination of technical know-how, business acumen and storytelling skills to the table. Her insatiable curiosity about startups, businesses and AI technologies ensures a fresh and insightful perspective in her reporting.