Google Introduces ChatGPT-like ChatBot for Healthcare

MultiMedQA, the benchmark behind Med-PaLM, combines six existing open question-answering datasets with a new one called HealthSearchQA.

With the release of large language models like GPT-3 and PaLM, big tech companies have been experimenting with such models for quite some time now. Recently, in response to OpenAI’s ChatGPT, Google joined the party with Med-PaLM, a model built specifically for answering medical queries.

Introducing Med-PaLM

While ChatGPT seems to be all over the place with no clearly defined use case, Google Research and DeepMind recently introduced Med-PaLM, a large language model for medical purposes. It is benchmarked on MultiMedQA, a newly introduced open-source medical question-answering benchmark. MultiMedQA combines HealthSearchQA, a new free-response dataset of medical questions searched online, with six existing open question-answering datasets covering professional medical exams, research, and consumer queries. The benchmark also incorporates a methodology for human evaluation of model responses along several axes, including factuality, precision, potential harm, and bias.

MultiMedQA includes datasets for multiple-choice questions and for longer-form responses to questions posed by medical professionals and non-professionals. These comprise MedQA, MedMCQA, PubMedQA, LiveQA, MedicationQA, and the MMLU clinical topics datasets. In addition, HealthSearchQA, a new dataset of curated, frequently searched medical queries, was added to round out MultiMedQA.

The HealthSearchQA dataset, which consists of 3,375 frequently asked consumer questions, was curated using seed medical diagnoses and their related symptoms. The seed data was used to retrieve publicly available, commonly searched questions generated by a search engine, which were shown to all users who entered the seed terms.

PaLM to the Rescue 

To evaluate LLMs on MultiMedQA, the researchers built on PaLM, a 540-billion-parameter LLM, and its instruction-tuned variant, Flan-PaLM.

By combining few-shot, chain-of-thought (CoT), and self-consistency prompting techniques, Flan-PaLM achieves SOTA performance on MedQA, MedMCQA, PubMedQA, and MMLU clinical topics, often outperforming several strong LLM baselines by a large margin. On the MedQA dataset of USMLE questions, Flan-PaLM surpasses the prior SOTA by over 17%. Human evaluation, though, reveals significant gaps in Flan-PaLM's responses.
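Self-consistency prompting, mentioned above, amounts to sampling several chain-of-thought completions for the same question and keeping the answer that appears most often. The sketch below illustrates only that voting step and is not the paper's implementation. The `sample_fn` callable, the `extract_choice` helper, the stub sampler, and the "Answer: (X)" output format are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import cycle
from typing import Callable, Iterable, Optional


def extract_choice(completion: str) -> Optional[str]:
    """Return the last multiple-choice letter found in a reasoning chain.

    Assumption: each chain ends with text such as 'Answer: (B)' or 'Answer: B'.
    """
    matches = re.findall(r"Answer:\s*\(?([A-E])\b", completion)
    return matches[-1] if matches else None


def self_consistency(
    sample_fn: Callable[[str], str],
    prompt: str,
    num_samples: int = 5,
) -> Optional[str]:
    """Sample several chains of thought and return the majority answer."""
    votes: Counter = Counter()
    for _ in range(num_samples):
        choice = extract_choice(sample_fn(prompt))
        if choice is not None:  # skip chains with no parsable answer
            votes[choice] += 1
    if not votes:
        return None
    # On a tie, most_common keeps the answer that was seen first.
    return votes.most_common(1)[0][0]


def make_stub_sampler(completions: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call: replays canned completions."""
    replay = cycle(list(completions))
    return lambda prompt: next(replay)


if __name__ == "__main__":
    canned = [
        "The symptoms point to option B. Answer: (B)",
        "Considering the history, C fits. Answer: C",
        "Lab values suggest B. Answer: (B)",
        "I am not sure.",
        "Answer: B",
    ]
    sampler = make_stub_sampler(canned)
    print(self_consistency(sampler, "Which drug is first-line?", num_samples=5))
```

In practice, `sample_fn` would wrap a call to a language model with a non-zero sampling temperature, so that repeated calls produce different reasoning paths; the vote then favours answers that several independent chains agree on.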

The model developed to address these gaps is Med-PaLM, which is reported to perform well compared to Flan-PaLM but still falls short of human medical experts.

For instance, a panel of clinicians judged 92.6% of Med-PaLM's long-form answers to be in line with scientific consensus, on par with clinician-generated answers (92.9%), whereas just 61.9% of Flan-PaLM's long-form answers were. Similarly, 5.8% of Med-PaLM answers were rated as potentially leading to harmful outcomes, comparable to clinician-generated answers (6.5%), against 29.7% of Flan-PaLM answers.

Check out the full paper here

Google’s Healthcare Play

At the Google for India 2022 event, Google announced a collaboration with Apollo Hospitals in India to improve the use of deep learning models for X-rays and other diagnostic purposes. Google’s other health partnerships include Aravind Eye Care System, Ascension, Mayo Clinic, Rajavithi Hospital, Northwestern Medicine, Sankara Nethralaya, and Stanford Medicine, among others.

Google isn’t the first tech behemoth to venture into AI-driven healthcare solutions. Microsoft is also working closely with the OpenAI team to employ GPT-3 to facilitate collaboration between employees and clinicians and improve healthcare teams’ efficiency.

In November 2022, Meta AI also introduced Galactica, an AI programme meant to support academic researchers by generating comprehensive literature reviews and Wiki entries on any subject; however, it failed due to unreliable results.

Around the same time, Meta AI released CICERO, which merges natural language processing and strategic reasoning. It is the first AI agent to perform at a human level in the complex natural language game “Diplomacy.” Playing against humans online, the agent achieved more than double the average score of the other players. Additionally, it ranked among the top 10% of players who participated in multiple games.

Shritama Saha

Shritama (she/her) is a technology journalist at AIM who is passionate about exploring the influence of AI on different domains, including fashion, healthcare, and banking.