A medical student typically needs close to four years of medical school and over two years of clinical rotations to clear the United States Medical Licensing Examination (USMLE). OpenAI’s ChatGPT has cleared all three parts of the USMLE in a single go, according to the results of a new experiment.
The researchers said that “ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations.”
The paper also states that “these results suggest that large language models (LLMs) may have the potential to assist with medical education, and potentially, clinical decision-making.”
The USMLE is a high-stakes, comprehensive three-step standardized testing program covering all topics in physicians’ knowledge spanning basic science, clinical reasoning, medical management, and bioethics. The difficulty and complexity of questions are highly standardized and regulated, making it an ideal input substrate for AI testing.
However, this is not the first time the chatbot has aced an examination. A few days back, professors at the University of Pennsylvania’s Wharton School of Business discovered that ChatGPT could easily complete the examination for a typical MBA core course, Operations Management.
According to a report in Fortune, Professor Christian Terwiesch released a paper this week evaluating ChatGPT’s performance on the Operations Management exam. According to him, the chatbot “does an amazing job at basic operations management and process analysis questions, including those based on case studies.”
Terwiesch added that the chatbot had its fair share of shortcomings: it failed to answer “more advanced process analysis questions.” The professor also noted that ChatGPT “would have received a B to B- grade on the exam.”
Most recently, researchers tested GPT-3.5 on questions from the US Bar Exam. They predict that GPT-4 and similar models could pass the exam very soon. In addition, the researchers found that hyperparameter optimisation and prompt engineering improved GPT-3.5’s zero-shot performance.