Google Teaches ChatGPT How to Solve Math Problems

The Google study tries to tackle language models' maths problem with an in-context learning approach

OpenAI's chatbot ChatGPT excels at a plethora of tasks, such as script writing, explaining complex topics, debugging and explaining code, but performs poorly when it comes to maths.

Stanford University and the University of California, Berkeley, recently published a research paper stating that large language models (LLMs) can perform simple maths operations when numbers are small but struggle with large numbers, suggesting that LLMs have not learned the underlying rules needed to perform these arithmetic operations. It further mentioned that, even with GPT-4's improvements on the MATH dataset, errors largely occur due to arithmetic and calculation mistakes.

The rival company, Google, has acknowledged the issue in LLMs and stepped in to teach models like ChatGPT to reason better algorithmically. The work by Google researchers, titled ‘Teaching language models to reason algorithmically’, takes an in-context learning approach and introduces a technique for better algorithmic reasoning.

In-context learning is akin to teaching a new skill by guiding the model through the process step by step instead of overwhelming it with all the instructions upfront. The term refers to a model’s ability to perform a task after seeing a few examples of it within its context.
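As a rough illustration of what "a few examples within the context" can look like, the sketch below assembles a few-shot prompt from worked question-and-answer pairs. The helper name `build_few_shot_prompt` and the `Q:`/`A:` layout are assumptions made for this example, not the format used in the Google study, and no model is actually called.

```python
def build_few_shot_prompt(examples, query):
    """Join worked examples and a new question into one prompt string.

    `examples` is a list of (question, answer) pairs. The Q:/A: layout is
    a hypothetical convention chosen for illustration only.
    """
    parts = [f"Q: {question}\nA: {answer}" for question, answer in examples]
    # The final question is left unanswered so the model completes it.
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)


examples = [
    ("What is 12 + 7?", "19"),
    ("What is 30 + 45?", "75"),
]
prompt = build_few_shot_prompt(examples, "What is 128 + 367?")
print(prompt)
```

The resulting string would be sent to a language model as a single input; the model's weights are not updated, it simply conditions on the examples.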

They also presented a prompting technique that lets general-purpose language models generalise strongly to maths problems more difficult than the ones in the prompt. The technique builds upon other rationale-augmented approaches (e.g., scratchpad and chain-of-thought). Lastly, they demonstrated that a model can ‘reliably execute algorithms on out-of-distribution examples with the right prompts’.
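To give a feel for what a rationale-augmented example might contain, the sketch below writes out schoolbook addition digit by digit, carries included. It is only an illustration of the style of explanation such prompts spell out: the function name `addition_rationale` and the wording of each step are assumptions, not the prompt text from the Google paper, and the function assumes non-negative integers.

```python
def addition_rationale(a, b):
    """Return (steps, total) for a + b, explained digit by digit.

    Assumes a and b are non-negative integers. The step wording is a
    hypothetical format, not taken from any published prompt.
    """
    xs, ys = str(a)[::-1], str(b)[::-1]  # least significant digit first
    steps, digits, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        s = x + y + carry
        steps.append(
            f"{x} + {y} + carry {carry} = {s}: write {s % 10}, carry {s // 10}"
        )
        digits.append(str(s % 10))
        carry = s // 10
    if carry:
        steps.append(f"final carry {carry}: write {carry}")
        digits.append(str(carry))
    return steps, int("".join(reversed(digits)))


steps, total = addition_rationale(128, 367)
for line in steps:
    print(line)
print("answer:", total)
```

Because every carry is stated explicitly, the same rule applies unchanged to numbers with more digits than any shown example, which is the kind of generalisation the paragraph above describes.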

A Below Average Student

ChatGPT has become worse at certain basic maths operations, even as it gets better at other things. The same study highlighted that the high-profile chatbot's performance has declined compared with March.

Researchers said the deterioration is due to an AI phenomenon known as drift, where attempts to improve one part of complex models make other parts of the models worse.

To track performance, James Zou, a Stanford professor affiliated with the school’s AI lab, and his colleagues Matei Zaharia and Lingjiao Chen fed ChatGPT 1,000 different numbers. In March, the paid GPT-4 version correctly identified whether the numbers were prime 84% of the time. By June, the success rate had dropped to 51%.
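A success rate of this kind can be computed by comparing each model answer with a deterministic primality check. The sketch below shows the idea on a tiny made-up sample; the numbers and the `model_answers` list are hypothetical placeholders, not data from the study, and trial division is simply one straightforward way to get the ground truth.

```python
def is_prime(n):
    """Deterministic primality test by trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def accuracy(numbers, model_answers):
    """Fraction of yes/no primality answers that match the ground truth."""
    correct = sum(is_prime(n) == ans for n, ans in zip(numbers, model_answers))
    return correct / len(numbers)


# Hypothetical sample: the model wrongly calls 9 a prime.
numbers = [7, 9, 11, 15]
model_answers = [True, True, True, False]
print(accuracy(numbers, model_answers))  # 0.75
```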

Apart from getting the answers wrong, ChatGPT also got a thumbs down for its attempt to show the researchers how it arrived at certain conclusions. As part of the research, the researchers additionally asked the chatbot to lay out its “chain of thought”, the term for when a chatbot explains its reasoning. In March, it did so, but by June it stopped showing its step-by-step reasoning.

The recent Google study tries to tackle this issue with its in-context learning approach. The findings suggest that exploring longer contexts and prompting for more informative explanations could be valuable research directions.

The Wolfram Headmasters

A pioneer in fusing technology with maths education, Wolfram Research has been working with ChatGPT’s parent, OpenAI, to bring better maths capabilities to AI models. “We have seen some interesting results with our LLM. I tried to run a British ‘A’ level maths, an exam students take before University, and ChatGPT alone got 43% which is quite impressive, but Wolfram plus ChatGPT got 96%,” Conrad Wolfram, cofounder of the company, revealed in an interview with AIM.

“Game over for humans on that one,” he quipped.

Notably, when the same maths teaser (what is the smallest integer greater than 95,555 in which there will be 4 identical numbers?) was put to ChatGPT versions 3.5 and 4 and to the Wolfram plugin, only the last got it right on the first attempt.
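The teaser is easy to check by brute force. The sketch below assumes that "4 identical numbers" means at least four identical digits in the integer's decimal form; the helper names are made up for this example.

```python
from collections import Counter


def has_repeated_digit(n, times=4):
    """True if some decimal digit occurs at least `times` times in n."""
    return max(Counter(str(n)).values()) >= times


def smallest_above(start, times=4):
    """Smallest integer strictly greater than `start` that qualifies."""
    n = start + 1
    while not has_repeated_digit(n, times):
        n += 1
    return n


answer = smallest_above(95555)
print(answer)  # 95999
```

Under that reading the answer is 95,999: with the leading digits fixed at 9 and 5, the only way to reach four identical digits above 95,555 is for the last three digits to all be 9.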

The Wolfram + ChatGPT plugin not only solves maths problems step by step but can also present them visually if specifically prompted to do so. Based on the prompts, it can go a step further and represent the data in different forms like graphs, charts, and histograms.

The plugin can turn queries in natural language into beautiful mathematical equations. It can do so because it combines ChatGPT’s human-mimicking technology with Wolfram’s strong foundation in a symbolic programming language that focuses on expressing ideas in computational form.

On the one hand, Wolfram is making strides with its plugin; on the other, researchers show models' performance worsening. In the current landscape, Google’s latest in-context learning approach could help AI chatbots become above-average students.

Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.