Do Large Machine Learning Models Struggle At Maths?

In 1960, Nobel Laureate and American physicist Eugene Wigner wrote about the ‘unreasonable effectiveness of mathematics in the natural sciences’. Mathematics is called the language of nature for a reason. That’s why the ‘Is maths invented or discovered?’ debate never gets old. Mathematics exerts its influence on virtually every field.

Mathematics is also the building block of machine learning models. ML practitioners use mathematics to analyse a problem, pick suitable heuristics, and combine the two to arrive at an answer. Yet, despite the critical role mathematics plays in machine learning, even state-of-the-art models struggle at maths.

A new study by researchers at the University of California, Berkeley, has introduced the MATH dataset. The team said the dataset provides a detailed assessment of a model’s mathematical ability across difficulty levels and subjects.

What Is The MATH Dataset?

The MATH dataset consists of 12,500 problems taken from various high school mathematics competitions and measures the problem-solving ability of large, general-purpose language models. Given a problem from MATH, a model generates a sequence that encodes its final answer.
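
To make this setup concrete, here is a minimal Python sketch of how a MATH-style record could be loaded and a model’s final answer checked against the reference solution. The field names (“problem”, “level”, “type”, “solution”) and the \boxed{} convention for marking final answers are assumptions about the released format, and the exact-match grading below is a simplified illustration, not the official evaluation code.

```python
import json
import re


def extract_boxed_answer(solution: str) -> str | None:
    """Return the contents of the last \\boxed{...} in a LaTeX solution string."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1] if matches else None


def grade(model_answer: str, reference_solution: str) -> bool:
    """Exact-match grading after trimming whitespace (a simplification)."""
    reference = extract_boxed_answer(reference_solution)
    return reference is not None and model_answer.strip() == reference.strip()


# Hypothetical record in the assumed MATH JSON layout.
record = json.loads(
    '{"problem": "What is $1 + 2 + 3$?", "level": "Level 1", '
    '"type": "Prealgebra", "solution": "Adding the terms gives $\\\\boxed{6}$."}'
)
print(grade("6", record["solution"]))  # True
```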

MATH problems are labelled from 1 to 5 by difficulty and span seven subjects: prealgebra, algebra, intermediate algebra, number theory, counting and probability, geometry, and precalculus. For geometry problems, diagrams can be specified in the Asymptote language.

Since step-by-step solutions also accompany the problems, language models can learn to answer questions they haven’t been exposed to before. The step-by-step approach allows models to perform intermediate computations instead of giving the final answer immediately. 
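
As an illustration of this step-by-step setup, the hypothetical sketch below builds a prompt that asks for intermediate working before a final answer and then parses that answer back out. The ‘Final answer:’ marker and the generate() call are assumptions made for illustration, not the researchers’ actual prompting code.

```python
def build_prompt(problem: str) -> str:
    """Ask for intermediate working before the final answer.

    The 'Final answer:' convention is a hypothetical one used here so the
    answer can be parsed back out; it is not the researchers' prompt format.
    """
    return (
        "Solve the following competition problem. Show your work step by step, "
        "then finish with a line starting with 'Final answer:'.\n\n"
        f"Problem: {problem}\n\nSolution:"
    )


def parse_final_answer(generation: str) -> str | None:
    """Take everything after the last 'Final answer:' marker, if present."""
    marker = "Final answer:"
    if marker not in generation:
        return None
    return generation.rsplit(marker, 1)[-1].strip()


# `generate` stands in for any language-model completion call:
# answer = parse_final_answer(generate(build_prompt("What is $1 + 2 + 3$?")))
```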

Recognising the need to train models on maths fundamentals before exposing them to MATH, which covers advanced problem-solving techniques, the team also released the Auxiliary Mathematics Problems and Solutions (AMPS) dataset. This ‘pretraining corpus’ contains over 100,000 problems with step-by-step solutions from Khan Academy and about 5 million problems generated with Mathematica scripts from 100 hand-designed modules.
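
The sketch below mimics this kind of programmatic generation with a single hand-designed module that solves a linear equation. It is only an illustrative Python/SymPy analogue of the idea; the actual AMPS problems were produced with Mathematica scripts, and the template and field names here are assumptions.

```python
import random

import sympy as sp


def linear_equation_problem(rng: random.Random) -> dict:
    """One templated module in the spirit of AMPS: solve a*x + b = c for x.

    The real AMPS corpus was generated with Mathematica scripts; this is only
    an illustrative Python/SymPy analogue of a hand-designed module.
    """
    x = sp.symbols("x")
    a, b, c = rng.randint(2, 9), rng.randint(1, 9), rng.randint(10, 30)
    answer = sp.solve(sp.Eq(a * x + b, c), x)[0]
    return {
        "problem": f"Solve for x: {a}x + {b} = {c}.",
        "solution": (
            f"Subtract {b} from both sides to get {a}x = {c - b}, "
            f"then divide by {a}: x = {sp.latex(answer)}."
        ),
        "answer": answer,
    }


print(linear_equation_problem(random.Random(0)))
```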

Results

When large language models, including GPT-3, were evaluated on the MATH dataset, accuracies were abysmally low, ranging from 2.9 percent to 6.9 percent, though the models reached up to 15 percent accuracy on the easiest difficulty level. When humans were evaluated, a PhD student with no specialisation in mathematics attained 40 percent, while a three-time Olympiad gold medallist scored 90 percent.

Further, having the models generate a step-by-step solution before producing the final answer actually reduced accuracy: while many of the generated steps were related to the question, they were often not logically sound.

The researchers found that simply increasing training time and parameter counts improved performance only in a few cases while proving extremely costly. They have open-sourced both MATH and AMPS to encourage and facilitate further research in this direction.

Predecessors

OpenAI recently introduced GPT-f, an automated prover and proof assistant for the Metamath formalisation language. Metamath is a language that expresses theorems in abstract mathematics along with proofs that a computer program can validate. 

Last year, Facebook built an AI system that can solve complex mathematical problems using symbolic reasoning. The team developed a way to represent mathematical expressions as a language and then treated solving them as a translation problem for sequence-to-sequence neural networks.
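
A simplified sketch of that ‘expressions as a language’ idea: a SymPy expression tree is flattened into a prefix-notation token sequence that a sequence-to-sequence model could consume. The tokenisation scheme here is an assumption for illustration; Facebook’s system used its own, richer encoding.

```python
import sympy as sp


def to_prefix_tokens(expr: sp.Expr) -> list:
    """Flatten a SymPy expression tree into prefix-notation tokens.

    A simplified sketch of treating expressions as a language; the original
    system used its own, richer encoding of operators and numbers.
    """
    if not expr.args:  # leaves: symbols and numbers
        return [str(expr)]
    tokens = [type(expr).__name__]  # e.g. 'Add', 'Mul', 'Pow', 'sin'
    for arg in expr.args:
        tokens.extend(to_prefix_tokens(arg))
    return tokens


x = sp.symbols("x")
print(to_prefix_tokens(sp.sin(x**2) + 3 * x))
# e.g. ['Add', 'Mul', '3', 'x', 'sin', 'Pow', 'x', '2']
```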

Wrapping Up

“While most other text-based tasks are already nearly solved by enormous Transformers, MATH is notably different. We showed that accuracy is slowly increasing and, if trends continue, the community will need to discover conceptual and algorithmic breakthroughs to attain strong performance on MATH,” the researchers stated.


Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
