Recently, researchers at OpenAI introduced GPT-F, an automated prover and proof assistant for the Metamath formalization language. In a paper named “Generative Language Modelling for Automated Theorem Proving”, the researchers said that a formal mathematics community approved a deep learning-based system’s proof for the first time.
Deep neural networks have made huge strides in language understanding, computer vision, speech recognition, image generation and more. However, the reasoning capabilities of the deep learning models fall short at times. To address the inadequacy, the researchers introduced GPT-F by applying a transformer language model to automated theorem proving.
Behind GPT-F
The researchers used the Metamath library as a formal environment as the features of Metamath allow faster prototyping and reduced iteration time in the near term. For the model architecture, the researchers used decoder-only Transformers similar to GPT-2 and GPT-3. The largest model they studied has 36 layers and 774 million trainable parameters.
Subscribe to our Newsletter
Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy
Your newsletter subscriptions are subject to AIM Privacy Policy and Terms and Conditions.
Metamath is powered by a simple meta-logic system based on a single substitution rule. The researchers said the main Metamath library is called set.mm, a collection of around 38k proofs based on ZFC set theory.
The benefits of Metamath include:
- Verification is quick, and can be implemented in hundreds of lines of code
- The proof steps are context-free.
- Access to the clean and compact sub-goal representations makes searching the proof tree straightforward in a relative manner.
- set.mm is one of the largest libraries available and is compatible with modern mathematics.
“I’ve been using the tool for a while. The state of the mm database is that it has the knowledge of an advanced undergrad encoded into it, about 35k theorems so far, you can check it out here. The tool is then helpful for writing new proofs into the database. So it’s still quite a long way from the frontiers of mathematics. Most of the work at the moment is to encode more and more of what has already been proven before tackling unproven theorems. However, GPT-F has found new proofs, which are shorter and more elegant, for currently proven theorems. I would also say, from using the tool, that it’s better at proving things than I am, which is pretty cool,” said a Reddit user.
Contributions
- The researchers verified that generative pre-training substantially improves performance and that pre-training on mathematical data leads to better performance than pre-training on generic text from the web.
- They found the model size is positively correlated with performance, even though the size of the Metamath dataset is relatively small.
- The researchers also demonstrated that iteratively training a value function on statements generated by their language model leads to improvements in GPT-F’s performance.
- The best model closed 56.22% of proofs from a held-out test set and demonstrated that the Transformer architecture might be suitable for formal reasoning.
Read the paper here.