The prover uses a language model to find proofs of formal statements. Whenever a new proof is found, the prover uses this as training data which improves the neural network and enables it to constantly find solutions to more complex problems.
Formal mathematics involves two major challenges that make a simple application of reinforcement learning unlikely to succeed: Infinite action space and Lack of self-play.
Subscribe to our Newsletter
Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy
To overcome the infinite action space challenge, the natural theorem prover uses sampling actions from a language model as it searches for proof. Language models have the capacity to generate tactic calls and mathematical terms often required as arguments. To address the lack of self-play, the creators replaced unsupervised curriculum with an auxiliary set of problem statements of varying difficulty. The training procedure was able to solve a curriculum of increasingly difficult problems when provided with a varied set of difficulties in auxiliary problems.
Although, the neural theorem prover was able to generate exciting results proving the potential deep learning models capability of non-trivial mathematical reasoning when interacting with a formal system. But, the system is still far away from providing consistent results when compared with the performance of the best students who participate in these challenging olympiad problems.