Member-only story
Mathematics
OpenAI’s Approach to Solve Math Word Problems
A new research paper and dataset look to make progress in one of the toughest areas of deep learning.
Yesterday’s edition of The Sequence highlighted OpenAI latest research to solve math word problems. Today, I would like to dive a bit deeper into the ideas behind this new research.
Mathematical reasoning has long been considered one of the cornerstones of human cognition and one of the main bars to measure the “intelligence” of language models. Take the following problem:
“Anthony had 50 pencils. He gave 1/2 of his pencils to Brandon, and he gave 3/5 of the remaining pencils to Charlie. He kept the remaining pencils. How many pencils did Anthony keep?”
Yes, the solution is 10 pencils but that’s not the point 😉. Solving this problem does not only entail reasoning through the text but also orchestrating a sequence of steps to arrive at the solution. This dependency on language interpretability as well as the vulnerability to errors in the sequence of steps represents the two major challenges when building ML models that can solve math word problems. Recently, OpenAI published new research proposing an interesting method to tackle this type of problem.
By now you must be thinking that the OpenAI method relies on GPT-3 given that the pre-trained model has been at the center of most of their progress in different areas of natural language processing. The assumption is directionally correct. OpenAI models are based on GPT-3 for language understanding and use two fundamental methods to optimize its capabilities for math problems.
1) Finetuning: The most obvious approach to optimize GPT-3 for math word problems is using a fine-tuning method. In essence, the fine-tuning approach autoregressively sampling a solution and checking whether that solution is correct.
2) Verification: This approach uses trained verifiers that can sample multiple solutions and assign a probability score to each one finally outputting the solution with the highest score. The verification pipeline can sample up to 100 candidate solutions for a given problem.