Patrick Krauss, a professor at Friedrich-Alexander-University Erlangen-Nuremberg (FAU), has called out the paper “Large Language Models are Zero-Shot Reasoners” on Twitter. The paper claims that a simple prompt increases the accuracy of GPT-3.
According to the paper, chain of thought (CoT) prompting, a technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved state-of-the-art performance in arithmetic and symbolic reasoning. “We create large black boxes and test them with more or less meaningless sentences in order to increase their accuracy. Where is the scientific rigor? It’s AI alchemy! What about explainable AI?” Patrick said.
Chain of Thought prompting
The idea was proposed in the paper “Chain of Thought Prompting Elicits Reasoning in Large Language Models”. Researchers from the Google Brain team used chain of thought prompting, a coherent series of intermediate reasoning steps leading to the final answer for a problem, to improve the reasoning capability of large language models. They demonstrated that sufficiently large language models can generate chains of thought if demonstrations of chain of thought reasoning are provided in the exemplars for few-shot prompting.
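In practice, a chain of thought prompt is simply a few-shot prompt whose exemplars spell out the intermediate reasoning rather than just the final answer. A minimal sketch in Python, using a worked exemplar of the kind shown in the paper (the helper name and the test question are illustrative):

```python
# A chain of thought exemplar: the answer includes the intermediate
# reasoning steps, which the model is expected to imitate.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model continues in the same style,
    producing a reasoning chain before its final answer."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?"
)
```

The resulting string is sent to the model as-is; with standard prompting, the exemplar would contain only “The answer is 11.” and no reasoning steps.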
To test their hypothesis, the researchers used three transformer-based language models: GPT-3 (Generative Pre-trained Transformer), PaLM (Pathways Language Model) and LaMDA (Language Model for Dialogue Applications). They explored chain of thought prompting across these models on multiple benchmarks, where it outperformed standard prompting across different annotators and exemplars.
Researchers from the University of Tokyo and the Google Brain team improved on the chain of thought prompting method by introducing Zero-shot-CoT (chain of thought). With a single simple prompt, the paper claims, LLMs become decent zero-shot reasoners.
The results were demonstrated by comparing the performance of Zero-shot-CoT against baselines on two arithmetic reasoning benchmarks, MultiArith and GSM8K.
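Zero-shot-CoT needs no hand-written exemplars at all: it prompts the model twice, first with a reasoning trigger (“Let’s think step by step.”) and then with an answer-extraction prompt. The sketch below assumes a `call_llm` function as a hypothetical stand-in for a real model API; the stub exists only to make the flow runnable:

```python
# Two-stage Zero-shot-CoT prompting: stage 1 elicits a reasoning chain,
# stage 2 extracts the final answer from that chain.
REASONING_TRIGGER = "Let's think step by step."
EXTRACT_TRIGGER = "Therefore, the answer (arabic numerals) is"

def zero_shot_cot(question: str, call_llm) -> str:
    # Stage 1: append the trigger so the model writes out its reasoning.
    stage1 = f"Q: {question}\nA: {REASONING_TRIGGER}"
    reasoning = call_llm(stage1)
    # Stage 2: feed the reasoning back and ask only for the final answer.
    stage2 = f"{stage1} {reasoning}\n{EXTRACT_TRIGGER}"
    return call_llm(stage2).strip()

# Stubbed model for demonstration only; a real system would call an LLM API.
def fake_llm(prompt: str) -> str:
    if REASONING_TRIGGER in prompt and EXTRACT_TRIGGER not in prompt:
        return "There are 16 balls in total. Half of them are golf balls, so 8."
    return "8"

answer = zero_shot_cot(
    "A juggler has 16 balls. Half of the balls are golf balls. "
    "How many golf balls are there?",
    fake_llm,
)
```

The trigger phrases above are the ones reported in the Zero-shot-CoT paper; the question and stub output are illustrative.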
Patrick’s tweet sparked a huge debate. “It is an empirical result, which adds to our understanding of these black boxes. Empiricism is a standard, well established approach in science, and I find it surprising this is new to you,” said Twitter user @Dambski, who added that the discussion hinges on what one considers the definition of understanding: “Anything that increases the chances of correctly predicting how the model will behave for a given input increases the understanding of that system, whether it can be explained or not.”
Rolan Szabo, a machine learning consultant from Romania, offered another perspective: “From a theoretical perspective, I understand the disappointment. But from a pragmatic perspective, GitHub Copilot writes the boring boilerplate code for me today, even if I don’t understand how exactly it conjures it up.”
Many supported Patrick’s statement. Piotr Turek, head of engineering at OLX Group, said: “Frankly, calling this engineering is offending to engineers. It’s chaos alchemy.”
Soma Dhavala, principal researcher at Wadhwani AI, said: “While we think we solved one problem, we made it somebody else’s problem, or the problem resurfaces in a different avatar. Case in point: with DL we don’t need feature engineering, was the claim. Well yah, but we got to do architecture engineering.”
Guillermo R Simari, a professor emeritus in Logic for Computer Science and Artificial Intelligence, said: “I’d not be entirely against the approach. My concern is: What will we’ve learned about the thinking process at the end? Will I understand the human mechanism better? Or have I just got something that ‘works’? Whatever that means…” Patrick Krauss replied that that is exactly his point.
The discussion took a turn when Andreas K Maier, also a professor at FAU, asked whether such large language models are publicly accessible, so that one can actually observe what happens in the latent space during inference.
To this comment, Patrick replied that the unavailability of LLMs is exactly the problem. “One problem is of course that some of these models are only available as an API. Without access to the actual system it might become something like AI psychology,” Andreas added. As of now, Meta AI’s Open Pretrained Transformer (OPT-175B) is the largest LLM with open access.