GPT-3 prompts: Technical progress or just AI alchemy?


Patrick Krauss, professor at Friedrich-Alexander-University Erlangen-Nuremberg (FAU), called out the paper “Large Language Models are Zero-Shot Reasoners” on Twitter. The paper claimed that prompts increase the accuracy of GPT-3.

Chain of thought (CoT) prompting, a technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved state-of-the-art performance in arithmetic and symbolic reasoning, the paper claimed. “We create large black boxes and test them with more or less meaningless sentences in order to increase their accuracy. Where is the scientific rigor? It’s AI alchemy! What about explainable AI?” Patrick said.

With 58 papers on LLMs published in 2022 alone and the global NLP market projected to reach USD 35.1 billion by 2026, LLMs are one of the most thriving areas of research.



Chain of Thought prompting

The idea was proposed in the paper “Chain of Thought Prompting Elicits Reasoning in Large Language Models”. Researchers from the Google Brain team used chain of thought prompting – a coherent series of intermediate reasoning steps that lead to the final answer for a problem – to improve the reasoning capability of large language models. They demonstrated that sufficiently large language models can generate chains of thought if demonstrations of chain of thought reasoning are provided in the exemplars for few-shot prompting.



To test their hypothesis, the researchers used three transformer-based language models: GPT-3 (Generative Pre-trained Transformer), PaLM (Pathways Language Model) and LaMDA (Language Model for Dialogue Applications). They explored chain of thought prompting for these models across multiple benchmarks. Chain of thought prompting outperformed standard prompting across different annotators and different exemplars.

Zero-Shot CoT

Researchers from the University of Tokyo and the Google Brain team improved on chain of thought prompting by introducing Zero-shot-CoT (zero-shot chain of thought). LLMs become decent zero-shot reasoners with a simple prompt, the paper claimed.
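The “simple prompt” in the Zero-shot-CoT paper is the trigger phrase “Let’s think step by step”, appended in place of any exemplars; the paper uses a two-stage scheme, first eliciting a reasoning chain and then extracting the final answer from it. The sketch below reflects that structure; the function names are illustrative, not from the paper.

```python
# Zero-shot-CoT, sketched as the paper's two prompting stages.
# Stage 1 triggers free-form reasoning; stage 2 feeds that reasoning
# back and asks for just the final answer.

REASONING_TRIGGER = "Let's think step by step."

def reasoning_prompt(question: str) -> str:
    """Stage 1: elicit a reasoning chain with no exemplars at all."""
    return f"Q: {question}\nA: {REASONING_TRIGGER}"

def answer_extraction_prompt(question: str, reasoning: str) -> str:
    """Stage 2: append the generated reasoning and ask for the answer."""
    return (
        f"Q: {question}\nA: {REASONING_TRIGGER} {reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )

p1 = reasoning_prompt(
    "If there are 3 cars and each car has 4 wheels, how many wheels are there?"
)
print(p1)
```

In practice, `p1` would be sent to the model, its generated reasoning passed to `answer_extraction_prompt`, and the second completion read off as the answer.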


The results were demonstrated by comparing performance on two arithmetic reasoning benchmarks (MultiArith and GSM8K) between Zero-shot-CoT and the baselines.

AI alchemy

Patrick’s tweet sparked a huge debate. “It is an empirical result, which adds to our understanding of these black boxes. Empiricism is a standard, well established approach in science, and I find it surprising this is new to you,” said Twitter user @Dambski, who further argued that the discussion hinges on what one considers the definition of understanding: “Anything that increases the chances of correctly predicting how the model will behave for a given input increases the understanding of that system, whether it can be explained or not.”

Rolan Szabo, a machine learning consultant from Romania, gave another analogy: “From a theoretical perspective, I understand the disappointment. But from a pragmatic perspective, GitHub Copilot writes the boring boilerplate code for me today, even if I don’t understand how exactly it conjures it up.”

Many supported Patrick’s statement. Piotr Turek, head of engineering at OLX Group, said: “Frankly, calling this engineering is offending to engineers. It’s chaos alchemy.”

Soma Dhavala, principal researcher at Wadhwani AI, said: “While we think we solved one problem — we made it somebody else’s problem or the problem resurfaces in a different avatar. Case in point: with DL we don’t need feature engineering, was the claim. Well yah, but we got to do architecture engineering.”

Guillermo R Simari, a professor emeritus in Logic for Computer Science and Artificial Intelligence, said: “I’d not be entirely against the approach. My concern is: What will we’ve learned about the thinking process at the end? Will I understand the human mechanism better? Or have I just got something that ‘works’? Whatever that means…” To which Patrick Krauss replied that this was exactly his point.

The discussion took a turn when Andreas K Maier, a professor at Friedrich-Alexander-University Erlangen-Nuremberg (FAU), asked whether such large language models are available for public access so that one can actually observe what is happening in the latent space during inference. 

To this comment, Patrick said the unavailability of LLMs is exactly the problem. “One problem is of course that some of these models are only available as API. Without access to the actual system it might become something like AI Psychology,” Andreas added. As of now, Meta AI’s Open Pretrained Transformer (OPT-175B) is the largest LLM with open access.

Kartik Wali
A writer by passion, Kartik strives to get a deep understanding of AI, Data analytics and its implementation on all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way of life!
