Large language models (LLMs) cannot reason as humans do – this has been the resounding argument from thinkers across the board to downplay the progress in AI. But now a new hypothesis has caught the attention of the scientific community: LLMs can help build good theories of human cognition.
Only a few weeks ago, celebrity linguist Noam Chomsky reflected upon the “false promise” offered by models like ChatGPT, which, far from reasoning, merely mimic regularities in data. This was seconded by several AI scientists, including Gary Marcus, who said, “LLMs don’t reliably understand the world”. In this light, the argument that LLMs can teach us how humans think seems counterintuitive.
The proposition was laid out by Steven Piantadosi, a professor at UC Berkeley, who argues that while Chomsky’s ideas are deeply compelling, they exist only at the abstract level. According to him, generative syntacticians like Chomsky have insulated themselves from “engineering, empirical tests, and formal comparisons”.
But now engineering has taken over, and modern language models have taken us by surprise. These models put to the test the underlying assumptions and principles adopted by theorists like Chomsky, who, if anything, never bothered to verify them by comparison against alternatives (such as neural networks). For example, parameter fitting in statistical models like large language models could itself amount to building a theory of language.
Parameter fitting involves searching for parameter values that best fit the data, in order to make predictions or draw inferences about the relationship between variables in a given model. Naturally, in the case of massively over-parameterised models, such as ChatGPT with its 175 billion parameters, there is a rich potential space for inferring hidden patterns and structures within the data.
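Parameter fitting at LLM scale is far beyond a snippet, but the core idea can be shown in miniature. The sketch below (a hypothetical toy, not anything from Piantadosi's work) searches for the two parameters of a linear model that best fit observed data by gradient descent on squared error; an LLM does the same kind of search over billions of parameters.

```python
# Toy illustration of parameter fitting: search for parameter values
# that minimise the model's error on the data, here for y = w * x + b.

def fit_linear(xs, ys, lr=0.01, steps=2000):
    """Fit w and b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data generated by the hidden relationship y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear(xs, ys)
print(round(w, 2), round(b, 2))  # recovers values close to 2 and 1
```

With two parameters the fitted values are directly interpretable; with 175 billion, the inferred structure is hidden inside the weights, which is exactly why probing them is interesting.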
“Large language models have attained remarkable success at discovering grammar without using any of the methods that some in linguistics insisted were necessary for a science of language to progress,” writes Piantadosi.
‘Scientific’ nature of language models
This idea of a language system underlying neural networks comes from a paper published by Jeffrey Elman in 1990. Modern language models integrate diverse computational approaches to language through built-in architectural principles, allowing linguistic structure to emerge rather than being directly encoded. These models can perform a variety of functions, even more than what Elman had originally conceptualised. This includes the ability to make in-context word predictions (e.g., retrieving the name “Alex” from dozens of words prior when producing a text) as well as maintaining the grammar and meaning of the sentence even while predicting the next word.
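The next-word-prediction objective itself is simple enough to sketch. The toy model below (a hypothetical illustration, not how transformers work internally) conditions only on the previous word via bigram counts; real LLMs instead attend over long contexts, which is how they can recall a name from dozens of words back.

```python
# A toy next-word predictor built from bigram counts: the same
# prediction objective as an LLM, in miniature.
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent continuation of `word`."""
    return counts[word].most_common(1)[0][0]

corpus = "Alex opened the door . Alex closed the door . the cat sat"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # → "door" (seen twice, vs "cat" once)
```

The gap between this and a transformer is precisely the architectural machinery (attention, deep representations) from which, on Piantadosi's account, grammatical structure emerges.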
These models can do this because they have internal structures – such as internal representational states and attention patterns – that break sentences down into constituent parts (e.g., nouns, verbs, and clauses) in a tree-like structure, with striking similarities to human-annotated parse trees. The degree to which a model is tree-structured can also indicate how well it performs on tasks it wasn’t explicitly trained on.
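To make “tree-like structure” concrete, here is a minimal sketch (using plain nested tuples, a hypothetical representation) of the kind of parse a sentence receives. Probing studies compare phrase spans induced from a model’s internal representations against spans from human-annotated trees like this one.

```python
# A human-style constituency parse of "the cat sat on the mat":
# (S (NP the cat) (VP sat (PP on (NP the mat))))
tree = ("S",
        ("NP", "the", "cat"),
        ("VP", "sat",
         ("PP", "on",
          ("NP", "the", "mat"))))

def constituents(node):
    """Return (spans, words): every labelled phrase in the tree,
    plus the sentence's words in order."""
    if isinstance(node, str):
        return [], [node]
    label, children = node[0], node[1:]
    spans, words = [], []
    for child in children:
        child_spans, child_words = constituents(child)
        spans.extend(child_spans)
        words.extend(child_words)
    spans.append((label, tuple(words)))  # this phrase covers its words
    return spans, words

spans, words = constituents(tree)
print(words)   # the full sentence, in order
print(spans)   # every constituent, e.g. ("NP", ("the", "cat"))
```

The overlap between such gold-standard spans and model-induced spans is one way researchers quantify how tree-structured a model’s representations are.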
Piantadosi also notes that some models have an internal structure that makes them “spontaneously develop an intuitive pipeline”, starting with representing parts of speech, followed by parsing, semantic analysis and so on.
The research, therefore, challenges the notion that language is an innate human ability, pushing forward the idea that large language models are capable of developing representations of key structures and relationships within language – only these representations are parameterised in a manner that differs from traditional linguistic models. Moreover, it is crucial to recognise that parameterising language models is itself a form of theory-building in linguistics.
The idea of applying engineering principles to evaluate the practicality of established theories in humanities fields, such as linguistics, ethics, and psychology, is an important one that warrants further exploration. In this regard, large language models are expected to play a significant role.
LLMs are poor theories?
However, there is also criticism of such an engineering-based approach to theorising. “While LLMs are successful as engineering tools, we saw that they are very poor theories of human linguistic cognition,” writes Prof Roni Katzir, a linguist at Tel Aviv University. Katzir’s critique is largely based on the current state of LLMs, which, according to him, remain “stochastic parrots”. As a result, while he acknowledges their ability to write entertaining poems and short stories, using them to understand the human faculty of language, as opposed to doing actual linguistics, is to him an entirely different matter.