Lately, large language models (LLMs) have been quite the cultural phenomenon, with models like DALL-E, ChatGPT, and Copilot capturing the public imagination and making AI an almost-household name. Let’s also add Meta AI’s infamous Galactica to the list. But are ‘entertaining’ models like GPT-3 and DALL-E a detour from AGI?
In an exclusive interview with Analytics India Magazine, Yoshua Bengio, one of the pioneers in deep learning, agreed with ‘the bigger, the better’ logic of large language models (LLMs), but noted that some important ingredients are still missing to achieve the kind of intelligence that humans have.
However, LLMs are not new; they have been around for a while. For instance, Google’s search engine and email, which we interact with daily, are powered by Google’s language model BERT (Bidirectional Encoder Representations from Transformers). Similarly, Apple deploys its Transformer models (the ‘T’ in GPT) on the Apple Neural Engine to power various experiences, including panoptic segmentation in Camera, on-device scene analysis in Photos, image captioning for accessibility, and machine translation, among others.
The fact that LLMs are a household name today can be credited to OpenAI, which opened its models to the public to try and test, raising the ‘cool quotient’ of these models while also, consequently, improving them. In contrast, big-tech companies like Apple, Google, and Meta have been quietly integrating their language models into their own products and software applications. While that strategy did benefit them, OpenAI’s approach can be considered a classic example of building in public.
Looking carefully at Bengio’s comments, we can see that the idea is not to dismiss their use cases—we have already seen many products built on OpenAI’s publicly available API (such as GPT-3 for Jasper and Notion, or GPT-3.5 for WhatsApp integrations), and in some cases OpenAI’s products integrated directly into software as an offering (DALL-E for Shutterstock, Copilot for GitHub). Bengio instead takes issue with seeing LLMs as the path towards AGI (Artificial General Intelligence)—in simple terms, towards achieving ‘human-like’ intelligence.
Is scaling enough?
A simple case in point: an average five-year-old, processing roughly ten images per second (one every 100 milliseconds), has consumed in their entire lifetime only about as much data as Google, Instagram, and YouTube produce in a matter of hours. Yet they can reason far better than any AI has managed, even with a thousandth of the data LLMs require. While ‘text-to-anything’ applications have certainly given language models short-lived fame, their future looks bleak: training them is a data-intensive task, and at the pace at which they are being deployed, we may soon reach a point where our very sources of data are themselves AI-produced (for instance, ChatGPT-generated results may populate the internet in the near future).
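As a rough back-of-envelope check on this comparison, the following sketch estimates how many images a five-year-old might “see” at ten images per second. All numbers here (waking hours, years) are illustrative assumptions, not figures from the article:

```python
# Back-of-envelope estimate of a five-year-old's lifetime "image intake".
# Assumptions (ours, purely illustrative): 12 waking hours per day,
# 10 images processed per second, 5 years of life.
images_per_second = 10
waking_seconds_per_day = 12 * 60 * 60   # 43,200 seconds
days = 5 * 365                          # ignoring leap days
total_images = images_per_second * waking_seconds_per_day * days
print(f"{total_images:,}")  # on the order of 10^8 to 10^9 images
```

Even under generous assumptions, the total sits in the high hundreds of millions—tiny next to the web-scale corpora LLMs are trained on.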
As a result, some have even called for moving on from the term “Artificial Intelligence” and using something more appropriate, such as “cultural synthesiser”, for systems like GPT-3 and DALL-E, which don’t use reasoning and higher-level conceptualisation.
An interesting caveat to this can be taken from the recent AGI debate hosted by Montreal.AI, where, in the Q&A segment, DeepMind’s Dileep George was asked why we should stop and add more structure to models when the current paradigm of scaling through more parameters and more data appears to be working fine. In response, George pushed back on the idea that mere scaling has few flaws. He added, “Systems are improving, but that’s a property of many algorithms. It’s a question of the scaling efficiency.” How to scale efficiently, so that better results can be obtained from far less data, is the challenge right now.
A general consensus built in the debate was that the current crop of models are black boxes lacking crucial elements like cognition and understanding. There are also detractors of this notion, however, like Google AI’s Blaise Agüera y Arcas, who believes that “statistics do amount to understanding”. According to Arcas, training models on complex sequence learning and social interaction is sufficient for general intelligence. What constitutes the “human mind” remains an unsettled debate, and one unlikely to be settled.
Several approaches to tackling the “cognition problem” have emerged in recent times. In the same debate, for instance, one of the panellists, Dave Ferrucci, founder of Elemental Cognition, said that they are pursuing a “hybrid” approach that uses language models to generate hypotheses as an “output” and then performs reasoning on top using “causal models”. Such an approach is developed with a human in the loop.
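To make the shape of such a hybrid pipeline concrete, here is a minimal sketch, not Elemental Cognition’s actual system: a language model proposes candidate hypotheses, a separate causal model scores them, and low-confidence answers are routed to a human. Every function name below is a hypothetical placeholder:

```python
# Illustrative hybrid pipeline: LLM proposes, causal model disposes,
# human reviews uncertain cases. All functions are stand-ins.

def llm_generate_hypotheses(question: str) -> list[str]:
    # Placeholder for a language-model call returning candidate answers.
    return [f"hypothesis A for {question!r}", f"hypothesis B for {question!r}"]

def causal_model_score(hypothesis: str) -> float:
    # Placeholder for structured causal reasoning over a hypothesis.
    return 0.9 if "A" in hypothesis else 0.4

def answer(question: str, review_threshold: float = 0.5):
    candidates = llm_generate_hypotheses(question)
    scored = [(h, causal_model_score(h)) for h in candidates]
    best, score = max(scored, key=lambda pair: pair[1])
    if score < review_threshold:
        return best, "needs human review"  # human-in-the-loop step
    return best, "accepted"
```

The design point is the division of labour: the language model is used only as a hypothesis generator, while acceptance rests with a reasoning component (and ultimately a human), rather than trusting the LLM’s output directly.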
Adding to this, we must also note the words of Ben Goertzel, head of the SingularityNET Foundation and the AGI Society, who believes that models like ChatGPT don’t know what they are talking about; even with a fact checker applied, they remain far from capable of human-like generalisation. According to him, current deep learning programmes won’t make much progress towards AGI, and systems that “leap beyond their training” towards “open-ended growth” are now quite feasible. Thus, the way forward is meta-learning, which Jürgen Schmidhuber describes as a general system that can “learn all of these things and, depending on the circumstances, and the environment, and on the objective function, it will invent learning algorithms that are properly suited for this type of problem and for this class of problems and so on”.
Therefore, while the architectures underlying the more famous models lack cognitive abilities, other approaches, such as OpenCog Hyperon, NARS, and SOFAI, are working in these areas, even if they seem less glamorous and exciting than models like ChatGPT or GPT-3.