
Busting the Myth of Context Length

In the push for making chatbots as smart as humans, we are definitely also making them as dumb as humans



Now that smaller models such as LLaMA and Falcon are, in certain cases, performing on par with GPT-4 or PaLM, the conversation has shifted from increasing the number of parameters to increasing the number of context tokens, or context length, that these models support.

In essence, context length is the number of tokens an LLM can take in at once. To respond to a prompt well, the model needs clarity on the entire context that the question has been put in.

Often, people have this notion that the longer the input, the better the output. But, in reality, that is not the case. Feed a 2,000-word article into ChatGPT, and it makes sense of it until around the 700-800 word mark, then starts hallucinating.

This is pretty much how short-term memory works in humans. But is it really the case that context length is all that matters?

Attention is indeed all you need

Take listening to a story or watching a movie, for example. In most cases, the introduction and the ending are what the audience remembers most, while the part in the middle often has the least recall value. Jim Fan of NVIDIA AI, who holds a Stanford PhD, explains that this is exactly what LLMs are going through.

In his tweet, drawing on the recent paper from Stanford researchers, Lost in the Middle: How Language Models Use Long Contexts, Fan explains why claims of a million or a billion tokens are not helpful when it comes to improving LLMs. “What truly matters is how well the model actually uses the context. It’s easy to make seemingly wild claims, but much harder to solve real problems better,” he said.

The paper shows that models are good at retaining information present at the beginning and the end of the context, but not in the middle. This holds across the LLMs currently being developed, including GPT, PaLM, and Flan-T5.

Moreover, models with a natively longer context do not actually use that context better. In the paper, the researchers show that both versions of GPT-3.5, one with a 4k and the other with a 16k context length, produce similar results, and that performance decreases as the context grows longer.
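The paper’s multi-document question-answering setup is easy to mimic in spirit: place the one answer-bearing document at different positions among distractors, then score the model’s answers per position. The sketch below is illustrative only; the documents, the question, and the commented-out query_llm call are placeholders, not the researchers’ actual code or data.

```python
# Toy "lost in the middle" probe: build the same prompt with the gold
# (answer-bearing) document placed at different positions among
# distractor documents, then compare accuracy per position.

def build_prompt(gold_doc: str, distractors: list[str], position: int, question: str) -> str:
    """Insert the gold document at `position` among the distractors and
    return a numbered-document prompt ending with the question."""
    docs = distractors[:position] + [gold_doc] + distractors[position:]
    numbered = "\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(docs))
    return f"{numbered}\n\nQuestion: {question}\nAnswer:"

gold = "The keycode for the vault is 4417."
noise = [f"Filler passage {i} about an unrelated topic." for i in range(9)]
question = "What is the keycode for the vault?"

# One prompt per candidate position: start, middle, and end of the context.
prompts = {pos: build_prompt(gold, noise, pos, question) for pos in (0, 5, 9)}

for pos, prompt in prompts.items():
    # answer = query_llm(prompt)  # hypothetical API call; score answers per position
    print(f"gold at position {pos}: {prompt.count('Document')} documents in context")
```

Plotting accuracy against the gold document’s position is what produces the paper’s characteristic U-shaped curve: high at the edges, low in the middle.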

Ahmed Moubtahij of the Computer Research Institute of Montreal adds that this may stem from the training data. Most of these models are trained on internet pages such as news articles, which put the most important information at the beginning and the end, and the outputs of LLMs end up mirroring the same pattern.

Stupidity like humans

Ever since the Transformer was introduced in the Attention is All You Need paper, context length has been discussed excessively in every LLM release. It has long been believed that increasing the sequence length would improve the accuracy of the models. But, just as humans forget half the story midway, LLMs are showcasing a similar capability, or possibly inability.

One thing that is certain: in the push to make chatbots as smart as humans, we have definitely managed to make them as dumb as humans. Maybe that is all we need, even if we don’t want it. The similarity between human brains and Transformers is astonishing.

In discussions on HackerNews, Reddit, and Twitter on the same topic, users shared how increasing the number of tokens is becoming laughable at this point. As one user put it: “I’ve noticed this with GPT-4. It’ll ignore some part of its context, and when I point it out, it knows, so it’s clearly still in its context, but it didn’t know it has to look it up for a particular answer. We also have the same problem with memory, so I empathise.”

Moreover, if LLM providers are charging by the token through their APIs, increasing the number of context tokens simply earns them more money. More research is needed to show whether adding more context tokens actually makes sense.

The sheer token costs of Transformers make one question whether the money will eventually even be worth it. Anthropic’s Claude, which has the largest context window at 100k tokens, will likely be very costly; for comparison, GPT-4’s 32k model costs USD 0.06 per 1,000 input tokens, which works out to nearly USD 2 to fill the context in a single prompt.
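The back-of-the-envelope arithmetic is simple enough to write down. The per-1,000-token figures below are OpenAI’s publicly listed input prices for GPT-4 at the time (USD 0.03 for the 8k model, USD 0.06 for the 32k model); output tokens cost extra and are left out of this sketch.

```python
# Cost of filling an entire context window in a single prompt,
# given an input price in USD per 1,000 tokens.

def full_context_cost(context_tokens: int, usd_per_1k_tokens: float) -> float:
    return context_tokens / 1000 * usd_per_1k_tokens

# GPT-4 list prices for input tokens: 8k at $0.03/1k, 32k at $0.06/1k.
print(full_context_cost(8_000, 0.03))   # ~0.24 USD per full 8k prompt
print(full_context_cost(32_000, 0.06))  # ~1.92 USD per full 32k prompt
```

At those rates, a model that actually ignores the middle of its context is charging for tokens it effectively never reads.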

For now, LLMs, like us humans, have a curious habit of remembering the story’s beginning and end with flair while casually dismissing the messy middle part. These models exhibit a common tendency — the longer the context, the higher the likelihood of their stumbling. It’s almost as if they suffer from a case of “attention deficit context disorder”.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.