Google’s recent release, Gemini 1.5, with a 1-million-token context window, has sparked a fresh debate about whether RAG (Retrieval-Augmented Generation) is still relevant. LLMs commonly struggle with hallucination, and two solutions have emerged to address this challenge: one involving an increased context window and the other utilising RAG.
Lately, several developers have been experimenting with Gemini 1.5. “I uploaded the Great Gatsby with two alterations (mentioning an ‘iPhone-in-a-box’ and a ‘laser lawnmower’). Gemini nails it (& finds one more thing). Claude does but hallucinates. RAG doesn’t work,” Ethan Mollick, a professor at Wharton, wrote on X.
Another X user, Mckay Wrigley, fed an entire biology textbook into Gemini 1.5 Pro, which consisted of 491,002 tokens. He asked it three extremely specific questions, and it provided each answer 100% correctly.
“Gemini 1.5 Pro is still underhyped. I uploaded an entire codebase directly from GitHub, and all of the issues, including Vercel AI SDK. Not only was it able to understand the codebase, but it also identified the most urgent issue and implemented a fix. This changes everything,” wrote Sully Omar, co-founder and CEO of Cognosys.
The three examples above show that Gemini 1.5, with its extensive context window, can successfully retrieve crucial information from within a document. However, they do not expose any limitation of RAG.
Comparing Apples and Oranges
Many are still confused about the distinction between RAG and the context window. The context window limits the model to information within a given text span, while RAG extends the model’s capabilities to external sources, vastly widening the scope of accessible information.
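The distinction can be made concrete with a minimal sketch of the retrieval step in RAG. The snippet below is illustrative only: simple word-overlap scoring stands in for a real embedding model, and all function names are hypothetical. The point is that RAG searches external documents and places only the best-matching passages into the (limited) context window.

```python
# Minimal RAG retrieval sketch. Word-overlap scoring is a stand-in for a
# real embedding/similarity model; all names here are illustrative.

def chunk(text: str, size: int = 50) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> int:
    """Score a passage by how many query words it shares."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k passages drawn from external documents."""
    passages = [c for doc in corpus for c in chunk(doc)]
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Place only the retrieved passages, not the whole corpus, in context."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

A long-context model, by contrast, skips the `retrieve` step entirely and reads the full text in one pass, which only works when everything fits inside the window.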
Taking notice of the hype on the internet, Oriol Vinyals, VP of Research and Deep Learning team lead at Google DeepMind, voiced his opinion, saying, “RAG (retrieval-augmented generation) isn’t done for, even though we can handle 1M or more tokens in context now. In fact, RAG has some nice properties that can enhance (and be enhanced by) long context.”
“RAG allows you to find relevant information, but the way the model accesses it may be too restrictive due to compression. Long context may help bridge that gap, similar to how L1/L2 cache & main memory work together in modern CPUs,” he added.
A larger context window allows LLMs to consider more text and thus generate more accurate and coherent responses, particularly for complex and lengthy inputs. However, this doesn’t mean the model won’t hallucinate.
According to a paper titled ‘Lost in the Middle: How Language Models Use Long Contexts,’ published by researchers from Stanford University, UC Berkeley, and Samaya AI, LLMs exhibit high information retrieval accuracy at the document’s start and end. However, this accuracy declines in the middle, especially with increased input processing.
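The setup behind such findings can be sketched in a few lines. The toy helper below (hypothetical, not the paper’s actual code) plants a key fact, the “needle”, at a chosen relative depth inside filler text; researchers vary that depth and measure retrieval accuracy, which tends to dip when the needle sits mid-context.

```python
# Toy "lost in the middle" test construction: insert one key fact at a
# chosen relative depth inside filler text. This mirrors the evaluation
# idea only; it is not the paper's actual harness.

def insert_needle(filler_sentences: list[str], needle: str, depth: float) -> str:
    """Insert `needle` at a relative position (0.0 = start, 1.0 = end)."""
    i = round(depth * len(filler_sentences))
    return " ".join(filler_sentences[:i] + [needle] + filler_sentences[i:])

filler = [f"Background sentence number {n}." for n in range(100)]
needle = "The secret code word is 'laser-lawnmower'."

start_ctx = insert_needle(filler, needle, 0.0)   # retrieval tends to be easy here
middle_ctx = insert_needle(filler, needle, 0.5)  # accuracy reportedly dips here
end_ctx = insert_needle(filler, needle, 1.0)     # and recovers here
```

Each context is then passed to the model with a question about the needle, and accuracy is plotted against depth.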
RAG Survives the Day
“The worst take that I have seen these past few days is that long context models like Gemini 1.5 will replace RAG,” wrote Elvis Saravia, co-founder of DAIR.AI, explaining that long-context LLMs work well with static information (books, video recordings, PDFs, etc.) but have yet to be battle-tested on rapidly evolving information and knowledge.
He further added that to tackle these types of problems, one could potentially combine RAG and long-context LLMs to build a robust system that effectively and efficiently retrieves and performs large-scale analysis of key historical information.
“We will make progress towards addressing some of the challenges like ‘lost in the middle’ and handling more complex structured and dynamic data, but we still have a long way to go,” he said. Saravia added that different families of LLMs will help solve different types of problems. “We need to move on from this idea that there will be one LLM that will rule all.”
Without a doubt, Gemini 1.5 outperforms Claude 2.1 and GPT-4 Turbo, as it can assimilate entire codebases and process over 100 papers and various documents, but it certainly hasn’t killed RAG.