
Did Google Gemini 1.5 Really Kill RAG? 

“The worst take that I have seen these past few days is that long context models like Gemini 1.5 will replace RAG"


Illustration by Diksha Mishra

Google’s recent release, Gemini 1.5, with a 1M-token context window, has sparked a fresh debate about whether RAG (Retrieval Augmented Generation) is still relevant. LLMs commonly struggle with hallucination, and two solutions have been proposed to address this challenge: increasing the context window and utilising RAG.

Lately, several developers have been experimenting with Gemini 1.5. “I uploaded the Great Gatsby with two alterations (mentioning an ‘iPhone-in-a-box’ and a ‘laser lawnmower’). Gemini nails it (& finds one more thing). Claude does but hallucinates. RAG doesn’t work,” Ethan Mollick, a professor at Wharton, wrote on X.

Another X user, Mckay Wrigley, fed an entire biology textbook into Gemini 1.5 Pro, which consisted of 491,002 tokens. He asked it three extremely specific questions, and it provided each answer 100% correctly. 

“Gemini 1.5 Pro is still underhyped. I uploaded an entire codebase directly from GitHub, and all of the issues, including Vercel AI SDK. Not only was it able to understand the codebase, but it also identified the most urgent issue and implemented a fix. This changes everything,” wrote Sully Omar, co-founder and CEO of Cognosys.

The three examples above show that Gemini 1.5, with its extensive context window, can successfully retrieve crucial information from within a document. However, they do not demonstrate any limitation of RAG.

Comparing Apples and Oranges

Many are still confused about the distinction between RAG and the context window. The context window limits the model to information within a given text span, while RAG extends the model’s capabilities to external sources, vastly widening the scope of accessible information.
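The distinction can be made concrete with a toy sketch. Everything below is illustrative, not any real model or vector store: a fixed context window only sees whatever fits inside the prompt, while a retriever reaches into an external corpus and pulls back the relevant snippet, wherever it lives.

```python
def context_window_view(document: str, window_tokens: int) -> str:
    """A model with a fixed context window only sees the first N tokens."""
    return " ".join(document.split()[:window_tokens])

def retrieve(query: str, corpus: list[str]) -> str:
    """Naive keyword-overlap retriever, standing in for a real vector store."""
    query_words = set(query.lower().split())
    return max(corpus, key=lambda doc: len(query_words & set(doc.lower().split())))

corpus = [
    "The quarterly report covers revenue growth in Europe.",
    "The onboarding guide explains how to reset a password.",
    "The security policy requires rotating API keys monthly.",
]

# A small window sees only the start of one long concatenated document...
print(context_window_view(" ".join(corpus), 5))
# ...while retrieval jumps straight to the relevant external snippet.
print(retrieve("how do I reset my password", corpus))
```

A longer context window widens the first function's view; RAG changes what the second function can reach. The two mechanisms are complementary rather than competing.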

Taking notice of the hype on the internet, Oriol Vinyals, VP of Research and Deep Learning team lead at Google DeepMind, voiced his opinion, saying, “RAG (retrieval-augmented generation) isn’t done for, even though we can handle 1M or more tokens in context now. In fact, RAG has some nice properties that can enhance (and be enhanced by) long context.”

“RAG allows you to find relevant information, but the way the model accesses it may be too restrictive due to compression. Long context may help bridge that gap, similar to how L1/L2 cache & main memory work together in modern CPUs,” he added. 

A larger context window allows LLMs to consider more text and thus generate more accurate and coherent responses, particularly for complex and lengthy inputs. However, this doesn’t mean the model won’t hallucinate.

According to a paper titled ‘Lost in the Middle: How Language Models Use Long Contexts,’ published by researchers from Stanford University, UC Berkeley, and Samaya AI, LLMs exhibit high information retrieval accuracy at the document’s start and end. However, this accuracy declines in the middle, especially with increased input processing.
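The kind of probe behind that finding can be sketched in a few lines. The needle text, filler sentence, and `build_probe` helper below are illustrative assumptions in the spirit of the paper's experiments, not its actual code; the model calls themselves are out of scope here.

```python
def build_probe(needle: str, filler: str, n_fillers: int, depth: float) -> str:
    """Insert `needle` at relative position `depth` (0.0 = start, 1.0 = end)."""
    sentences = [filler] * n_fillers
    sentences.insert(round(depth * n_fillers), needle)
    return " ".join(sentences)

needle = "The secret launch code is 7421."
filler = "The weather was unremarkable that day."
question = "What is the secret launch code?"

# Same fact, same question, different depths in the context.
prompts = [
    f"{build_probe(needle, filler, n_fillers=200, depth=d)}\n\nQuestion: {question}"
    for d in (0.0, 0.5, 1.0)
]
```

Scoring each depth's answers separately is what reveals the U-shaped curve the researchers report: strong retrieval at the edges of the context, weaker in the middle.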

RAG Survives the Day 

“The worst take that I have seen these past few days is that long context models like Gemini 1.5 will replace RAG,” wrote Elvis Saravia, co-founder of DAIR.AI, explaining that long-context LLMs work great with static information (books, video recordings, PDFs, etc.) but are yet to be battle-tested on rapidly evolving information and knowledge.

He further added that to tackle such problems, one could combine RAG and long-context LLMs to build a robust system that effectively and efficiently retrieves and performs large-scale analysis of key historical information.
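One way to read that suggestion is as a packing problem: use retrieval to shortlist relevant chunks, then exploit the large window by stuffing many of them, rather than the usual handful, into a single prompt. The scoring function and token budget below are placeholder assumptions for illustration, not Saravia's or Google's method.

```python
def score(query: str, chunk: str) -> int:
    """Placeholder relevance score: simple keyword overlap."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def pack_context(query: str, chunks: list[str], token_budget: int) -> list[str]:
    """Greedily pack the highest-scoring chunks into a large context window."""
    selected, used = [], 0
    for chunk in sorted(chunks, key=lambda c: score(query, c), reverse=True):
        if score(query, chunk) == 0:
            break  # ranked in descending order, so the rest are irrelevant
        cost = len(chunk.split())  # crude whitespace token estimate
        if used + cost <= token_budget:
            selected.append(chunk)
            used += cost
    return selected

chunks = [
    "2021 annual report: revenue rose 12 percent.",
    "2022 annual report: revenue rose 8 percent.",
    "Cafeteria menu: soup on Mondays.",
]
print(pack_context("revenue trend across annual reports", chunks, token_budget=20))
```

With a million-token budget, the same loop could pack hundreds of retrieved documents, which is the sense in which long context enhances, rather than replaces, retrieval.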

“We will make progress towards addressing some of the challenges like “lost in the middle” and handling more complex structured and dynamic data but we still have a long way to go,” he said. Saravia added that different families of LLMs will help solve different types of problems. “We need to move on from this idea that there will be one LLM that will rule all.”

Without a doubt, Gemini 1.5 outperforms Claude 2.1 and GPT-4 Turbo, as it can assimilate entire codebases and process over 100 papers and various documents, but it surely hasn’t killed RAG.


Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.