
RAG is Just Fancier Prompt Engineering

It is time to build a base set of prompts for RAG systems.


Every day, a new acronym pops up in the generative AI world. One of the latest buzzwords is RAG, which stands for retrieval-augmented generation. And it is not just another acronym; it represents a significant leap in the field of LLMs. But what exactly is it?

RAG has gained popularity because it combines the strengths of both retrieval-based and generative models. In essence, RAG attaches an external database to a base model, lets the model retrieve new information from it, and then generates responses grounded in that information.

This helps reduce hallucination in the model. The knowledge source is usually a vector database or, in certain cases such as GPT-4, the internet.

At Cypher 2023, Dhruv Motwani, founder and CEO at SpringtownAI, also discussed and demonstrated the use of RAG and explored its architecture and functionality. The participants deployed applications on their AWS accounts and tested the models for hallucinations, which were fewer when using RAG.

Is it really any good?

“The RAG that you see today is really glorified prompt engineering. The most common standard RAG flow we currently have is completely unaware of the context of your data. Without looking up in your data, it sends the lookup plus the original query to GPT,” Mark McQuade, co-founder of Arcee.ai, told AIM.

McQuade and his team have built an end-to-end RAG system called DALM, which sits on top of the main LLM. He said the best way to use it is to pair the system with an in-domain specialised model, instead of a larger model carrying tons of unnecessary data.

Before delving into RAG, it’s essential to understand the choices that AI developers have when working with AI models. They can either build a model from scratch, fine-tune an existing model, or employ retrieval-augmented generation. Each approach has its pros and cons, and the bigger the model, the higher the chances of hallucination.

Building from the ground up can be a costly and time-consuming endeavour. For instance, OpenAI invested over $100 million to train its GPT-4 model. On the other hand, fine-tuning existing models with additional data is a viable option, but it carries the risk of the model “forgetting” some of its original training data.

RAG combines retrieval-based and generative AI models to deliver context-aware responses. A retrieval model accesses information from existing knowledge sources, such as databases or online articles. The generative model then takes this retrieved information and synthesises it into coherent, contextually appropriate responses.
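To make that flow concrete, here is a minimal sketch of a RAG pipeline in Python. It is illustrative only: the word-overlap retriever is a toy stand-in for a real embedding-based vector store, and call_llm is a hypothetical placeholder for whatever generation API is actually used.

```python
# Minimal RAG sketch: retrieve relevant text, then generate from it.
# The word-overlap scorer stands in for a real vector store, and
# call_llm() is a hypothetical placeholder for an LLM API call.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    scored = sorted(documents,
                    key=lambda d: len(tokenize(d) & tokenize(query)),
                    reverse=True)
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real model/provider call."""
    return f"<model answer conditioned on: {prompt[:60]}...>"

def rag_answer(query: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(query, documents))
    prompt = ("Answer the question using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    return call_llm(prompt)

docs = [
    "RAG retrieves passages from a knowledge source before generating.",
    "Fine-tuning updates model weights on new data.",
    "Vector databases store embeddings for similarity search.",
]
print(rag_answer("How does RAG use a knowledge source?", docs))
```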

The key advantage of RAG is its ability to provide responses that are not only accurate but also unique, akin to human language, rather than simply summarising retrieved data.

At its core, RAG is essentially an advanced form of prompt engineering. It focuses on keeping the model fixed and optimising how it “stuffs” the context window with text to answer specific questions. This approach is particularly beneficial for prompt engineers who need to learn AI engineering skills and master a base set of prompts to build RAG/agent systems.
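In prompt terms, “stuffing” the context window is essentially templating: the model stays fixed, and only the retrieved text interpolated into a fixed prompt changes. A minimal sketch follows; the template wording is an assumption, not a standard.

```python
# A base RAG prompt: the model is fixed, only the stuffed context changes.
# The wording of this template is illustrative, not a standard.
RAG_PROMPT = """You are a helpful assistant.
Use ONLY the context below to answer. If the answer is not in the
context, say "I don't know" instead of guessing.

Context:
{context}

Question: {question}
Answer:"""

prompt = RAG_PROMPT.format(
    context="RAG stands for retrieval-augmented generation.",
    question="What does RAG stand for?",
)
print(prompt)
```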

In more complex RAG/agent systems, it’s not just about a single prompt; it involves a collection of prompts that work in harmony to provide accurate and context-aware responses.
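What such a “base set” of prompts might look like is sketched below; the stages (query condensing, answering, self-verification) and their wording are illustrative assumptions, not a fixed standard.

```python
# A sketch of a base set of prompts for a multi-step RAG flow.
# Stage names and wording are illustrative assumptions.
PROMPTS = {
    # Rewrite a follow-up question into a standalone search query.
    "condense": ("Given the chat history:\n{history}\n"
                 "Rewrite the follow-up question as a standalone "
                 "query: {question}"),
    # Answer strictly from the retrieved context.
    "answer": ("Context:\n{context}\n\nQuestion: {question}\n"
               "Answer using only the context above."),
    # Ask the model to check its own answer against the context.
    "verify": ("Context:\n{context}\n\nAnswer:\n{answer}\n"
               "Is every claim in the answer supported by the context? "
               "Reply yes or no."),
}

print(PROMPTS["answer"].format(
    context="RAG = retrieval-augmented generation.",
    question="What is RAG?",
))
```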

RAG extended

Some researchers argue that RAG might not be any more beneficial than a longer context window, as both offer similar results. A recent study titled ‘Retrieval meets Long Context Large Language Models’ compared RAG with long-context LLMs.

https://twitter.com/rohanpaul_ai/status/1710641374385594482

The paper found that open-source embedding models/retrievers outperformed OpenAI’s. Combining simple RAG with a 4K-context LLM could match the performance of a long-context LLM. Moreover, RAG paired with a 32K-context LLM outperformed providing the full context.

While RAG offers significant benefits, it’s important to consider the potential for bad responses when retrieving information. Philipp Schmid, tech lead at Hugging Face, asked whether we can teach LLMs to be more factually correct and self-reliant, and introduced Self-RAG, a novel approach that teaches models when to retrieve information and how to use it effectively.

Self-RAG involves creating a “critique” dataset with retrieval guidelines, which determines when retrieval is appropriate and what information is relevant. Developers then train a critique model on this synthetic dataset. Using prompts, the critique model, and a retriever, they can generate a RAG dataset.

The LLM is then trained on this RAG dataset, which includes special tokens that instruct the model on when to retrieve and when to generate. During inference, the model adaptively emits these special tokens based on the query to determine whether retrieval is necessary.
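A minimal sketch of that inference-time behaviour is below. The “[Retrieve]” token and both function stubs are assumptions made to illustrate the described mechanism; they are not the paper’s exact tokens or interface.

```python
# Sketch of Self-RAG-style adaptive retrieval at inference time.
# The "[Retrieve]" token and both stubs are illustrative assumptions.

RETRIEVE_TOKEN = "[Retrieve]"

def generate(prompt: str) -> str:
    """Hypothetical stub for a Self-RAG-trained model that can emit a
    retrieval token when the query needs external information."""
    if prompt.startswith("Context:"):
        return "answer grounded in the retrieved passage"
    return RETRIEVE_TOKEN if "latest" in prompt.lower() else "a direct answer"

def retrieve(query: str) -> str:
    """Hypothetical stub for the retriever."""
    return "retrieved passage about: " + query

def self_rag(query: str) -> str:
    draft = generate(query)
    if RETRIEVE_TOKEN in draft:
        # The model decided retrieval is necessary: fetch and regenerate.
        context = retrieve(query)
        return generate(f"Context: {context}\nQuestion: {query}")
    return draft

print(self_rag("What is the latest RAG research?"))
```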

It seems the hallucinations will continue for a while. But just like prompt engineering, it is time to learn the next AI engineering skill and catch up with the buzzword; this time, it is building a base set of prompts for RAG systems.

Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.