
When (Not) to RAG

You will be shocked by the information leaks that happen with this technique.



What do OpenAI, Cohere, and Anthropic; giants like Microsoft Azure, AWS, and IBM Watsonx.ai; and the open-source LangChain have in common? They all love to RAG. So, what’s the deal with RAG, and why is it gaining popularity so fast within enterprises?

RAG, or Retrieval-Augmented Generation, burst onto the scene in 2020 when the brainiacs at Meta AI decided to jazz up the world of LLMs. It’s a game-changer. Designed to give LLMs much-needed access to external information, RAG swooped in to fix the problem that haunted its predecessors: the dreaded hallucinations.

LLMs rely on statistical patterns without true comprehension. They are excellent at generating text but struggle with logical reasoning, resulting in hallucinations. This is because LLMs, no matter the model size or the context length, are still limited to the information they were fed during training.

With RAG, customers can add another dataset, giving the LLM fresh information from which to generate answers. This is exactly what enterprises need to generate insights from their own data.
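The pattern is simple enough to sketch in a few lines: retrieve the most relevant document for a query, then prepend it to the prompt sent to the LLM. The toy corpus, the word-overlap scoring, and the function names below are illustrative assumptions, not any vendor’s actual API; a real system would use a neural retriever and an actual model call.

```python
# A minimal sketch of the RAG pattern (assumptions: toy corpus,
# word-overlap retrieval in place of a real embedding model).

def retrieve(query: str, corpus: list[str]) -> str:
    """Pick the document sharing the most words with the query."""
    q_words = set(query.lower().split())
    return max(corpus, key=lambda doc: len(q_words & set(doc.lower().split())))

def build_prompt(query: str, corpus: list[str]) -> str:
    """Augment the user's question with retrieved context before generation."""
    context = retrieve(query, corpus)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Our Q3 revenue grew 12% year over year.",
    "The cafeteria reopens on Monday.",
]
print(build_prompt("How much did revenue grow year over year?", corpus))
```

The prompt that reaches the LLM now carries the enterprise’s own data, which is the whole appeal: the model answers from the supplied context rather than from whatever it memorised during training.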

The safety issues

With the launch of GPT-4 Turbo and the Retrieval API, OpenAI has tried to fix the hallucination problem. With the long context length and the option for enterprises to integrate new data, OpenAI has come close to solving the most important problem of LLMs, but overlooked the data privacy of users.

For example, with slightly fancier prompt engineering, a user on X was able to download the original knowledge files from someone else’s GPT, an app built with the recently released GPT Builder, which works exactly through RAG. This is a big security issue for the model.

If you give the AI model access to your documents, someone can “convince” it to let them download the original files. Interestingly, Sam Altman made no mention of this at DevDay. The release blog conveniently says, “As with the rest of the platform, data and files passed to the OpenAI API are never used to train our models and developers can delete the data when they see fit.”
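One plausible mitigation is to never expose raw knowledge files to the chat layer at all: serve only short, query-relevant snippets, so a prompt-injected “give me the whole file” request cannot reconstruct the source document. The sketch below is a hedged illustration of that idea; the names (`MAX_SNIPPET_CHARS`, `serve_context`) and the snippet-window logic are assumptions, not how any vendor actually implements retrieval.

```python
# Hedged sketch: cap what retrieval can ever hand back to the model,
# so exfiltration prompts get at most a small window of text.

MAX_SNIPPET_CHARS = 300

def serve_context(query: str, document: str) -> str:
    """Return at most one short snippet around the query's first keyword hit."""
    for word in query.lower().split():
        idx = document.lower().find(word)
        if idx != -1:
            start = max(0, idx - 100)
            return document[start:start + MAX_SNIPPET_CHARS]
    return ""  # no match: leak nothing rather than the whole file
```

The design choice here is deny-by-default: the retrieval layer, not the LLM, decides how much source text is reachable, because the LLM itself can always be talked into repeating whatever it was given.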

It seems as if the announcement of GPT Builder was just one more step for OpenAI to collect more user data, for as long as users don’t delete it. Now that the company is also training GPT-5, it might make use of the files people upload. If it is just a bug, OpenAI should fix it immediately and make the original files inaccessible to end users.

Google Bard faced a similar prompt-injection problem, where a hacker was able to exfiltrate data that other users had uploaded, such as Docs, Drive files, and YouTube history, from the chatbot. Even Google’s Bard is not foolproof.

Users on Reddit discuss whether LangChain’s RAG offering would be better than OpenAI’s. Currently, GPT Builder imposes a 20-file limit for building a single GPT, which makes it less desirable for serious developers. That is why many developers prefer LangChain’s RAG offering over OpenAI’s models.

Everyone RAGs differently

If you can look past these security flaws, GPTs are still a viable tool. But everything has to happen in a single prompt, which must both ask the question and instruct the LLM to retrieve information from the specific dataset. And each company is focusing on solving a specific problem at the moment.

For dynamic knowledge control, RAG lets you tweak and expand an LLM’s working knowledge without the hassle of retraining the entire model. This is mostly offered by open-source projects such as LangChain, which integrate a vector database, such as Pinecone, with any open-source LLM.
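The pipeline LangChain wires together can be sketched without the library: embed documents into vectors, store them, and retrieve by similarity. Below, a toy bag-of-words “embedding” and an in-memory `ToyVectorStore` stand in for a real embedding model and a vector database like Pinecone; the class and function names are illustrative assumptions, and the downstream LLM call is omitted.

```python
# Hedged sketch of vector-store retrieval (toy embedding, in-memory store).
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real systems use a neural model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """Stands in for a vector database such as Pinecone."""
    def __init__(self):
        self.items = []  # (vector, text) pairs

    def add(self, text: str):
        self.items.append((embed(text), text))

    def query(self, text: str, k: int = 1) -> list[str]:
        qv = embed(text)
        ranked = sorted(self.items, key=lambda it: cosine(qv, it[0]), reverse=True)
        return [t for _, t in ranked[:k]]

store = ToyVectorStore()
store.add("RAG retrieves documents before generation.")
store.add("Bangalore hosts several AI conferences.")
print(store.query("What does RAG do before generating?")[0])
```

Updating the model’s knowledge here is just `store.add()` with a new document: no retraining, which is the “dynamic knowledge control” the paragraph above describes.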

Every LLM builder chases this freshness, whether by expanding the size of the model or, in the case of Anthropic, Bard, and Cohere, by pulling answers from the internet. This lets them generate current, reliable information instead of relying on outdated facts. RAG ensures the LLM always has the latest and most trustworthy information at its disposal.

For domain-specific knowledge, Cohere and Anthropic let enterprises provide their own data through Oracle Cloud to expand on the model’s internal knowledge. With RAG, these LLMs deliver insights personalised to the company’s own data.

This, in the end, calls into question OpenAI’s announcement of the Retrieval API. Though the price has been reduced, the alternatives, including open-source ones, make OpenAI’s closed-door approach look unscalable. OpenAI is instead pushing long-context RAG with an increased number of tokens, in the hope that users won’t need an internet connection.

RAG stands out for its unique blend of benefits and cost-effectiveness. Its advantages include dynamic knowledge control, access to current and reliable information, transparent source verification, effective information leakage mitigation, domain-specific expertise, and low maintenance costs, among others. Choose wisely.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.