What do OpenAI, Cohere, and Anthropic; giants like Microsoft Azure, AWS, and IBM Watsonx.ai; and the open-source LangChain have in common? They all love to RAG. So, what's the deal with RAG, and why is it gaining popularity so fast within enterprises?
RAG, or Retrieval-Augmented Generation, burst onto the scene in 2020 when the brainiacs at Meta AI decided to jazz up the world of LLMs. It's a game-changer. Designed to give LLMs much-needed information retrieval techniques, RAG swooped in to fix the problem that haunted its predecessors: the dreaded hallucinations.
LLMs rely on statistical patterns without true comprehension. They're excellent at generating text but struggle with logical reasoning, resulting in hallucinations. This is because LLMs, no matter how big the model or how long the context length, are still limited to the information fed to them during training.
With RAG, customers can plug in an additional dataset and give the LLM fresh information to generate answers from. This is exactly what enterprises need to generate insights from their own data.
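Conceptually, the pattern is simple: find the pieces of the new dataset most relevant to a question, and place them in the prompt ahead of that question. Here is a toy, framework-free sketch of that retrieve-then-generate loop; the word-overlap scorer stands in for real embeddings, and the documents are made-up placeholders, with the final LLM call left open.

```python
# A toy sketch of the RAG pattern: retrieve the most relevant documents for a
# question, then hand them to the LLM as context. The word-overlap "score" is
# a stand-in for real embedding similarity, and the docs are placeholders.

def score(question: str, doc: str) -> float:
    """Rough relevance: fraction of question words that appear in the doc."""
    q_words = set(question.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most relevant to the question."""
    return sorted(docs, key=lambda d: score(question, d), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with retrieved context so the LLM answers from it."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

docs = [
    "Our refund window is 30 days from the date of purchase.",
    "Support is available Monday to Friday, 9am to 6pm IST.",
]
question = "What is the refund window?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)  # This prompt would then be sent to the LLM of your choice.
```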

The safety issues
With the launch of GPT-4 Turbo and the Retrieval API, OpenAI has tried to fix the hallucination problem. With the long context length and the option for enterprises to bring in new data for retrieval, OpenAI has almost cracked the most important problem of LLMs, but forgot about the data privacy of users.
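For a sense of what this looks like in practice, here is a minimal sketch using the beta Assistants endpoints that shipped alongside the Retrieval API in the openai Python SDK; the file name and assistant details are placeholder assumptions, not from any OpenAI example.

```python
# A minimal sketch of OpenAI's Retrieval API via the beta Assistants endpoints
# (openai Python SDK v1, as released around DevDay). "handbook.pdf" and the
# assistant's name/instructions are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a knowledge file; this is the data the retrieval tool will search.
knowledge = client.files.create(file=open("handbook.pdf", "rb"), purpose="assistants")

# Create an assistant with the retrieval tool attached to that file.
assistant = client.beta.assistants.create(
    name="Docs helper",
    instructions="Answer questions using only the attached files.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[knowledge.id],
)
```

Note that the uploaded file now lives on OpenAI's servers, which is exactly where the privacy questions below begin.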
For example, with a little fancy prompt engineering, a user on X was able to download the original knowledge files from someone else's GPT, an app built with the recently released GPT Builder using exactly this kind of RAG. This is a big security issue for the model.
Oh man — you can just download the knowledge files (RAG) from GPTs. I don't know if this is a security leak or "just" a prompt engineering? @OpenAI @simonw https://t.co/VKMW8s4vfb pic.twitter.com/S1RYREna9b
— Kanat Bekt (@kanateven) November 9, 2023
If you give the AI model access to your documents, someone can “convince” it to let them download the original files. Interestingly, Sam Altman made no mention of this at DevDay. Though the release blog conveniently says, “As with the rest of the platform, data and files passed to the OpenAI API are never used to train our models and developers can delete the data when they see fit.”
It seems as if the announcement of GPT Builder was just one more step for OpenAI to collect more data from users, at least until they delete it. Now that the company is also training GPT-5, it might make use of the files people upload and train on them. If this is just a bug, OpenAI should fix it immediately and make the original files inaccessible to end users.
Google Bard faced a similar prompt injection problem, where a hacker was able to exfiltrate data that other users had shared with the chatbot, such as Docs, Drive files, and YouTube history. Even Google’s Bard is not foolproof.
Users on Reddit discuss whether LangChain’s RAG offering would be better than using OpenAI’s. Currently, GPT Builder has a 20-file limit for building a single GPT, which makes it less desirable for serious developers. That is why a lot of them prefer LangChain’s RAG offering instead of OpenAI’s models.
Everyone RAGs differently
If you can look past these security flaws, GPTs are still a viable option. But everything has to happen in a single prompt, which must both ask the question and tell the LLM to retrieve information from the specific dataset. And each company is focusing on solving a specific problem at the moment.
For dynamic knowledge control, RAG lets you tweak and expand the model’s knowledge without the hassle of retraining it. This is mostly offered by open-source players such as LangChain, which integrate a vector database, such as Pinecone, with any open-source LLM.
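A minimal sketch of that LangChain-plus-Pinecone pattern is below, using the library APIs as they existed in late 2023 (module paths have since moved around); the index name, documents, and model id are placeholder assumptions.

```python
# A sketch of the LangChain + Pinecone + open-source LLM pattern described
# above, per the late-2023 LangChain APIs. Index name, documents, and model id
# are placeholders; updating the "knowledge" means re-indexing documents, with
# no retraining of the LLM itself.
import pinecone
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")

# Embed the enterprise documents and index them in the vector database.
embeddings = HuggingFaceEmbeddings()  # defaults to a sentence-transformers model
docs = [
    "Q3 revenue grew 12% year over year.",
    "The updated leave policy takes effect in January.",
]
vectorstore = Pinecone.from_texts(docs, embeddings, index_name="enterprise-docs")

# Any open-source LLM can sit at the end of the chain; here, a local HF pipeline.
llm = HuggingFacePipeline.from_model_id(
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    task="text-generation",
)

# Retrieve the most relevant chunks, then let the LLM answer grounded in them.
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
print(qa.run("How much did revenue grow in Q3?"))
```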
Today, I learned how to code my own Retrieval Augmented Generation (RAG). It just took me a couple of hours. Without RAG, I received an hallucination. With RAG the response is short but correct.#aws #bedrock #llm #ec2mac #python pic.twitter.com/sgRZxQPj8B
— Sébastien ☁ Stormacq 🇺🇦 (@sebsto) November 7, 2023
Every LLM builder approaches this either by trying to expand the size of the model or, in the case of Anthropic, Bard, or Cohere, by getting answers from the internet. This also allows them to generate current and reliable information instead of relying on outdated facts. RAG ensures the LLM always has the latest and most trustworthy information at its disposal.
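Internet-augmented generation follows the same retrieve-then-prompt shape. Here is a rough sketch using DuckDuckGo’s public Instant Answer endpoint as a stand-in for the vendors’ proprietary search back ends, which is an illustrative assumption, not how Anthropic, Bard, or Cohere actually fetch results.

```python
# A rough sketch of internet-augmented generation. The DuckDuckGo Instant
# Answer endpoint is an illustrative stand-in for a vendor's own search stack.
import requests

def web_context(query: str) -> str:
    """Fetch a short, current abstract for the query from the web."""
    resp = requests.get(
        "https://api.duckduckgo.com/",
        params={"q": query, "format": "json", "no_html": 1},
        timeout=10,
    )
    return resp.json().get("AbstractText", "")

question = "What is retrieval-augmented generation?"
context = web_context(question)
prompt = f"Using this up-to-date context:\n{context}\n\nAnswer: {question}"
# `prompt` would now be passed to the LLM, keeping its answer current.
```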
For domain-specific knowledge, Cohere and Anthropic let enterprises provide their own data through Oracle Cloud to expand on the internal data. These LLMs, with RAG, provide insights that are more personalised to the company’s data.
All of this, in the end, calls into question OpenAI’s announcement of the Retrieval API. Though the price has been reduced, the alternatives, along with the open-source ones, make OpenAI’s closed-door approach look unscalable. OpenAI, meanwhile, is trying to introduce Long-Context RAG with an increased number of tokens, in the hope that users won’t need an internet connection at all.
GPT-4-turbo can now process 128K input tokens! This is the next generation of RAG: Long-Context RAG 🚀 Longer context windows have the potential to overcome the limitations of search. Search is frequently evaluated based on recall of the top K results. A smaller value of k (1 to… pic.twitter.com/8nu9OW4naH
— Erika Cardenas (@ecardenas300) November 13, 2023
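The long-context idea above boils down to a token budget: if everything fits in the 128K window, stuff it all in; otherwise, fall back to top-k retrieval. A rough sketch with the tiktoken tokenizer library, where the budget headroom and the inline word-overlap retriever are illustrative assumptions:

```python
# A sketch of "Long-Context RAG": skip the search step when all documents fit
# in the model's context window; otherwise retrieve only the top-k chunks.
import tiktoken

CONTEXT_WINDOW = 128_000  # GPT-4 Turbo's input token limit
enc = tiktoken.encoding_for_model("gpt-4")

def retrieve(question: str, docs: list[str], k: int) -> list[str]:
    """Toy top-k retriever by word overlap (stand-in for vector search)."""
    overlap = lambda d: len(set(question.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_context(question: str, docs: list[str]) -> str:
    all_docs = "\n".join(docs)
    # Reserve some headroom for instructions and the question itself.
    budget = CONTEXT_WINDOW - len(enc.encode(question)) - 1_000
    if len(enc.encode(all_docs)) <= budget:
        return all_docs  # everything fits: no search step needed
    return "\n".join(retrieve(question, docs, k=5))
```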
RAG stands out for its unique blend of benefits and cost-effectiveness. Its advantages include dynamic knowledge control, access to current and reliable information, transparent source verification, effective information leakage mitigation, domain-specific expertise, and low maintenance costs, among others. Choose wisely.