Hallucinations have plagued LLMs since their inception, fuelling concerns over their capacity to produce believable misinformation. While top AI labs have tried a variety of methods to reduce hallucination in their LLMs, hallucinations remain an inescapable part of language models because of the way they are built.
While there is no way to directly curb hallucinations from within the model, there may be an architectural solution to this deep-rooted problem. Vector databases, which have surged in popularity during the current AI wave, might just be the secret weapon we need to stop LLMs from hallucinating.
Using a technique called Vector SQL, a company called MyScale has created an architecture in which LLMs query vector databases instead of trying to generate the answers to user queries by themselves. While this method relegates the LLM to one part of a larger data retrieval mechanism, it is reported to reduce hallucinations and make LLMs more suitable for widespread use.
Vector SQL explained
To understand why vector SQL is so effective in curtailing hallucinations, we must first understand why LLMs hallucinate. LLMs generate text mainly by statistically predicting the next token, building up words and sentences from patterns in their training data. Because the model picks words that are statistically likely to follow one another in that data, rather than checking facts, it commonly generates misinformation that is presented in a believable way.
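To make the mechanism concrete, here is a deliberately tiny sketch of next-word prediction. It is a toy lookup table, not how a real transformer works, and the probabilities are invented for illustration: they assume a corpus in which "Sydney" follows "Australia is" more often than the correct "Canberra".

```python
import random

# Hypothetical toy "model": for each previous word, the probability of the next word.
# The numbers are made up for illustration and do not come from any real LLM.
NEXT_WORD_PROBS = {
    "capital": {"of": 1.0},
    "of": {"Australia": 1.0},
    "Australia": {"is": 1.0},
    "is": {"Sydney": 0.6, "Canberra": 0.4},
}


def most_likely_next(word):
    """Return the statistically most likely continuation, true or not."""
    dist = NEXT_WORD_PROBS[word]
    return max(dist, key=dist.get)


def generate(start, steps, rng):
    """Sample one word at a time, each conditioned only on the previous word."""
    words = [start]
    for _ in range(steps):
        dist = NEXT_WORD_PROBS.get(words[-1])
        if dist is None:  # no known continuation, stop early
            break
        choices, weights = zip(*dist.items())
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)


if __name__ == "__main__":
    print(most_likely_next("is"))
    print(generate("capital", 4, random.Random()))
```

Nothing in this loop checks whether the sentence is true. The fluent but wrong "capital of Australia is Sydney" is simply the most probable path through the table.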
“A vector database enriches the language model’s training process and enhances its capacity to generate factually accurate and contextually appropriate responses. By leveraging the knowledge within the vector database, we can significantly mitigate the risk of hallucination and enhance the reliability of AI-generated content,” Himanshumali, Solutions Architect Leader, MongoDB, told AIM.
By using vector databases, LLMs can query an index of human-written content that helps them back up their statements. In this setup, instead of generating the answer from its own training data, the LLM queries the database for the information, which is more reliable than plain text generation and its tendency to hallucinate. This method does require filters so that the model doesn’t get ‘confused’, but it is still a better method than raw text generation.
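The lookup described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: the `embed` function is a toy word-count stand-in for a learned embedding model, and the three documents are made-up sample data held in a list rather than in a real vector database.

```python
import math
from collections import Counter

# Made-up sample corpus standing in for an index of human-written content.
DOCUMENTS = [
    "Canberra is the capital of Australia",
    "Sydney is the largest city in Australia",
    "PostgreSQL is a relational database",
]


def embed(text):
    """Toy embedding: word counts. Real systems use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=1):
    """Return the k stored documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(retrieve("what is the capital of Australia"))
```

The retrieved sentence, not the model's own guess, is what the LLM would then be asked to restate for the user.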
The next step in vector databases is automation through SQL code. By harnessing the code generation capabilities of LLMs, we can have them write SQL queries from the user’s natural language queries. These queries are then passed to a vector SQL engine, which bridges conventional SQL and vector search. The results are passed back to the LLM, which repackages the data into a human-readable format and presents it to the user.
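The round trip described above might look roughly like the sketch below. It is an illustration under stated assumptions, not MyScale's implementation: SQLite with a user-defined `l2_distance` function stands in for a vector SQL engine, and `fake_embed`, `fake_llm_write_sql` and `fake_llm_answer` are hypothetical stubs where a real embedding model and LLM would sit.

```python
import json
import math
import sqlite3


def l2_distance(a_json, b_json):
    """Euclidean distance between two vectors stored as JSON text."""
    a, b = json.loads(a_json), json.loads(b_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fake_embed(text):
    """Hypothetical 2-d embedding: [mentions 'capital', mentions 'database']."""
    t = text.lower()
    return [float("capital" in t), float("database" in t)]


def fake_llm_write_sql(question):
    """Stub for the LLM step that turns a question into a vector SQL query."""
    sql = "SELECT body FROM docs ORDER BY l2_distance(embedding, ?) LIMIT 1"
    return sql, (json.dumps(fake_embed(question)),)


def fake_llm_answer(question, rows):
    """Stub for the LLM step that repackages query results for the user."""
    return f"According to the stored documents: {rows[0]}"


def build_db():
    """Create an in-memory table of sample documents and their embeddings."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("l2_distance", 2, l2_distance)
    conn.execute("CREATE TABLE docs (body TEXT, embedding TEXT)")
    for body in [
        "Canberra is the capital of Australia",
        "ClickHouse is a column-oriented database",
    ]:
        conn.execute(
            "INSERT INTO docs VALUES (?, ?)", (body, json.dumps(fake_embed(body)))
        )
    return conn


def answer(conn, question):
    """Question -> generated SQL -> database rows -> human-readable answer."""
    sql, params = fake_llm_write_sql(question)
    rows = [r[0] for r in conn.execute(sql, params)]
    return fake_llm_answer(question, rows)


if __name__ == "__main__":
    print(answer(build_db(), "What is the capital of Australia?"))
```

The point of the design is that the factual content of the answer comes from a database row, while the LLM only writes the query and rephrases the result.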
Vector SQL comes with a host of benefits such as improved efficiency, better flexibility for function support, and all the benefits of SQL. Because SQL code is so pervasive in their training data, LLMs can easily generate it, and database solutions like PostgreSQL and ClickHouse have already integrated vector search functionality for use with AI. Even if LLMs haven’t been trained on SQL data, it is possible to use prompt engineering to get an LLM to construct vector SQL queries. This makes the vector SQL method compatible with even off-the-shelf LLMs.
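As a rough idea of what such prompt engineering could involve, the sketch below builds an instruction prompt around a table schema and adds a basic sanity check on whatever SQL comes back. The schema, the `distance` and `embed` function names in the prompt, and the helper names are all hypothetical placeholders, not the syntax of any particular database.

```python
# Hypothetical table schema used only for this illustration.
SCHEMA = "docs(id INTEGER, body TEXT, published DATE, embedding VECTOR)"

# 'distance' and 'embed' are placeholder names, not a real product's functions.
PROMPT_TEMPLATE = """You translate questions into a single SQL query.
Table schema:
{schema}
Rules:
- For semantic search, use ORDER BY distance(embedding, embed('<search text>')).
- For exact conditions such as dates, use ordinary WHERE clauses.
- Return only the SQL, with no explanation.
Question: {question}
SQL:"""


def build_prompt(question, schema=SCHEMA):
    """Fill the template so an off-the-shelf LLM sees the schema and the rules."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question)


def looks_like_single_select(sql):
    """Cheap guard: accept only one read-only SELECT statement."""
    stripped = sql.strip().rstrip(";").strip()
    return stripped.upper().startswith("SELECT") and ";" not in stripped


if __name__ == "__main__":
    print(build_prompt("Which articles from 2023 discuss hallucination?"))
    print(looks_like_single_select("SELECT body FROM docs LIMIT 1;"))
```

A check like `looks_like_single_select` is only a first line of defence. A production system would also need proper permissions on the database side.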
While this specific approach is certainly a new way of giving LLMs access to human-readable data, many of today’s top chatbots already use similar methods to access data. These models have not gotten rid of hallucinations completely, but architecturally sound solutions have been instrumental in reducing the rate at which they occur.
The future of accurate LLMs?
“We think the two things developers should look for in a vector database is one that is directly integrated with your operational database and, ideally, an operational database that has a highly flexible and scalable document-based data model. This allows all the data to be stored together, making it simpler, quicker, and more efficient to manage,” added Himanshumali.
Certain implementations, like Microsoft’s Bing Chat, were built specifically as a natural language interface for web search rather than as a standalone LLM. This meant that the solution’s first priority was to provide search results to users. Microsoft achieved this by creating a system known as Prometheus. While not much is known about the inner workings of Prometheus, Microsoft stated, “[Prometheus] is a first-of-its-kind AI model that combines the fresh and comprehensive Bing index, ranking, and answers results with the creative reasoning capabilities of OpenAI’s most-advanced GPT models.”
Elaborating further in a blog post, Jordi Ribas, corporate vice president for Search and AI at Microsoft, said that Prometheus uses the GPT model to ‘generate a set of internal queries’ through a component he called the Bing Orchestrator. These queries allow the answers to stay relevant to the user’s question while drawing on the latest data through the Bing search engine. This method is called grounding, and aims to reduce inaccuracies by keeping the model fed with relevant and fresh information, thus reducing the potential for hallucinations. Prometheus also goes the extra mile and includes linked citations for each point it makes, giving users even more confidence in the bot’s answers.
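Since Microsoft has not published how Prometheus works, the following is only a generic sketch of grounding with citations, not a reconstruction of Bing's system. The `build_grounded_prompt` helper, the sample results and the example.com URLs are all invented for illustration.

```python
def build_grounded_prompt(question, results):
    """Build a prompt that restricts the model to numbered, citable sources.

    results: list of (title, url, snippet) tuples from any search backend.
    """
    sources = "\n".join(
        f"[{i}] {title} ({url}): {snippet}"
        for i, (title, url, snippet) in enumerate(results, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number, and say so if the sources "
        "do not contain the answer.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Made-up search results for demonstration purposes.
    sample_results = [
        ("Capital cities", "https://example.com/capitals", "Canberra is the capital of Australia."),
        ("Largest cities", "https://example.com/cities", "Sydney is Australia's largest city."),
    ]
    print(build_grounded_prompt("What is the capital of Australia?", sample_results))
```

Numbering the sources is what makes linked citations possible: the model can refer to "[1]" and the interface can turn that marker back into a link.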
While it is likely that the Prometheus system uses vector databases in some way to achieve this, Microsoft has not shed any light on the matter. However, with the rise of vector SQL and other similar architectural solutions, the age of hallucination-free LLMs might be upon us.
[This article was updated on 28th July with inputs from MongoDB.]