At the grand unveiling of Google’s chatbot Bard in Paris on February 8, a demo video of the chatbot answering select questions was released. As fate would have it, Bard gave the wrong answer to a question about the James Webb Space Telescope. What’s more, Google didn’t notice the error before releasing the demo. It isn’t that the large language model (LLM) behind Bard lacked this information; it was simply hallucinating.
The hallucination problem
A hallucinating model generates text that is factually incorrect. What makes this tricky with LLMs is that the errors are usually presented in fluent, plausible-sounding text. For readers who skim, hallucinations can be hard to catch because the sentences look right.
These hallucinations are not only sneaky but also hard to get rid of. In the words of deep learning critic and Professor Emeritus of Psychology and Neural Science at NYU, Gary Marcus, “Hallucinations are in their (LLMs) silicon blood, a byproduct of the way they compress their inputs, losing track of factual relations in the process. To blithely assume that the problem will soon go away is to ignore 20 years of history.”
Can connecting LLMs to the web fix hallucinations?
With the rapid ascent of these chatbots, their hallucinations have come to light too, and researchers are racing to find a solution. ‘Got It AI’, a Silicon Valley conversational AI startup, is working to develop AI that will serve as a ‘truth-checker’ for enterprise applications like ChatGPT.
Recently, Josh Tobin, a former OpenAI researcher and the co-founder and CEO of Gantry, an AI startup developing platforms for AI engineering, outlined on LinkedIn a simple method to reduce LLM hallucinations:
1. Using retrieval-augmented models
2. Annotating the examples of hallucinations
3. Prompting or training a model that maps (context, answer) -> p_hallucinate
4. At test time, filtering responses using that model
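Steps 3 and 4 can be illustrated with a minimal Python sketch. It is not Tobin's implementation: the function names, the stopword list and the 0.3 threshold are assumptions made for illustration, and a crude token-overlap heuristic stands in for the prompted or trained scoring model.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "was", "is", "in", "of", "and", "to", "on"}


def content_tokens(text):
    """Lowercase word tokens with common stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def p_hallucinate(context, answer):
    """Toy stand-in for a trained scorer.

    Returns the share of the answer's content tokens that do not
    appear anywhere in the context.
    """
    answer_tokens = content_tokens(answer)
    if not answer_tokens:
        return 0.0
    unsupported = answer_tokens - content_tokens(context)
    return len(unsupported) / len(answer_tokens)


def filter_responses(context, candidates, threshold=0.3):
    """Keep candidates whose estimated hallucination score is below the threshold."""
    return [c for c in candidates if p_hallucinate(context, c) < threshold]


context = "Vincent van Gogh was born in Zundert, in the Netherlands, in 1853."
candidates = [
    "Van Gogh was born in Zundert in 1853.",
    "Van Gogh was born in Paris in 1901.",
]
kept = filter_responses(context, candidates)
print(kept)
```

In this toy run only the first candidate survives, because "Paris" and "1901" are not supported by the context. A real system would replace the overlap heuristic with a model trained on the annotated hallucination examples from step 2.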
Retrieval-augmented language modelling, as in REALM, has also been suggested by other experts as a remedy for the hallucination problem in LLMs. In REALM, the language model draws on documents fetched from external sources. For instance, if a user enters the prompt ‘Vincent Van Gogh was born in,’ a traditional LLM will try to complete the sentence by guessing the next token in that sequence. The model will likely give an accurate answer if the fact appeared in its training data. If it did not, the LLM may well give a wrong answer.
REALM, on the other hand, has a ‘knowledge retriever’ that searches for documents likely to contain information relevant to the prompt. The model can then pick up Van Gogh’s birthplace from, say, a Wikipedia page and use it to generate a more reliable response. The knowledge retriever can also return references to the documents it used, which helps the user verify the source and accuracy of the text that the model generated.
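The retrieve-then-generate idea can be sketched in a few lines of Python. This is a toy illustration, not REALM itself: REALM uses a learned neural retriever, whereas the sketch below ranks documents by simple word overlap. The corpus, the source labels and the function names are all made up for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical reference label that could be shown to the user
    text: str


# Miniature in-memory corpus, for illustration only.
CORPUS = [
    Document("encyclopedia/van-gogh",
             "Vincent van Gogh was born in Zundert in the Netherlands."),
    Document("encyclopedia/webb",
             "The James Webb Space Telescope was launched in December 2021."),
]


def tokens(text):
    """Lowercase word tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(prompt, corpus):
    """Return the document sharing the most word tokens with the prompt."""
    return max(corpus, key=lambda d: len(tokens(prompt) & tokens(d.text)))


def build_grounded_prompt(prompt, corpus):
    """Prepend the retrieved text to the prompt and return it with its source."""
    doc = retrieve(prompt, corpus)
    grounded = f"Context: {doc.text}\nComplete using only the context: {prompt}"
    return grounded, doc.source


grounded, source = build_grounded_prompt("Vincent Van Gogh was born in", CORPUS)
print(grounded)
print("Reference:", source)
```

The grounded prompt would then be passed to the language model, and the returned source label is what lets a reader check the answer against the underlying document.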
When an LLM is connected to the internet, it does not retrain itself. Instead, it can fetch documents at query time and ground its answers in them, in the spirit of retrieval-augmented language modelling. This is exactly the progression that we have seen with the chatbots released by the two Big Tech giants heading the race: Microsoft’s Bing chatbot, or Sydney, is built on the ‘next-generation OpenAI LLM’ and is connected to the web, while Google’s Bard is built on its LaMDA model and is connected to its search engine.
A host of AI researchers see promise in this step. Joshua Levy, an AI author, stated, “It looks very impressive. Adding web search makes research way more fluid. But the tricky part is how reliably it is combining facts and the LLM output. Have you fact-checked the citations and numbers?”
If connecting to the internet is the first step to removing hallucinations, then why is Bing’s chatbot throwing up the wildest answers? Over the past few days, those who got the rare chance to use the chatbot have shared instances of its unruly behaviour.
There’s another philosophical take that can be considered here. If our end goal is to build machines that are human-like, isn’t hallucinating and giving flawed answers a part of being human?
Shuhei Kurita, an NLP researcher with NYU, went as far as to argue for hallucinations, tweeting, “Google seems to try suppressing LLM hallucination, but isn’t hallucination a part of essential aspects of intelligence? Imagine writing novels by humans or playing with wild animals. They are parts of intelligence that aren’t directly relevant to living skills real-world.”