Why Did Meta Take Down Its ‘Hallucinating’ AI Model Galactica?

“The reality is that large language models like GPT-3 and Galactica are like bulls in a china shop, powerful but reckless”

On Wednesday, Meta AI and Papers with Code announced the release of Galactica, a 120-billion-parameter open-source large language model trained on scientific knowledge. However, just days after its launch, Meta took Galactica down.

Interestingly, every result generated by Galactica came with a warning: “Outputs may be unreliable. Language Models are prone to hallucinate text.”

“Galactica is trained on a large and curated corpus of humanity’s scientific knowledge. This includes over 48 million papers, textbooks and lecture notes, millions of compounds and proteins, scientific websites, encyclopedias and more,” the paper said.

Galactica was designed to tackle information overload when people look up scientific information through search engines, which offer no proper organisation of scientific knowledge.

However, when members of the community started using Meta’s all-new AI model, many found the results suspicious. Many took to Twitter to point out that Galactica’s output was highly inaccurate.

Alex Polozov, staff research scientist at Google, called Galactica an endless source of adversarial examples for hallucination, attribution, and alignment research.

False Results

“I asked Galactica about some things I know about and I’m troubled. In all cases, it was wrong or biased but sounded right and authoritative. I think it’s dangerous,” said Michael Black, director at the Max Planck Institute for Intelligent Systems.

Gary Marcus, professor of psychology and neural science at NYU and a well-known critic of deep learning and AGI, also took to Twitter to say that Galactica got his birthday, education and research interests wrong. According to him, nearly 85% of what Galactica presented about him was untrue.

Tariq Desai, head of Data Science at ExploreAI, told AIM that he was genuinely excited to try out Galactica because it seemed like a valuable way to search and synthesise scientific knowledge. “However, the few examples that I did try suggested that the model was better at mimicking the form of scientific writing than at reproducing its semantic content. For example, I prompted the model for a ‘literature review on whether HIV causes AIDS’ and was presented with text which was just wrong on this question, and which invented citations and research.”

“Galactica was useful for exploring mathematical content, though, and demonstrates the potential of some interesting applications in that sphere,” Desai added.

Notably, the paper stated that Galactica outperforms GPT-3, one of the most popular large language models, on technical knowledge probes such as LaTeX equations, scoring 68.2% against GPT-3’s 49.0%.

Inaccurate Results Could Be Dangerous

Julian Togelius, associate professor at NYU, pointed out that Galactica not only got his name wrong but also struggled to summarise his work. “Asking Galactica to summarise my work gives results that vary from hilariously wrong to actually mostly correct.”

He added that while it was easy for him to tell the difference, it might not be for someone who does not know him or his work.


Even though some of the results are hilarious, inaccurate or fabricated output could be problematic: other members of the community could take it to be correct, which could prove highly dangerous for scientific research.

In this regard, Black said that Galactica generates text that’s grammatically correct and feels real. “This text will slip into real scientific submissions. It will be realistic but wrong or biased and hard to detect. It will influence how people think,” he said.

“It offers authoritative-sounding science that isn’t grounded in the scientific method. It produces pseudoscience based on statistical properties of science writing. Grammatical science writing is not the same as doing science. But it will be hard to distinguish,” he added.

Explaining further, Black said a Pandora’s box has been opened, raising the possibility of deep scientific fakes. Researchers’ names could be cited on papers they did not write, and these papers could then be cited by other researchers in real papers. “What a mess this will be,” Black said.

Marcus shares Black’s views on Galactica and on how such models could prove dangerous. “The reality is that large language models like GPT-3 and Galactica are like bulls in a china shop, powerful but reckless. And they are likely to vastly increase the challenge of misinformation,” he said.
