Meta AI launches an open-source model to make Wikipedia entries more accurate

This is a powerful example of machine learning tools that can help scale the work of volunteers by efficiently recommending citations and accurate sources.

Meta AI has developed the first model capable of automatically verifying hundreds of thousands of citations. Trained on 134 million public web pages, the open-sourced model can check whether citations actually support the corresponding claims.

It highlights questionable citations, allowing human editors to assess the cases that are most likely to be flawed without having to sift through thousands of properly cited statements. If a citation appears irrelevant, the model will recommend a more relevant source, even pointing to a specific passage that supports the claim.

“This is a powerful example of machine learning tools that can help scale the work of volunteers by efficiently recommending citations and accurate sources. Improving these processes will allow us to attract new editors to Wikipedia and provide better, more reliable information to billions of people around the world. I look forward to continued improvements in this area, especially as machine learning tools are able to provide more customized citations and multilingual options to serve our Wikimedia communities across more than 300 languages,” said Shani Evenstein Sigalov, a lecturer and researcher at Tel Aviv University, and Vice Chair of the Wikimedia Foundation’s Board of Trustees.

Learning all of Wikipedia 

In September 2020, Meta released an AI model that integrates information retrieval and verification. Since then, the company has been working on training neural networks to learn more nuanced representations of language so that they can find relevant source material in a pool of data the size of the internet.

Using natural language understanding (NLU) techniques, the system estimates the likelihood that a claim can be inferred from a source. To determine whether one statement supports or contradicts another, the models create and compare mathematical representations of the meanings of entire statements during a search.
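The idea of comparing representations of entire statements can be sketched in a few lines. This is a toy illustration, not Meta's method: it stands in bag-of-words count vectors and cosine similarity for the learned neural encoders and entailment scores the real system uses, and all the example sentences are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in: a bag-of-words count vector. The real system uses
    # learned neural encoders; this only illustrates the idea of
    # representing and comparing whole statements.
    return Counter(text.lower().split())

def support_score(claim: str, source: str) -> float:
    # Cosine similarity between the two sparse vectors, standing in for
    # the model's estimate that the source supports the claim.
    a, b = embed(claim), embed(source)
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

claim = "The Eiffel Tower is located in Paris"
supporting = "the eiffel tower stands in paris france"
unrelated = "stock markets fell sharply on monday"
print(support_score(claim, supporting), support_score(claim, unrelated))
```

Even this crude score ranks the supporting sentence above the unrelated one; the verification model does the same comparison with far richer representations of meaning.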

The new dataset of 134 million web pages serves as one of the system’s main components: Sphere, an open-sourced web-scale retrieval library. Meta has fed the algorithms 4 million Wikipedia claims, teaching them to single out one source from a vast pool of web pages to validate each statement. Because web pages can contain long stretches of text, the models evaluate content in chunks and take only the most relevant passage into account when deciding whether to recommend a URL. These prebuilt indices, which catalogue 40 times more content than other Wikipedia indices, will be included with Sphere.
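The chunk-then-select step described above can be sketched as follows. This is a simplified, hypothetical version: fixed-size word chunks and word overlap stand in for the model's passage scoring, and the page text is invented.

```python
def best_passage(page_text: str, claim: str, chunk_size: int = 8) -> str:
    # Split a long page into fixed-size word chunks and keep only the
    # chunk that shares the most words with the claim -- a toy version
    # of evaluating content in chunks and considering only the most
    # relevant passage when deciding whether to recommend a URL.
    claim_tokens = set(claim.lower().split())
    words = page_text.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    return max(chunks, key=lambda c: len(set(c.lower().split()) & claim_tokens))

page = ("The weather in spring can be unpredictable in many regions . "
        "The Eiffel Tower was completed in 1889 in Paris France . "
        "Football season begins later in the year for most leagues .")
print(best_passage(page, "The Eiffel Tower was completed in 1889"))
```

Scoring per chunk rather than per page is what lets the system point an editor to the specific passage that supports a claim, instead of just a long URL.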

The indices route potential sources through an evidence-ranking model that compares the new text to the original citation. The model ranks the cited source and the retrieved alternatives based on the likelihood that they support the claim using fine-grained language comprehension. In the real world, the model will recommend the most relevant URLs as prospective citations for a human editor to review and approve.
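The ranking step can be illustrated with a minimal sketch. The scoring function, candidate list, and URLs here are all hypothetical stand-ins: a real evidence-ranking model would score each (claim, passage) pair with fine-grained language comprehension, not word overlap.

```python
def overlap_score(claim: str, passage: str) -> float:
    # Toy stand-in for the verification score the evidence-ranking
    # model would assign to a (claim, passage) pair.
    return len(set(claim.lower().split()) & set(passage.lower().split()))

def rank_candidates(claim, candidates, score_fn=overlap_score):
    # Rank the originally cited source and the retrieved alternatives
    # by how likely each passage is to support the claim; the top URL
    # is what a human editor would review as a prospective citation.
    return sorted(candidates, key=lambda c: score_fn(claim, c[1]), reverse=True)

claim = "Marie Curie won two Nobel Prizes"
candidates = [
    ("https://example.org/cited", "a page about radioactivity in general"),
    ("https://example.org/alt",
     "Marie Curie won two Nobel Prizes in physics and chemistry"),
]
print(rank_candidates(claim, candidates)[0][0])
```

Keeping the original citation in the candidate pool is the key design point: if it outscores every alternative, the model leaves it alone; if it ranks below a retrieved page, the citation is flagged for human review.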

Making sense of the real world

Meta’s ultimate goal is to create a platform that will assist Wikipedia editors in systematically identifying citation issues and quickly fixing the citation or correcting the content of the corresponding article at scale.

This model could also guide the way to better results on many other tasks, such as classic natural language inference, retrieval in question-answering systems, and few-shot learning.

Sri Krishna
