Meta AI has developed the first model capable of automatically verifying hundreds of thousands of citations. Trained on 134 million public web pages, the open-source model can check whether citations actually support the claims they are attached to.
It highlights questionable citations, allowing human editors to assess the cases that are most likely to be flawed without having to sift through thousands of properly cited statements. If a citation appears irrelevant, the model will recommend a more relevant source, even pointing to a specific passage that supports the claim.
“This is a powerful example of machine learning tools that can help scale the work of volunteers by efficiently recommending citations and accurate sources. Improving these processes will allow us to attract new editors to Wikipedia and provide better, more reliable information to billions of people around the world. I look forward to continued improvements in this area, especially as machine learning tools are able to provide more customized citations and multilingual options to serve our Wikimedia communities across more than 300 languages,” said Shani Evenstein Sigalov, a lecturer and researcher at Tel Aviv University, and Vice Chair of the Wikimedia Foundation’s Board of Trustees.
Learning all of Wikipedia
In September 2020, Meta released an AI model that integrates information retrieval and verification. Since then, the company has been working on training neural networks to learn more nuanced representations of language so that they can find relevant source material in a pool of data the size of the internet.
Using natural language understanding (NLU) techniques, the system estimates the likelihood that a claim can be inferred from a source. To determine whether one statement supports or contradicts another, the models create and compare mathematical representations of the meanings of entire statements during a search.
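To make this concrete, here is a minimal sketch of comparing "mathematical representations" of two statements. Meta's actual system uses dense neural sentence embeddings; the bag-of-words vectors and cosine similarity below are a deliberately simple stand-in so the idea stays runnable, and the function names are illustrative, not Meta's API.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "meaning" vector: a bag-of-words count.
    # The real models learn dense neural embeddings instead.
    return Counter(text.lower().split())

def support_score(claim: str, source: str) -> float:
    # Cosine similarity between the two vectors, standing in for the
    # learned likelihood that the source entails the claim.
    a, b = embed(claim), embed(source)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

claim = "the eiffel tower is in paris"
relevant = "the eiffel tower stands in paris france"
irrelevant = "bananas are rich in potassium"
assert support_score(claim, relevant) > support_score(claim, irrelevant)
```

Even this crude scorer ranks the supporting sentence above the unrelated one; the neural version does the same comparison, but over representations that capture meaning rather than word overlap.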
The new dataset of 134 million web pages underpins one of the system’s main components: Sphere, an open-source, web-scale retrieval library. Meta fed the algorithms 4 million Wikipedia claims, teaching them to pinpoint a single source from a vast pool of web pages to validate each statement. Because web pages can contain long stretches of text, the models evaluate content in chunks and consider only the most relevant passage when deciding whether to recommend a URL. These prebuilt indices, which catalogue 40 times more content than other Wikipedia indices, will ship with Sphere.
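The chunk-then-pick-the-best step can be sketched in a few lines. This is a toy version: the fixed 50-word window and the word-overlap relevance score are illustrative assumptions, not Meta's actual chunk size or scoring model.

```python
def overlap_score(claim: str, passage: str) -> float:
    # Toy relevance score: fraction of the claim's words found in the passage.
    claim_words = set(claim.lower().split())
    passage_words = set(passage.lower().split())
    return len(claim_words & passage_words) / len(claim_words) if claim_words else 0.0

def best_passage(page_text: str, claim: str, chunk_words: int = 50) -> str:
    # Split a long page into fixed-size word windows and keep only the
    # window that best matches the claim, mirroring how the models score
    # pages chunk by chunk. chunk_words = 50 is an illustrative choice.
    words = page_text.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    return max(chunks, key=lambda chunk: overlap_score(claim, chunk))
```

Scoring only the best passage, rather than the whole page, is what lets the system point an editor to the specific span that supports a claim instead of just a URL.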
The indices route potential sources through an evidence-ranking model that compares the new text to the original citation. Using fine-grained language comprehension, the model ranks the cited source and the retrieved alternatives by how likely they are to support the claim. In practice, the model recommends the most relevant URLs as prospective citations for a human editor to review and approve.
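The ranking-and-flagging step might look like the sketch below. The support scores are assumed to come from an upstream verification model (like the scorer described earlier); the function and field names are hypothetical, chosen only to illustrate the workflow.

```python
def review_citation(cited: tuple[str, float],
                    retrieved: list[tuple[str, float]]) -> dict:
    # cited: (url, support_score) for the existing citation.
    # retrieved: (url, support_score) pairs for alternatives from the index.
    # Rank everything together; if an alternative outranks the existing
    # citation, flag it and suggest the better URL for human review.
    ranked = sorted([cited] + retrieved, key=lambda pair: pair[1], reverse=True)
    best_url, _ = ranked[0]
    return {
        "flagged": best_url != cited[0],   # existing citation is questionable
        "suggested": best_url,             # top candidate for the editor
        "ranking": ranked,
    }

result = review_citation(("weak-source.example", 0.3),
                         [("strong-source.example", 0.9),
                          ("off-topic.example", 0.1)])
assert result["flagged"] and result["suggested"] == "strong-source.example"
```

Note that the model only recommends: the final accept-or-reject decision stays with a human editor, which is why surfacing a ranked list rather than a single verdict fits the Wikipedia workflow.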
Making sense of the real world
Meta’s ultimate goal is to create a platform that will assist Wikipedia editors in systematically identifying citation issues and quickly fixing the citation or correcting the content of the corresponding article at scale.
This model could also guide the way to better results on many other tasks, such as classic natural language inference, retrieval in question-answering systems, and few-shot learning.