Meta Takes on Google, Open-Sources White-box Search Solution Sphere

Sphere represents Meta’s effort to enable AI researchers to experiment with building KI-NLP models


Published on August 11, 2022

by Zinnia Banerjee


Ever wondered how Wikipedia or Google Search answers our questions? Random thoughts strike us at random moments, and we turn to these search engines and online encyclopaedias. Either way, we get our answers. Mostly.

Knowledge-intensive natural language processing (KI-NLP) refers to tasks, such as answering questions or checking facts, that require a model to draw on a large body of external knowledge, much as we do when we turn to Google Search or Wikipedia. Such models dig through an archive of information to find relevant material before producing an answer. However, the current KI-NLP landscape has several limitations.

Current KI-NLP architectures typically depend on black-box search engines or on Wikipedia to find information. With a black-box search engine, relevant information can be missed because the engine’s ranking algorithm, which researchers cannot inspect, may place it too low in the results. Wikipedia, for its part, often doesn’t capture all the knowledge available on the web about a particular topic, and as the encyclopaedia keeps growing, verifying its citations and checking it for biases has become a challenge.

Meta’s Sphere

Meta has come up with Sphere, billed as the first white-box search solution, which uses open web data as its source of knowledge. Meta believes that Sphere’s white-box knowledge base contains significantly more data, and more sources to match against for verification, than a typical black-box knowledge source, and can therefore surface useful information that black-box sources cannot.
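“White-box” here means that researchers can see the whole corpus and exactly how each document is scored, instead of relying on a commercial engine’s hidden ranking. As an illustration only, the sketch below implements BM25, a classic sparse ranking function, over a toy in-memory corpus and exposes each query term’s contribution to a document’s score. It is not Meta’s code: `BM25Index`, `tokenize`, the sample corpus and the parameter values k1 = 1.5 and b = 0.75 are assumptions chosen for the example, and a web-scale corpus like Sphere would need a proper search index rather than a Python list.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


class BM25Index:
    """A tiny, fully inspectable BM25 index over an in-memory list of documents (hypothetical)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.n = len(docs)
        self.avgdl = sum(len(t) for t in self.tokens) / self.n
        # Document frequency: how many documents contain each term.
        self.df = Counter(term for toks in self.tokens for term in set(toks))

    def idf(self, term: str) -> float:
        # BM25 IDF with +1 inside the log so the weight never goes negative.
        df = self.df.get(term, 0)
        return math.log((self.n - df + 0.5) / (df + 0.5) + 1)

    def explain(self, query: str, doc_id: int) -> dict[str, float]:
        """Return each matching query term's contribution to the document's score."""
        toks = self.tokens[doc_id]
        tf = Counter(toks)
        # Length normalisation: longer-than-average documents are penalised.
        norm = self.k1 * (1 - self.b + self.b * len(toks) / self.avgdl)
        return {
            term: self.idf(term) * tf[term] * (self.k1 + 1) / (tf[term] + norm)
            for term in tokenize(query)
            if tf[term] > 0
        }

    def search(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        """Score every document exhaustively, so no candidate is silently dropped."""
        scores = [(i, sum(self.explain(query, i).values())) for i in range(self.n)]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:top_k]


corpus = [
    "sphere is a web scale corpus released by meta",
    "wikipedia is an online encyclopaedia",
    "bm25 is a ranking function used by search engines",
]
index = BM25Index(corpus)
for doc_id, score in index.search("web scale corpus"):
    print(doc_id, round(score, 3), index.explain("web scale corpus", doc_id))
```

`index.search(...)` returns `(doc_id, score)` pairs, and `index.explain(query, doc_id)` shows which terms produced each score: the kind of inspection a black-box search API does not allow.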

The idea was to create more intelligent AI systems that can make better use of real-world knowledge. According to Meta, Sphere surpassed previous results on the Knowledge Intensive Language Tasks (KILT) benchmark, suggesting it can help AI researchers build models that leverage real-world knowledge to accomplish multiple tasks.

Sphere represents Meta’s effort to let AI researchers experiment with building KI-NLP models. Meta believes Sphere will help researchers train retrievers that can handle a wider range of documents and build automated systems that deal with issues such as misinformation and incoherent text. Models built this way could help tackle harmful content in the real world, and the work also holds the potential to enhance digital literacy and critical thinking skills.

How Meta seeks to challenge Google with Sphere

Shortly after Meta released Sphere, speculation began doing the rounds that Meta was seeking to challenge Google.

“MetaAI introduces white-box search. By open-sourcing Sphere, its web scale corpus. Directly challenges Google,” posted Prithivi Damodaran, an ML consultant at Donkey Stereotype.

With Sphere, Meta is trying to address the problem of surfacing the most relevant source for a user’s query or topic. At a time when search engine optimisation is widely used to push pages up the rankings, what appears higher in search results might not be the most relevant source for the surfer. Google Search, in fact, has drawn criticism for its results: users have complained numerous times about results being wrong or irrelevant, and the top results often pertain to ads or to information not even remotely related to the query. Sphere seeks to solve this issue.

Another way Meta is trying to outdo Google with Sphere is by open-sourcing it. Big Tech companies like Google have often been criticised for being opaque about their ML research. They frequently disclose little about how their models were created or what data was used to train them, which has contributed to the AI replication crisis.

The replication crisis is a big issue because it can lead to several other problems. If an AI research team does not publish any information about its model, the wider community cannot tell whether it was trained on a biased dataset, and the model may then produce biased results once it is deployed in the real world. Take the case of Google Cloud Vision, which labelled an image of a dark-skinned individual holding a thermometer as a “gun”, while a similar image of a light-skinned individual was labelled an “electronic device”.

Future trajectory

Only time will tell whether Sphere pans out the way Meta wants it to. However, Meta’s work on a web-scale corpus like Sphere suggests that harnessing the vast textual resources available online today through white-box retrieval may be the next big breakthrough in NLP.

Nonetheless, problems remain. One of the key problems Meta plans to address is the quality of retrieved information. NLP models should be able to assess the quality of the retrieved documents, handle duplicates, detect potential false claims and contradictions, prioritise more trustworthy sources and refrain from providing an answer if no sufficiently good evidence exists in the corpus.
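Three of these requirements (handling duplicates, prioritising more trustworthy sources and abstaining when evidence is weak) can be illustrated with a small post-retrieval filter. The Python sketch below is a hypothetical example, not Meta’s method: the `Passage` type, the `SOURCE_TRUST` table with its made-up domains, the relevance-times-trust score and the 0.5 threshold are all assumptions. It treats passages as duplicates only when they differ in case or whitespace, and it does not attempt to detect false claims or contradictions, which are much harder problems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    text: str
    source: str       # e.g. a domain name
    relevance: float  # retriever score, assumed normalised to [0, 1]


# Hypothetical trust priors per source; a real system would need to learn or curate these.
SOURCE_TRUST = {"encyclopedia.example": 0.9, "forum.example": 0.4}
DEFAULT_TRUST = 0.5


def normalise(text: str) -> str:
    # Collapse case and whitespace so copies differing only in those count as duplicates.
    return " ".join(text.lower().split())


def select_evidence(passages: list[Passage], min_score: float = 0.5, top_k: int = 3) -> list[Passage]:
    """Deduplicate, rank by relevance x trust, and drop weak evidence.

    An empty result means the system should abstain rather than answer.
    """
    def score(p: Passage) -> float:
        return p.relevance * SOURCE_TRUST.get(p.source, DEFAULT_TRUST)

    # For each group of duplicate passages, keep only the best-scoring copy.
    best_copy: dict[str, Passage] = {}
    for p in passages:
        key = normalise(p.text)
        if key not in best_copy or score(p) > score(best_copy[key]):
            best_copy[key] = p

    ranked = sorted(best_copy.values(), key=score, reverse=True)
    return [p for p in ranked if score(p) >= min_score][:top_k]
```

An empty list from `select_evidence` is the signal to abstain. Multiplying relevance by trust is only one simple way to combine the two signals; a learned re-ranker would be a more realistic choice.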
