Google’s New AI Model Can Access Distinct Memories Of Entities With Less Data

Researchers at tech giant Google recently introduced a new AI model known as Entities as Experts (EAE). Focusing on the problem of capturing declarative knowledge in the learned parameters of a language model, the new model can access distinct memories of the entities mentioned in a piece of text.

Neural network sequence models pre-trained as language models are among the most popular components in text understanding. It has been suggested that such neural language models could take the place of curated knowledge bases or textual corpora for tasks such as question answering.

Behind Entities as Experts (EAE)

The researchers focused on developing neural sequence models that capture the knowledge required to answer questions. They trained the model to predict masked-out spans in English Wikipedia text, and further trained EAE to access memories only for entity mentions, and to access the correct memory for each entity mentioned. By associating memories with specific entities, EAE can learn to access them sparsely.

How EAE Works 

The basic model architecture follows the Transformer, interleaved with an entity memory layer. The EAE model has two embedding matrices: token embeddings and entity embeddings. The researchers evaluated EAE’s ability to capture declarative knowledge using cloze knowledge probes and the TriviaQA question-answering task.

In the EAE model, the output of the initial transformer layers is used to predict mention boundaries, to retrieve entity embeddings from the entity memory, and to construct the input to the next transformer layers, which is augmented with the retrieved entity embeddings.

The final transformer layer can be connected to multiple task-specific heads. In this model, the final transformer block’s output feeds two heads: token prediction and entity prediction. The token prediction head predicts masked tokens for a cloze task, and the entity prediction head predicts an entity ID for each entity mention span, i.e., entity linking.
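To make the data flow concrete, here is a minimal PyTorch-style sketch of the architecture described above. It is an illustration only: the layer sizes, the single transformer block on each side of the memory, and dense (rather than mention-only, sparse) memory access are simplifying assumptions, not the paper’s exact implementation.

import torch
import torch.nn as nn

class EAESketch(nn.Module):
    """Illustrative sketch: transformer blocks interleaved with an entity
    memory layer, plus token-prediction and entity-prediction heads."""

    def __init__(self, vocab_size=30522, num_entities=10_000, d_model=256, nhead=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)      # token embedding matrix
        self.entity_emb = nn.Embedding(num_entities, d_model)   # entity embedding matrix (the memory)
        self.lower = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.upper = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.token_head = nn.Linear(d_model, vocab_size)        # predicts masked tokens (cloze)
        self.entity_head = nn.Linear(d_model, num_entities)     # predicts an entity ID per mention (linking)

    def forward(self, token_ids):
        x = self.lower(self.token_emb(token_ids))               # output of the initial transformer block
        # Entity memory: here every position queries the memory for simplicity;
        # in EAE only detected mention spans do, and access is sparse.
        scores = x @ self.entity_emb.weight.T                   # similarity to every memory slot
        retrieved = scores.softmax(dim=-1) @ self.entity_emb.weight
        x = self.upper(x + retrieved)                           # next block's input, augmented with memories
        return self.token_head(x), self.entity_head(x)

token_logits, entity_logits = EAESketch()(torch.randint(0, 30522, (2, 16)))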

In cloze tests, a model must recover the words in a blanked-out mention by correctly associating the mention with its surrounding sentence context. The entity retrieval after the first transformer layer, described above, is then supervised with an entity-linking objective during pre-training.
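For illustration, a cloze-style probe might look like the following made-up example (not drawn from the paper’s evaluation data):

# The model must recover the blanked-out entity mention from context alone.
context = "The Louvre is located in [MASK], the capital of France."
blanked_mention = "Paris"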

The entity memory layer is closely related to memory-based neural layers: it can be seen as a memory network in which memory access is supervised through entity linking, and each memory slot corresponds to a learned entity representation.
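A minimal sketch of such a supervised memory layer is below, assuming a learned entity embedding matrix as the memory, top-k sparse access, and a cross-entropy entity-linking loss on the access scores; the projection sizes, the top-k value, and the names (EntityMemoryLayer, gold_entity_ids) are illustrative choices rather than the paper’s specification.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityMemoryLayer(nn.Module):
    """Memory-network-style layer: each slot is a learned entity
    representation, and access is supervised with entity linking."""

    def __init__(self, d_model=256, d_ent=128, num_entities=10_000, top_k=100):
        super().__init__()
        self.memory = nn.Embedding(num_entities, d_ent)   # one memory slot per entity
        self.to_query = nn.Linear(d_model, d_ent)          # mention representation -> memory query
        self.from_memory = nn.Linear(d_ent, d_model)        # retrieved value -> transformer width
        self.top_k = top_k

    def forward(self, mention_reprs, gold_entity_ids=None):
        query = self.to_query(mention_reprs)                 # (mentions, d_ent)
        scores = query @ self.memory.weight.T                # similarity to every slot
        topk_scores, topk_ids = scores.topk(self.top_k, dim=-1)
        weights = topk_scores.softmax(dim=-1)                # sparse access: only top-k slots are read
        retrieved = (weights.unsqueeze(-1) * self.memory(topk_ids)).sum(dim=-2)
        linking_loss = None
        if gold_entity_ids is not None:                      # supervise memory access via entity linking
            linking_loss = F.cross_entropy(scores, gold_entity_ids)
        return self.from_memory(retrieved), linking_loss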

How It Is Different From Other Models

According to the researchers, unlike previous efforts to integrate entity knowledge into sequence models, this AI model does not rely on an external knowledge base for its entity representations. Instead, it learns them directly from the text, along with all the other model parameters.

Dataset Used

The researchers built the training corpus of contexts paired with entity mention labels from a dump of English Wikipedia. To evaluate the EAE model, they used the TriviaQA dataset, a large-scale reading comprehension dataset containing over 650K question-answer-evidence triples.

The researchers claimed that they used less data to train and test the model. The Wikipedia data was pre-processed by removing 20% of randomly chosen entity mentions, and the researchers further created development and test sets with the same form as the training data.
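A rough sketch of that pre-processing step is below; the data format (token lists with mention spans), the choice to replace removed mentions with a mask token, and the helper name mask_entity_mentions are assumptions made for illustration.

import random

MASK = "[MASK]"

def mask_entity_mentions(tokens, mention_spans, rate=0.2, seed=0):
    """Blank out a randomly chosen fraction (here 20%) of entity mentions."""
    rng = random.Random(seed)
    k = max(1, int(rate * len(mention_spans)))
    for start, end in rng.sample(mention_spans, k):   # spans are (start, end), end exclusive
        tokens = tokens[:start] + [MASK] * (end - start) + tokens[end:]
    return tokens

tokens = ["Charles", "Darwin", "wrote", "On", "the", "Origin", "of", "Species", "."]
spans = [(0, 2), (3, 8)]   # "Charles Darwin" and "On the Origin of Species"
print(mask_entity_mentions(tokens, spans))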

Wrapping Up

According to the researchers, despite accessing only a small proportion of its parameters at inference time, this AI model outperforms a much larger model on TriviaQA. The researchers claimed that Entities as Experts (EAE) outperforms a Transformer model with 30× the parameters and contains more factual knowledge than a similarly sized BERT, according to the Language Model Analysis (LAMA) knowledge probe.
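As a rough back-of-the-envelope illustration of why sparse memory access keeps inference cheap (the figures below are placeholders, not the paper’s actual configuration):

num_entities, d_ent = 1_000_000, 256   # hypothetical size of the entity memory
top_k, mentions = 100, 5               # hypothetical sparse access per question

memory_params = num_entities * d_ent
params_read = mentions * top_k * d_ent
print(f"entity memory parameters: {memory_params:,}")
print(f"read at inference: {params_read:,} ({100 * params_read / memory_params:.2f}% of the memory)")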

Read the paper here.

Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.