Last updated April 23, 2020

Google’s New AI Model Can Access Distinct Memories Of Entities With Less Data


Published on April 26, 2020

by Ambika Choudhury

Researchers at tech giant Google recently introduced a new AI model known as Entities as Experts (EAE). The work focuses on the problem of capturing declarative knowledge in the learned parameters of a language model, and the new model can access distinct memories of the entities mentioned in a piece of text.

Neural network sequence models that are pre-trained as language models are a popular component of text understanding systems. It has been suggested that neural language models could take the place of curated knowledge bases or textual corpora for tasks such as question answering.

Behind Entities as Experts (EAE)

The researchers focussed on developing neural sequence models that capture the knowledge required to answer questions. They trained EAE to predict masked-out spans in English Wikipedia text. They also trained it to access memories only for entity mentions, and to access the correct memory for each entity mentioned. Because memories are associated with specific entities, EAE can learn to access them sparsely.
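One way to picture sparse access is to score a mention's query vector against every entity memory slot and keep only the k best-scoring slots before taking a softmax, so the remaining rows never contribute. The minimal plain-Python sketch below assumes this top-k scheme; the function name `sparse_entity_lookup`, the toy memory and the value of k are hypothetical illustrations, not details taken from the article.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sparse_entity_lookup(query, memory, k):
    """Hypothetical top-k lookup over an entity memory.

    Scores every memory row against the query, keeps the k best rows,
    applies a softmax over those k scores only, and returns
    (chosen row ids, their weights, the weighted average of those rows).
    """
    scores = [dot(query, row) for row in memory]
    top = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:k]
    # Subtract the largest kept score for numerical stability.
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    # Only the k selected rows are read to build the retrieved vector.
    retrieved = [sum(w * memory[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return top, weights, retrieved

# Toy memory of four entity embeddings (made-up numbers).
memory = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
ids, weights, vec = sparse_entity_lookup([1.0, 0.0], memory, k=2)
print(ids)  # [0, 2]
```

Only k rows feed the retrieved vector, which is the sense in which such a lookup touches a small fraction of the memory per mention.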

How EAE Works

The basic architecture is a Transformer interleaved with an entity memory layer. EAE has two embedding matrices: one for tokens and one for entities. The researchers evaluated how well EAE captures declarative knowledge using cloze knowledge probes and the TriviaQA question-answering task.

In the figure above, which depicts the EAE model, the output of the initial Transformer layer is used to predict mention boundaries, to retrieve entity embeddings from the entity memory, and to construct the input to the next Transformer layer, augmented with the retrieved entity embeddings.
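The retrieval step described above can be sketched in plain Python under several assumptions: mention boundaries are passed in rather than predicted, the query is a projection of the concatenated hidden states at the mention's first and last tokens, retrieval is a softmax-weighted average over all entity embeddings, and the projected result is added to every token in the span. The names `entity_memory_layer`, `w_f` and `w_b`, and all the numbers, are hypothetical.

```python
import math

def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def entity_memory_layer(hidden, spans, entity_emb, w_f, w_b):
    """Hypothetical sketch of one entity-memory step.

    hidden:     token hidden states from the initial Transformer layer
    spans:      (start, end) mention boundaries, end inclusive
    entity_emb: entity embedding matrix, one row per entity
    w_f:        projects [h_start; h_end] into entity space
    w_b:        projects a retrieved entity vector back to hidden size
    Returns the augmented hidden states and each mention's attention.
    """
    out = [list(h) for h in hidden]  # copy so the input is not mutated
    attentions = []
    for start, end in spans:
        # List "+" concatenates the first- and last-token states.
        query = matvec(w_f, hidden[start] + hidden[end])
        attn = softmax([sum(q * e for q, e in zip(query, row)) for row in entity_emb])
        attentions.append(attn)
        dim = len(entity_emb[0])
        retrieved = [sum(a * row[d] for a, row in zip(attn, entity_emb)) for d in range(dim)]
        update = matvec(w_b, retrieved)
        # Assumption: add the update to every token inside the mention.
        for t in range(start, end + 1):
            out[t] = [h + u for h, u in zip(out[t], update)]
    return out, attentions

hidden = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
spans = [(0, 1)]                             # one mention covering tokens 0-1
entity_emb = [[1.0, 0.0], [0.0, 1.0]]        # two toy entities
w_f = [[1, 0, 0, 0], [0, 1, 0, 0]]           # toy: keep start-token features
w_b = [[1, 0], [0, 1]]                       # toy: identity back-projection
out, attns = entity_memory_layer(hidden, spans, entity_emb, w_f, w_b)
```

Tokens outside any mention pass through unchanged, so the next Transformer layer sees entity information only where a mention was detected.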

The output of the final Transformer block can be connected to multiple task-specific heads; in this model there are two, token prediction and entity prediction. The token prediction head predicts masked tokens for a cloze task, and the entity prediction head predicts an entity ID for each entity mention span, i.e., it performs entity linking.
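The two heads can be illustrated as dot-product scorers followed by an argmax. This sketch assumes the token head scores a hidden state against the token embedding matrix and the entity head scores a mention query against the entity embedding matrix; the vocabulary, entity labels and vectors are made up for illustration.

```python
def argmax(xs):
    """Index of the largest value."""
    return max(range(len(xs)), key=lambda i: xs[i])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def token_head(hidden_state, token_emb):
    """Hypothetical token head: pick the vocabulary id for a masked position."""
    return argmax([dot(hidden_state, row) for row in token_emb])

def entity_head(mention_query, entity_emb):
    """Hypothetical entity head: pick an entity id for a mention (entity linking)."""
    return argmax([dot(mention_query, row) for row in entity_emb])

vocab = ["paris", "france", "river"]          # toy vocabulary
token_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
entities = ["France", "Paris"]                # toy entity labels
entity_emb = [[0.1, 0.9], [0.8, 0.3]]

print(vocab[token_head([0.9, 0.2], token_emb)])        # paris
print(entities[entity_head([1.0, 0.0], entity_emb)])   # Paris
```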

In cloze tests, a model must recover the words in a blanked-out mention by correctly associating the mention with its surrounding sentence context. The entity retrieval after the first Transformer layer, described above, is supervised with an entity linking objective during pre-training.
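An entity linking objective of this kind is commonly expressed as a cross-entropy loss: score the mention's query against every entity memory slot, apply a softmax and take the negative log-probability of the gold entity. The sketch below assumes that formulation; `entity_linking_loss` and the toy embeddings are hypothetical.

```python
import math

def entity_linking_loss(mention_query, entity_emb, gold_id):
    """Hypothetical cross-entropy over all entity memory slots.

    Returns -log softmax(scores)[gold_id], computed with a stable log-sum-exp.
    Minimizing it raises the gold entity's score relative to the others.
    """
    scores = [sum(q * e for q, e in zip(mention_query, row)) for row in entity_emb]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold_id]

entity_emb = [[1.0, 0.0], [0.0, 1.0]]   # two toy entities
query = [2.0, 0.0]                      # points towards entity 0
print(entity_linking_loss(query, entity_emb, 0))  # small loss: correct link
print(entity_linking_loss(query, entity_emb, 1))  # larger loss: wrong link
```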

The entity memory layer is closely related to memory-based neural layers. It can be seen as a memory network in which memory access is supervised through entity linking and each memory slot corresponds to a learned entity representation.

How It Is Different From Other Models

According to the researchers, unlike previous efforts to integrate entity knowledge into sequence models, this AI model does not rely on an external knowledge base for its entity representations. Instead, it learns them directly from the text, along with all the other model parameters.

Dataset Used

The researchers built the training corpus of contexts paired with entity mention labels from a dump of English Wikipedia. To evaluate EAE, they used TriviaQA, a large-scale reading comprehension dataset of question-answer pairs containing over 650K question-answer-evidence triples.

The researchers claimed that they used less data to train and test the model. The Wikipedia data was pre-processed by removing 20% of randomly chosen entity mentions, and the researchers further created development and test sets with the same form as the training data.
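The mention-removal step can be sketched as choosing a random 20% of the annotated mention spans and blanking their tokens. This sketch assumes word-level tokens, end-inclusive spans and a placeholder string `[MASK]` standing in for the removed words; the function name `mask_mentions` and the example sentence are hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a removed word

def mask_mentions(tokens, spans, fraction, rng):
    """Blank out a randomly chosen fraction of entity mention spans.

    tokens: list of word strings
    spans:  list of (start, end) mention boundaries, end inclusive
    Returns a new token list and the sorted list of spans that were blanked.
    """
    n = round(fraction * len(spans))
    chosen = sorted(rng.sample(spans, n))
    masked = list(tokens)  # leave the original list untouched
    for start, end in chosen:
        for i in range(start, end + 1):
            masked[i] = MASK
    return masked, chosen

tokens = "Barack Obama was born in Honolulu , Hawaii".split()
spans = [(0, 1), (5, 5), (7, 7)]  # toy mention annotations
masked, chosen = mask_mentions(tokens, spans, 0.2, random.Random(0))
print(masked, chosen)
```

Blanking whole mentions, rather than single tokens, produces the cloze setting described earlier, where the model must recover an entity name from its context.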

Wrapping Up

According to the researchers, although EAE accesses only a small proportion of its parameters at inference time, it outperforms a much larger model on TriviaQA. They reported that EAE outperforms a Transformer model with 30× as many parameters and, according to the Language Model Analysis (LAMA) knowledge probe, contains more factual knowledge than a similarly sized BERT model.


