
Facebook Explains How Nearest Neighbour Search Is An Effective Approach For Language Modelling In The Long Tail


With an aim to break down language barriers across the globe so that everyone can understand and communicate with anyone, the researchers at Facebook AI Research (FAIR) work on complex problems to deploy robust language translation solutions. Their work spans topics such as deep learning, natural language processing, text normalisation, word sense disambiguation and much more.

Recently, the researchers at Facebook AI Research presented a new language modelling approach known as kNN-LM, which is based on the hypothesis that the representation learning problem may be easier than the prediction problem. A language model usually solves two subproblems: mapping sentence prefixes to fixed-sized representations, and using these representations to predict the next word in the text.

This approach extends a pre-trained LM by linearly interpolating its next-word distribution with a k-nearest neighbours (kNN) model. The nearest neighbours are computed according to distance in the pre-trained embedding space and can be drawn from any text collection, including the original language model (LM) training data. According to the researchers, this approach allows rare patterns to be memorised explicitly, rather than implicitly in model parameters.
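For intuition, the interpolation itself fits in a few lines. The sketch below is only illustrative (the λ weight, vocabulary size and distributions are placeholders, not values from the paper): it builds a kNN distribution from the retrieved neighbours and mixes it with the base LM's next-word distribution.

```python
import numpy as np

def knn_lm_interpolate(p_lm, neighbour_tokens, neighbour_dists, vocab_size, lam=0.25):
    """Linearly interpolate the base LM distribution with a kNN distribution.

    p_lm            : (vocab_size,) next-word probabilities from the pre-trained LM
    neighbour_tokens: token ids of the target words of the k retrieved neighbours
    neighbour_dists : distances between the query context and each neighbour key
    lam             : interpolation weight (a tunable hyperparameter)
    """
    # Turn negative distances into a probability over the retrieved targets
    weights = np.exp(-np.asarray(neighbour_dists, dtype=np.float64))
    weights /= weights.sum()

    # Aggregate neighbour weights onto their target tokens
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, np.asarray(neighbour_tokens), weights)

    # p(y|x) = lam * p_kNN(y|x) + (1 - lam) * p_LM(y|x)
    return lam * p_knn + (1 - lam) * p_lm
```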

How It Works

The kNN-LM augments such a pre-trained LM with a nearest-neighbours retrieval mechanism, without any additional training, which means the representations learned by the LM remain unchanged throughout the process.

One crucial point of the kNN-LM approach is that it is compatible with any model that produces fixed-size context representations. The researchers used decoder-only Transformers for language modelling, and since kNN-LM makes no changes to the underlying LM, they used the same architecture to create a kNN-LM for inference.
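As a rough illustration of the retrieval mechanism, the sketch below builds a datastore of (context representation, next word) pairs with a frozen LM and queries it at test time. The `encode_context` helper, the exact L2 index and the chosen k are assumptions made for this example, not details prescribed by the researchers.

```python
import faiss
import numpy as np

# Assume encode_context(tokens) returns the frozen LM's fixed-size
# representation of a context (a hypothetical helper for this sketch).

def build_datastore(training_contexts, next_tokens, dim):
    """Store one (key, value) pair per training token: the context
    representation as key, the word that followed it as value."""
    keys = np.stack([encode_context(c) for c in training_contexts]).astype("float32")
    index = faiss.IndexFlatL2(dim)   # exact L2 search over the keys
    index.add(keys)
    return index, np.asarray(next_tokens)

def retrieve_neighbours(index, values, test_context, k=8):
    """At test time, look up the k nearest training contexts and return
    their target words and distances, ready for the interpolation above."""
    query = encode_context(test_context).astype("float32")[None, :]
    dists, idx = index.search(query, k)
    return values[idx[0]], dists[0]
```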

According to the researchers, kNN-LM improves performance for the following three reasons:

  • With an implicit notion of similarity, the Transformer LM is efficient at learning a representation function for contexts.
  • While the Transformer has the capacity to memorise all the training examples, doing so causes its representation to generalise less effectively.
  • The kNN-LM allows the model to memorise the training data while retaining an effective similarity function. 

Dataset Used

The researchers used several datasets for this project; they are mentioned below:

  • Wikitext-103, a standard benchmark for autoregressive language modelling with a 250K word-level vocabulary, consisting of 103M tokens of Wikipedia in the training set and 250K tokens in each of the development and test sets.
  • Books, the Toronto Books Corpus.
  • Wiki-3B, an English Wikipedia dataset containing about 2.87B tokens.
  • Wiki-100M, a random 100M-token subset of Wiki-3B.

Advantages of This Model

  • This approach has implications for efficiently scaling up to larger training sets and allows for effective domain adaptation by simply varying the nearest-neighbour datastore, again without further training.
  • The model is particularly helpful in predicting rare patterns, such as factual knowledge, names, and near-duplicate sentences from the training set.
  • It also improves performance when the same training data is used for learning the prefix representations and the kNN model, strongly suggesting that the prediction problem is more challenging than previously appreciated.

Wrapping Up

Over the years, the researchers at FAIR have carried out several remarkable projects related to natural language processing (NLP) and natural language understanding (NLU). Here, the researchers introduced the kNN-LM model, which significantly outperforms standard language models by directly querying training examples at test time and can be applied to any neural language model. According to the researchers, the success of this method suggests that learning similarity functions between contexts may be an easier problem than predicting the next word from a given context.

Read the paper here.


Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.