What Is Google’s Recently Launched BigBird

Recently, Google Research introduced BigBird, a new sparse attention mechanism that improves performance on a multitude of tasks requiring long contexts. The researchers took inspiration from graph sparsification methods.

They examined where the proof of the expressiveness of Transformers breaks down when full attention is relaxed to form the proposed attention pattern. They stated, “This understanding helped us develop BigBird, which is theoretically as expressive and also empirically useful.”

Why is BigBird Important?

Bidirectional Encoder Representations from Transformers, or BERT, a neural network-based technique for natural language processing (NLP) pre-training, has gained immense popularity in the last two years. This technology enables anyone to train their own state-of-the-art question answering system.

However, one of the core limitations of this technique is its quadratic dependency, mainly in terms of memory, on the sequence length, due to the full attention mechanism. This also increases the cost of using transformer-based models to process long sequences. To mitigate this issue, the researchers introduced BigBird.
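The quadratic dependency can be seen with simple arithmetic: full attention scores every (query, key) pair, so its score matrix has n × n entries, while a BigBird-style sparse pattern keeps only a fixed number of keys per query. The sketch below is a back-of-the-envelope illustration; the function names and the specific window, random, and global sizes are illustrative choices, not the paper's configuration.

```python
# Rough illustration of attention-score memory growth.
# Full attention stores one score per (query, key) pair: n * n entries.
# A BigBird-style sparse pattern stores only a constant number of
# entries per query (local window + random keys + global tokens).

def full_attention_entries(n):
    # one score for every (query, key) pair
    return n * n

def sparse_attention_entries(n, window=3, n_random=3, n_global=2):
    # each query attends to a small fixed set of keys
    # (the sizes here are illustrative, not the paper's settings)
    return n * (window + n_random + n_global)

for n in (512, 4096):
    print(n, full_attention_entries(n), sparse_attention_entries(n))
```

At n = 4096 the full matrix already needs roughly 16.8 million scores, while the sparse pattern grows only linearly with n, which is why longer sequences become feasible on the same hardware.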

Behind BigBird

BigBird is a universal approximator of sequence functions, designed to satisfy all the known theoretical properties of full transformers. According to the researchers, this sparse attention can handle sequences up to 8x the length of what was previously possible on similar hardware.

In particular, BigBird's attention consists of three main parts:

  • A set of global tokens that attend to all parts of the sequence.
  • A set of random keys for each query.
  • A block of local neighbours, so that each node attends to its local structure.
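The three components above can be sketched as a boolean attention mask over (query, key) pairs. This is a minimal illustration, not the paper's implementation: the function name `bigbird_mask` and the small window, random, and global sizes are assumptions chosen to keep the example readable.

```python
import random

def bigbird_mask(n, window=1, n_random=2, global_tokens=(0,), seed=0):
    """Build an n x n boolean mask: mask[q][k] is True if query q
    may attend to key k. Combines the three BigBird components
    (sizes here are illustrative, not the paper's configuration)."""
    rng = random.Random(seed)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        # 1) global tokens attend everywhere and are attended to by all
        for g in global_tokens:
            mask[g][q] = True
            mask[q][g] = True
        # 2) a few random keys per query
        for k in rng.sample(range(n), n_random):
            mask[q][k] = True
        # 3) a local window of neighbours around the query
        for k in range(max(0, q - window), min(n, q + window + 1)):
            mask[q][k] = True
    return mask

m = bigbird_mask(8)
# every query can attend to itself (local window) and to global token 0
assert all(m[q][q] and m[q][0] for q in range(8))
```

In practice the sparse pattern is realised with blocked matrix operations rather than an explicit mask, but the mask makes the structure of the three components easy to see.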

Dataset Used

To train the encoder of the model, the researchers used four challenging datasets:

1| Natural Questions: Natural Questions corpus is a question answering dataset. The dataset consists of 307,373 training examples with single annotations, 7,830 cases with 5-way annotations for development data, and a further 7,842 examples 5-way annotated sequestered as test data. 

2| HotpotQA-distractor: HotpotQA is a large-scale dataset with 113k Wikipedia-based question-answer pairs. The dataset is collected by crowdsourcing based on Wikipedia articles, where crowd workers are shown multiple supporting context documents and asked explicitly to come up with questions requiring reasoning about all of the documents.

3| TriviaQA-wiki: TriviaQA is a large-scale challenging reading comprehension dataset containing over 650K question-answer-evidence triples. The dataset includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions.

4| WikiHop: The WikiHop dataset consists of sets of Wikipedia articles where the answer to a query about a specific property of an entity cannot be located in the entity's own article.

Contributions of This Research

The main contributions of this research are:

  • BigBird satisfies all the known theoretical properties of a full transformer. In particular, the researchers showed that adding extra tokens allows one to express all continuous sequence-to-sequence functions with only O(n) inner products. They also showed that, under standard assumptions regarding precision, BigBird is Turing complete.
  • They showed that the extended context modelled by BigBird greatly benefits a variety of NLP tasks. In particular, the researchers achieved state-of-the-art results for question-answering and document summarisation on several different datasets.
  • Lastly, they introduced a novel application of attention-based models where long contexts are beneficial, such as extracting contextual representations of genomics sequences like DNA. Also, with longer masked LM pretraining, BigBird improves performance on downstream tasks such as promoter-region and chromatin profile prediction.

Wrapping Up

BigBird satisfies several theoretical results: it is a universal approximator of sequence-to-sequence functions and is Turing complete. As a consequence of its ability to handle longer contexts, BigBird drastically improves performance on various NLP tasks such as question answering and long-document summarisation.

Furthermore, the researchers also proposed novel applications to genomics data, introducing an attention-based contextual language model for DNA and fine-tuning it for downstream tasks such as promoter-region prediction and predicting the effects of non-coding variants.
Read the paper here.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
