What Is Google’s Recently Launched BigBird

Google Research recently introduced BigBird, a new sparse attention mechanism that improves performance on a multitude of tasks requiring long contexts. The researchers took inspiration from graph sparsification methods.

The researchers examined where the proof of the expressiveness of Transformers breaks down when full attention is relaxed to form the proposed attention pattern. They stated, “This understanding helped us develop BigBird, which is theoretically as expressive and also empirically useful.”


Why is BigBird Important?

Bidirectional Encoder Representations from Transformers (BERT), a neural network-based technique for natural language processing (NLP) pre-training, has gained immense popularity in the last two years. This technology enables anyone to train their own state-of-the-art question answering system.

However, one of the core limitations of this technique is its quadratic dependency on sequence length, mainly in terms of memory, which stems from the full attention mechanism. This also increases the cost of using transformer-based models to process long sequences. To mitigate this issue, the researchers introduced BigBird.
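
To make the quadratic dependency concrete, here is a minimal sketch that counts the query-key scores a single full-attention head has to hold for a sequence of a given length. The function names and the 4-bytes-per-score figure (32-bit floats) are assumptions chosen for illustration; they do not come from the paper.

```python
def attention_score_entries(seq_len: int) -> int:
    """Number of query-key scores in one full-attention head (seq_len x seq_len)."""
    return seq_len * seq_len


def attention_score_bytes(seq_len: int, bytes_per_score: int = 4) -> int:
    """Memory for those scores, assuming 4 bytes (one 32-bit float) per score."""
    return attention_score_entries(seq_len) * bytes_per_score


if __name__ == "__main__":
    for n in (512, 1024, 4096):
        mib = attention_score_bytes(n) / 2**20
        print(f"n={n}: {attention_score_entries(n)} scores, {mib:.1f} MiB per head")
```

Doubling the sequence length quadruples the number of scores, which is why long documents quickly become expensive under full attention.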

Behind BigBird

BigBird is a universal approximator of sequence functions, designed to satisfy all the known theoretical properties of full transformers. According to the researchers, this sparse attention can handle sequences up to 8x longer than was previously possible on similar hardware.

In particular, BigBird consists of three main parts:

  • A set of global tokens that attend to all parts of the sequence.
  • A set of random keys that each query attends to.
  • A block of local neighbours, so that each node attends to its local structure.
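
The three parts listed above can be sketched as a boolean attention mask. This is a simplified, token-level illustration and not the researchers' implementation: the function name `bigbird_style_mask`, its parameters, and the default sizes are all hypothetical, and treating the first few positions as the global tokens is an assumption made here for simplicity.

```python
import random


def bigbird_style_mask(seq_len, num_global=2, window=3, num_random=2, seed=0):
    """Return mask[i][j] == True when query i may attend to key j.

    Hypothetical token-level sketch combining global, local and random attention.
    """
    rng = random.Random(seed)
    half = window // 2
    mask = [[False] * seq_len for _ in range(seq_len)]

    # Global tokens (assumed to be the first positions): they attend to
    # every position, and every position attends to them.
    for g in range(min(num_global, seq_len)):
        for j in range(seq_len):
            mask[g][j] = True
            mask[j][g] = True

    for i in range(seq_len):
        # Local neighbours: a sliding window centred on position i.
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True
        # Random keys: a few extra positions this query does not already see.
        candidates = [j for j in range(seq_len) if not mask[i][j]]
        for j in rng.sample(candidates, min(num_random, len(candidates))):
            mask[i][j] = True
    return mask


if __name__ == "__main__":
    for row in bigbird_style_mask(16):
        print("".join("#" if allowed else "." for allowed in row))
```

Printing the mask for a short sequence shows two fully filled rows and columns for the global tokens, a diagonal band for the local window, and a few scattered random entries per row. Each non-global query attends to at most `num_global + window + num_random` keys, regardless of sequence length.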

Dataset Used

For the encoder-only question answering experiments, the researchers used four challenging datasets:

1| Natural Questions: The Natural Questions corpus is a question answering dataset. It consists of 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and a further 7,842 5-way annotated examples sequestered as test data.

2| HotpotQA-distractor: HotpotQA is a large-scale dataset with 113k Wikipedia-based question-answer pairs. The dataset is collected by crowdsourcing based on Wikipedia articles, where crowd workers are shown multiple supporting context documents and asked explicitly to come up with questions requiring reasoning about all of the documents.

3| TriviaQA-wiki: TriviaQA is a large-scale challenging reading comprehension dataset containing over 650K question-answer-evidence triples. The dataset includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions.

4| WikiHop: The WikiHop dataset consists of sets of Wikipedia articles where answers to queries about specific properties of an entity cannot be located in the entity’s own article.

Contributions of This Research

The main contributions of this research are:

  • BigBird satisfies all the known theoretical properties of a full transformer. In particular, the researchers showed that adding extra tokens allows one to express all continuous sequence-to-sequence functions with only O(n) inner products. They also showed that, under standard assumptions regarding precision, BigBird is Turing complete.
  • They showed that the extended context modelled by BigBird greatly benefits a variety of NLP tasks. In particular, the researchers achieved state-of-the-art results for question-answering and document summarisation on several different datasets.
  • Lastly, they introduced a novel application of attention-based models where long contexts are beneficial, such as extracting contextual representations of genomics sequences like DNA. Also, with longer masked LM pretraining, BigBird improves performance on downstream tasks such as promoter-region and chromatin profile prediction.
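
The O(n) inner-product claim in the first contribution can be illustrated by simple counting. The sketch below compares full attention with an upper bound for a sparse pattern that gives each query a fixed budget of global, local and random keys. The function names and budget sizes are assumptions chosen for illustration, not values from the paper.

```python
def full_inner_products(n: int) -> int:
    """Full attention: every query is scored against every key."""
    return n * n


def sparse_inner_products_bound(n, num_global=2, window=3, num_random=3):
    """Upper bound for a sparse pattern with a fixed per-query key budget.

    Global queries score all n keys; every other query scores at most
    num_global + window + num_random keys (hypothetical budget sizes).
    """
    g = min(num_global, n)
    per_query = min(n, g + window + num_random)
    return g * n + (n - g) * per_query


if __name__ == "__main__":
    for n in (1024, 2048, 4096):
        print(n, full_inner_products(n), sparse_inner_products_bound(n))
```

With the budget held fixed, doubling the sequence length roughly doubles the sparse count while the full count quadruples.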

Wrapping Up

BigBird satisfies several theoretical properties: it is a universal approximator of sequence-to-sequence functions and is Turing complete. Because it can handle longer contexts, BigBird drastically improves performance on various NLP tasks such as question answering and long document summarisation.

Furthermore, the researchers proposed novel applications to genomics data by introducing an attention-based contextual language model for DNA and fine-tuning it for downstream tasks such as promoter region prediction and predicting the effects of non-coding variants.
Read the paper here.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
