
What Is Google’s Recently Launched BigBird


Recently, Google Research introduced BigBird, a new sparse attention mechanism that improves performance on a multitude of tasks requiring long contexts. The researchers took inspiration from graph sparsification methods.

They examined where the proof of the expressiveness of Transformers breaks down when full attention is relaxed to the proposed sparse attention pattern. They stated, “This understanding helped us develop BigBird, which is theoretically as expressive and also empirically useful.”

Why is BigBird Important?

Bidirectional Encoder Representations from Transformers (BERT), a neural network-based technique for natural language processing (NLP) pre-training, has gained immense popularity in the last two years. This technology enables anyone to train their own state-of-the-art question answering system.

However, one of the core limitations of this technique is the quadratic dependency, mainly in terms of memory, on sequence length due to its full attention mechanism. This also increases the cost of using transformer-based models to process long sequences. To mitigate this issue, the researchers introduced BigBird.
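To see where the quadratic dependency comes from, note that full attention scores every query against every key, producing an n × n matrix. The snippet below is a minimal illustration, not taken from the paper, of how the memory for that score matrix grows with sequence length; the head dimension and sequence lengths are arbitrary values chosen for the example:

```python
# Minimal sketch (illustrative, not from the paper): why full attention is
# quadratic in sequence length. The score matrix Q @ K^T has shape (n, n),
# so its memory grows with n**2.
import numpy as np

d = 64  # head dimension (illustrative value)
for n in (512, 1024, 2048, 4096):                 # sequence lengths
    Q = np.random.randn(n, d).astype(np.float32)  # queries
    K = np.random.randn(n, d).astype(np.float32)  # keys
    scores = Q @ K.T                              # full attention scores, shape (n, n)
    print(n, scores.shape, f"{scores.nbytes / 2**20:.1f} MiB")
```

Doubling the sequence length quadruples the memory needed just for the scores, which is exactly the bottleneck BigBird's sparse pattern is designed to remove.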

Behind BigBird

BigBird is a universal approximator of sequence functions, designed to satisfy all the known theoretical properties of full transformers. According to the researchers, this sparse attention mechanism can handle sequences up to 8x longer than what was previously possible using similar hardware.

In particular, BigBird's attention consists of three main parts (a minimal sketch of the resulting sparsity pattern follows the list):

  • A set of global tokens that attends to all parts of the sequence.
  • A set of random keys for each query.
  • A block of local neighbours so that each node attends to its local structure.
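The snippet below is a minimal sketch of how these three components can combine into a single boolean sparsity mask. It is only an illustration of the pattern, not the paper's block-sparse implementation, and the number of global tokens, the window size and the number of random keys are assumed values:

```python
# Hedged sketch of a BigBird-style sparsity mask combining global tokens,
# a local window and random keys. Parameters are illustrative assumptions.
import numpy as np

def bigbird_mask(n, num_global=2, window=3, num_random=3, seed=0):
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)  # mask[i, j] = True -> query i attends to key j

    # 1) Global tokens: they attend everywhere, and everything attends to them.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # 2) Local window: each token attends to its neighbours.
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True

    # 3) Random keys: each query attends to a few random positions.
    for i in range(n):
        mask[i, rng.choice(n, size=num_random, replace=False)] = True

    return mask

mask = bigbird_mask(64)
print(mask.shape, "attended pairs:", mask.sum())  # far fewer than 64 * 64
```

Keeping all three components matters: the global tokens preserve long-range connectivity, the window captures locality, and the random links keep the attention graph well connected, which is the intuition the graph sparsification view provides.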

Datasets Used

To train the encoder of the model, the researchers used four challenging datasets (a sketch of loading them with the Hugging Face datasets library follows the list):

1| Natural Questions: Natural Questions corpus is a question answering dataset. The dataset consists of 307,373 training examples with single annotations, 7,830 cases with 5-way annotations for development data, and a further 7,842 examples 5-way annotated sequestered as test data. 

2| HotpotQA-distractor: HotpotQA is a large-scale dataset with 113k Wikipedia-based question-answer pairs. The dataset is collected by crowdsourcing based on Wikipedia articles, where crowd workers are shown multiple supporting context documents and asked explicitly to come up with questions requiring reasoning about all of the documents.

3| TriviaQA-wiki: TriviaQA is a large-scale challenging reading comprehension dataset containing over 650K question-answer-evidence triples. The dataset includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions.

4| WikiHop: The WikiHop dataset consists of sets of Wikipedia articles where answers to queries about specific properties of an entity cannot be located in the entity’s own article.
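For readers who want to experiment with the same benchmarks, all four corpora are obtainable through the Hugging Face datasets library. The identifiers and configuration names below are assumptions (particularly the WikiHop one) and may not match the exact versions or splits used in the paper:

```python
# Hedged sketch: loading the four QA corpora with the Hugging Face `datasets`
# library. Identifiers and configuration names are assumptions and may differ
# from the exact versions used by the BigBird authors.
from datasets import load_dataset

natural_questions = load_dataset("natural_questions", split="train")  # very large download
hotpot = load_dataset("hotpot_qa", "distractor", split="train")
trivia = load_dataset("trivia_qa", "rc.wikipedia", split="train")
wikihop = load_dataset("wiki_hop", "original", split="train")         # identifier assumed

print(len(natural_questions), len(hotpot), len(trivia), len(wikihop))
```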

Contributions of This Research

The main contributions of this research are:

  • BigBird satisfies all the known theoretical properties of a full transformer. In particular, the researchers showed that adding extra tokens allows one to express all continuous sequence-to-sequence functions with only O(n) inner products (a back-of-the-envelope count follows this list). They also showed that, under standard assumptions regarding precision, BigBird is Turing complete.
  • They showed that the extended context modelled by BigBird greatly benefits a variety of NLP tasks. In particular, the researchers achieved state-of-the-art results for question-answering and document summarisation on several different datasets.
  • Lastly, they introduced a novel application of attention-based models where long contexts are beneficial, such as extracting contextual representations of genomic sequences like DNA. With longer masked LM pretraining, BigBird also improves performance on downstream tasks such as promoter-region and chromatin-profile prediction.
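To make the O(n) claim concrete, here is a back-of-the-envelope count of attended (query, key) pairs under the pattern described earlier. The per-query budgets (a couple of global tokens, a small window, a handful of random keys) are illustrative assumptions, not the paper's configuration:

```python
# Rough, illustrative count of attended (query, key) pairs for a BigBird-style
# pattern versus full attention. With fixed per-query budgets the sparse count
# grows linearly in n, while full attention grows as n**2.
def approx_sparse_pairs(n, num_global=2, window=3, num_random=3):
    global_pairs = 2 * num_global * n    # global rows plus global columns (upper bound)
    window_pairs = n * (2 * window + 1)  # each query attends to its local neighbourhood
    random_pairs = n * num_random        # a few random keys per query
    return global_pairs + window_pairs + random_pairs

for n in (256, 1024, 4096):
    print(f"n={n:5d}  sparse~{approx_sparse_pairs(n):7d}  full={n * n:9d}")
```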

Wrapping Up

BigBird satisfies several theoretical results: the technique is a universal approximator of sequence-to-sequence functions and is Turing complete. As a consequence of its capability to handle longer contexts, BigBird drastically improves performance on various NLP tasks such as question answering and long document summarisation.

Furthermore, the researchers proposed novel applications to genomics data, introducing an attention-based contextual language model for DNA and fine-tuning it for downstream tasks such as promoter-region prediction and predicting the effects of non-coding variants.
Read the paper here.
