
Google researchers introduce Multimodal Bottleneck Transformer for audiovisual fusion

Machine perception models are usually modality-specific and optimised for unimodal benchmarks.


Google researchers have proposed a new transformer architecture for audiovisual fusion, the Multimodal Bottleneck Transformer (MBT), and explored different fusion strategies using cross-attention between latent tokens in a new paper titled ‘Attention Bottlenecks for Multimodal Fusion’.

Machine perception models are usually modality-specific and optimised for unimodal benchmarks. As a result, fusing the final representations or predictions from each modality at a late stage (‘late-fusion’) remains a dominant paradigm for multimodal video classification. The Multimodal Bottleneck Transformer instead uses ‘fusion bottlenecks’ to fuse modalities at multiple layers. Unlike traditional pairwise self-attention, in which every token can attend to every token from every modality, MBT forces information exchanged between modalities to pass through a small number of bottleneck latents. This requires the model to collate and condense the most important information in each modality and share only what is necessary.
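To make the bottleneck idea concrete, here is a minimal NumPy sketch of one fusion layer, written under simplifying assumptions rather than taken from the paper's code. Each modality runs self-attention only over its own tokens concatenated with a small shared set of bottleneck tokens, so audio and video can exchange information only through those bottleneck tokens. The function names `self_attention` and `bottleneck_fusion_layer` are hypothetical; the attention is single-head with no learned projections, MLP or LayerNorm; and merging the per-modality bottleneck updates by averaging is an assumption made for illustration. The token counts (50 audio, 196 video, 4 bottleneck) are likewise arbitrary illustrative choices.

```python
import numpy as np


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention with no learned Q/K/V weights.

    Simplifying assumption: a real transformer layer also has projections,
    multiple heads, an MLP and LayerNorm.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                  # (n, n) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                              # (n, d) attended tokens


def bottleneck_fusion_layer(audio, video, bottleneck):
    """Hypothetical fusion layer: each modality attends over its own tokens plus
    the shared bottleneck tokens, so cross-modal information flows only via them."""
    b = bottleneck.shape[0]
    outputs, bottleneck_updates = [], []
    for tokens in (audio, video):
        joint = np.concatenate([tokens, bottleneck], axis=0)
        updated = self_attention(joint)
        outputs.append(updated[:-b])             # updated modality tokens
        bottleneck_updates.append(updated[-b:])  # this modality's view of the bottleneck
    # Assumption: merge the per-modality bottleneck updates by averaging them.
    fused_bottleneck = np.mean(bottleneck_updates, axis=0)
    return outputs[0], outputs[1], fused_bottleneck


rng = np.random.default_rng(0)
d = 16
audio = rng.normal(size=(50, d))       # e.g. audio spectrogram patch tokens
video = rng.normal(size=(196, d))      # e.g. 14x14 image patch tokens
bottleneck = rng.normal(size=(4, d))   # small number of shared bottleneck latents

a, v, z = bottleneck_fusion_layer(audio, video, bottleneck)
print(a.shape, v.shape, z.shape)  # (50, 16) (196, 16) (4, 16)
```

Within a single layer, the audio output depends only on the audio tokens and the incoming bottleneck, never directly on the video tokens. Stacking such layers lets condensed video information reach the audio stream, and vice versa, through the bottleneck alone.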

The researchers showed that restricting cross-modal attention to a small set of fusion bottlenecks achieved state-of-the-art results on a number of video classification benchmarks, while also reducing computational cost compared with vanilla cross-attention models.
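A rough way to see where the savings come from is to count attention score entries per layer. The sketch below assumes that ‘vanilla cross-attention’ means full pairwise attention over the concatenated audio and video tokens, and it counts only the quadratic query-key score matrix, ignoring projections, MLPs and multiple heads. The token counts are illustrative, not figures from the paper, and the function name is hypothetical.

```python
def attention_score_entries(n_audio: int, n_video: int, n_bottleneck: int) -> tuple[int, int]:
    """Return (full, bottleneck) counts of query-key score entries per layer.

    full:       every token attends to every token across both modalities.
    bottleneck: each modality attends over its own tokens plus the bottleneck tokens.
    """
    full = (n_audio + n_video) ** 2
    bottleneck = (n_audio + n_bottleneck) ** 2 + (n_video + n_bottleneck) ** 2
    return full, bottleneck


full, bott = attention_score_entries(n_audio=100, n_video=400, n_bottleneck=4)
print(full, bott)  # 250000 174032
```

The saving comes from dropping the cross terms between the two modalities (here 2 × 100 × 400 = 80,000 entries) at the price of a few extra rows and columns for the bottleneck tokens, which stays small as long as the number of bottleneck tokens is small.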


Kartik Wali

A writer by passion, Kartik strives to develop a deep understanding of AI and data analytics and their applications in all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that are transforming the way we live.