Google researchers introduce Multimodal Bottleneck Transformer for audiovisual fusion

Machine perception models are usually modality-specific and optimised for unimodal benchmarks.

Google researchers have proposed a new transformer architecture, the Multimodal Bottleneck Transformer (MBT), for audiovisual fusion. In a new paper titled Attention Bottlenecks for Multimodal Fusion, they also explore different fusion strategies using cross-attention between latent tokens.

Machine perception models are usually modality-specific and optimised for unimodal benchmarks, so late-stage fusion of the final representations or predictions from each modality (‘late fusion’) remains the dominant paradigm for multimodal video classification. The Multimodal Bottleneck Transformer instead uses ‘fusion bottlenecks’ to fuse modalities at multiple layers. Unlike traditional pairwise self-attention, MBT forces information exchanged between modalities to pass through a small number of bottleneck latents, so the model must collate and condense the most relevant information in each modality and share only what is necessary.
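The bottleneck idea described above can be illustrated with a minimal sketch in plain Python. This is not the researchers' implementation: the attention here has a single head and no learned projections, the function names and token counts are hypothetical, and averaging the two per-modality bottleneck updates is an assumption made for illustration. The point it shows is that each modality attends only to its own tokens plus a few shared bottleneck tokens, so cross-modal information can travel only through those bottlenecks.

```python
import math
import random


def self_attention(tokens):
    """Single-head self-attention with identity projections (no learned weights)."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        # Scaled dot-product scores of this query against every token.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Softmax-weighted sum of the token vectors.
        out.append([sum(w * tok[j] for w, tok in zip(weights, tokens)) / total
                    for j in range(dim)])
    return out


def bottleneck_fusion_layer(video, audio, bottleneck):
    """One illustrative fusion layer: modalities meet only via bottleneck tokens."""
    n_v, n_a = len(video), len(audio)
    # Each modality attends over its own tokens plus the shared bottlenecks.
    v_out = self_attention(video + bottleneck)
    a_out = self_attention(audio + bottleneck)
    # Assumption: merge the two bottleneck updates by averaging them.
    new_bottleneck = [[(x + y) / 2.0 for x, y in zip(b_v, b_a)]
                      for b_v, b_a in zip(v_out[n_v:], a_out[n_a:])]
    return v_out[:n_v], a_out[:n_a], new_bottleneck


def random_tokens(rng, count, dim):
    """Hypothetical stand-in for real video or audio patch embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
video = random_tokens(rng, 6, 4)       # 6 video tokens of width 4
audio = random_tokens(rng, 5, 4)       # 5 audio tokens of width 4
bottleneck = random_tokens(rng, 2, 4)  # 2 shared bottleneck tokens

new_video, new_audio, new_bottleneck = bottleneck_fusion_layer(
    video, audio, bottleneck)
```

In this sketch, changing the audio tokens leaves `new_video` unchanged after a single layer and alters only `new_bottleneck`. Audio information would reach the video stream one layer later, and only in the condensed form the bottleneck tokens carry.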

The researchers showed that restricting cross-modal attention to a small set of fusion bottlenecks achieves state-of-the-art results on a number of video classification benchmarks while also reducing computational cost compared to vanilla cross-attention models.
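A rough back-of-the-envelope count suggests where the saving could come from. The sketch below compares the number of query-key pairs scored per layer when all tokens from both modalities attend to each other against the case where each modality attends only to itself plus a few bottleneck tokens. The token counts are made-up illustrative values, not figures from the paper, and the function names are hypothetical.

```python
def full_attention_pairs(n_video, n_audio):
    """Query-key pairs when every token attends to every token of both modalities."""
    total = n_video + n_audio
    return total * total


def bottleneck_attention_pairs(n_video, n_audio, n_bottleneck):
    """Query-key pairs when each modality attends to itself plus the bottlenecks."""
    video_side = (n_video + n_bottleneck) ** 2
    audio_side = (n_audio + n_bottleneck) ** 2
    return video_side + audio_side


# Illustrative token counts only (assumed, not taken from the paper).
n_video, n_audio, n_bottleneck = 196, 100, 4

full_pairs = full_attention_pairs(n_video, n_audio)
bottleneck_pairs = bottleneck_attention_pairs(n_video, n_audio, n_bottleneck)

print(full_pairs)        # 87616
print(bottleneck_pairs)  # 50816
```

Under these assumed sizes the bottleneck variant scores roughly 58% as many pairs per layer, because the large video-by-audio cross term is replaced by two small terms involving the bottleneck tokens.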

Kartik Wali

A writer by passion, Kartik strives to get a deep understanding of AI, data analytics and their implementation in all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way of life!
