Google researchers have proposed a new transformer architecture, the Multimodal Bottleneck Transformer (MBT), for audiovisual fusion and explored different fusion strategies using cross-attention between latent tokens in a new paper titled 'Attention Bottlenecks for Multimodal Fusion'.
Machine perception models are usually modality-specific and optimised for unimodal benchmarks, so late-stage fusion of final representations or predictions from each modality ('late fusion') remains the dominant paradigm for multimodal video classification. MBT instead uses 'fusion bottlenecks' to fuse modalities at multiple layers. Compared with traditional pairwise self-attention, MBT forces information between modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most important information in each modality and share only what is necessary.
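As a rough illustration, here is a minimal sketch of one such bottleneck fusion layer in PyTorch: each modality runs self-attention over its own tokens concatenated with a small set of shared bottleneck tokens, and the updated bottlenecks from the two streams are then averaged, so any cross-modal exchange must squeeze through those few tokens. The class name, tensor shapes, and the use of a single attention sub-layer per stream (omitting the usual feed-forward blocks and layer norms) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    """Hypothetical sketch of one MBT-style fusion layer: each modality
    attends only to its own tokens plus shared bottleneck tokens."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.audio_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, audio, video, bottleneck):
        # Audio stream: self-attention over [audio tokens; bottleneck tokens].
        a_in = torch.cat([audio, bottleneck], dim=1)
        a_out, _ = self.audio_attn(a_in, a_in, a_in)
        audio, b_audio = a_out[:, : audio.size(1)], a_out[:, audio.size(1) :]

        # Video stream: self-attention over [video tokens; bottleneck tokens].
        v_in = torch.cat([video, bottleneck], dim=1)
        v_out, _ = self.video_attn(v_in, v_in, v_in)
        video, b_video = v_out[:, : video.size(1)], v_out[:, video.size(1) :]

        # Average the per-stream bottleneck updates: all cross-modal
        # information is forced through these few shared latents.
        bottleneck = (b_audio + b_video) / 2
        return audio, video, bottleneck


# Toy usage with assumed shapes (batch=2, dim=256, B=4 bottleneck tokens).
layer = BottleneckFusionLayer(dim=256)
audio = torch.randn(2, 100, 256)
video = torch.randn(2, 196, 256)
bottleneck = torch.randn(2, 4, 256)
audio, video, bottleneck = layer(audio, video, bottleneck)
```

Note the contrast with vanilla cross-attention, where every audio token can attend to every video token (and vice versa): here the attention cost of cross-modal exchange scales with the handful of bottleneck tokens rather than with the full token count of the other modality.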
The researchers showed that restricting cross-modal attention to a small set of fusion bottlenecks achieved state-of-the-art results on a number of video classification benchmarks while also reducing computational cost compared with vanilla cross-attention models.