
DeepMind Open-Sources Perceiver IO, Its Latest DL Model

The model is suitable for different applications given its capacity to produce various outputs from various inputs.


DeepMind has open-sourced its general-purpose deep learning model Perceiver IO. The tool can handle many different inputs and outputs and serves as a ‘drop-in’ replacement for transformers.

The original Perceiver model supported many kinds of inputs but was limited to producing straightforward outputs. Its successor, Perceiver IO, can handle arbitrary outputs in addition to arbitrary inputs, making it a more general version of the original architecture. Broadening the model’s capacity in this way, Perceiver IO is a single network that can integrate and transform arbitrary information for arbitrary tasks.

Perceiver IO’s research paper states, “Perceiver IO overcomes the limitation without sacrificing the original’s appealing properties by learning to flexibly query the model’s latent space to produce outputs of arbitrary size and semantics.” 

This capacity to produce various outputs from various inputs makes the model suitable for many applications. For instance, it can perform in real-world domains like language, vision, and multimodal understanding, and in challenging games like StarCraft II.

Using the same building blocks as the original model, Perceiver IO can produce classification labels, language, optical flow, and multimodal video with audio. Because its computational complexity is linear in the input size and the bulk of the processing occurs in the latent space, it handles large inputs and outputs better than standard Transformers. This allows Perceiver IO to perform BERT-style masked language modelling directly on bytes rather than on tokenised inputs.
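To make the byte-level setup concrete, here is a minimal sketch in plain Python of how BERT-style masking could be applied directly to UTF-8 bytes. The MASK_ID value and the 15% masking rate are illustrative assumptions, not taken from DeepMind’s code:

    import random

    MASK_ID = 256  # hypothetical mask symbol, chosen outside the 0-255 byte range

    text = "Perceiver IO reads raw bytes."
    byte_ids = list(text.encode("utf-8"))  # one integer per byte, no tokeniser needed

    # Mask roughly 15% of positions, BERT-style; the model learns to recover them.
    random.seed(0)
    masked = [MASK_ID if random.random() < 0.15 else b for b in byte_ids]
    targets = [b if m == MASK_ID else -1 for b, m in zip(byte_ids, masked)]  # -1 = ignore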

The Hurdle for Transformers

Built on the Transformer architecture, Perceiver uses ‘attention’ to map inputs to outputs. Attention lets a model process an input by comparing every pair of its elements and weighting them according to their relationship and the task. However, while attention is widely used, its cost grows quadratically with the size of the input, which becomes prohibitive for common forms of data like images and videos that contain millions of elements.
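A quick back-of-the-envelope comparison illustrates the difference. The figures below are illustrative, and the latent size of 512 is assumed rather than taken from the paper:

    M = 50_176                        # pixels in a 224 x 224 image
    N = 512                           # latent array size (illustrative assumption)

    self_attention_pairs = M * M      # every element attends to every other element
    cross_attention_pairs = M * N     # every element attends only to the latents

    print(f"self-attention:  {self_attention_pairs:,}")   # 2,517,630,976
    print(f"cross-attention: {cross_attention_pairs:,}")  # 25,690,112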

Perceiver’s Architecture 

Perceiver IO overcomes this issue by “scaling the Transformer’s attention operation to substantial inputs without introducing domain-specific assumptions”. The architecture uses cross-attention to project high-dimensional input arrays into a lower-dimensional latent space, where they can be processed at a cost independent of the input’s size. Lastly, the latent representation is converted to output by applying a query array with the same number of elements as the desired output. Deep models flourish in this setting because the computational needs of the latent processing do not grow with the size of the input.

Credits: DeepMind Blog – The Perceiver IO Architecture

The three steps of the Perceiver IO pipeline, sketched in code after this list:

  • Inputs are encoded to a latent space
  • The latent representation is refined via many layers of processing
  • The latent space is decoded to produce outputs
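The following minimal NumPy sketch shows the shape of that pipeline. It is a simplification under stated assumptions, not DeepMind’s implementation: the array sizes are arbitrary, and the learned linear projections for queries, keys, and values are omitted for brevity.

    import numpy as np

    def attend(queries, keys_values):
        # Single-head attention: each query row becomes a weighted
        # average of the keys_values rows (learned projections omitted).
        d = queries.shape[-1]
        scores = queries @ keys_values.T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ keys_values

    M, N, O, D = 10_000, 256, 100, 64   # input, latent, output sizes; channels
    inputs  = np.random.randn(M, D)     # e.g. flattened image features
    latents = np.random.randn(N, D)     # the latent array (learned in practice)
    queries = np.random.randn(O, D)     # output query array, one row per output

    z = attend(latents, inputs)         # 1. encode: latents attend to inputs
    for _ in range(8):                  # 2. process: refine the latents
        z = attend(z, z)                # latent self-attention
    outputs = attend(queries, z)        # 3. decode: queries attend to latents
    print(outputs.shape)                # (100, 64): one vector per desired output

Because the processing step touches only the N latents, the network’s depth can grow without its cost scaling with the input size M.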

Features

This design gives Perceiver IO an unprecedented level of generality and versatility over the original model, which could only produce one output per input. In addition, Perceiver IO is competitive with domain-specific models on benchmarks based on images, 3D point clouds, and audio, both individually and in combination, making it a good fit for researchers.

Along with using attention to encode, Perceiver IO also uses it to decode from the latent array, enhancing the flexibility of the network and scaling it to larger and more diverse inputs and outputs while dealing with many types of data at once.

This feature makes Perceiver IO a single, general-purpose model that can understand the meaning of a text from each of its characters, play games, track the movement of all points in an image, and process the sound, images, and labels that make up a video, all with an architecture that is simpler than the alternatives.

DeepMind’s experiments concluded that Perceiver IO works across a wide range of benchmark domains, including language, vision, multimodal data, and games, providing an off-the-shelf way to handle many kinds of data.

To help researchers and the machine learning community at large, DeepMind has now open-sourced the code.


Avi Gopani

Avi Gopani is a technology journalist who seeks to analyse industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories, curated with a focus on the evolving technologies of artificial intelligence and data analytics.