DeepMind’s multi-tasking, generalist model Gato can do it all

The guiding principle behind Gato was to train on the widest possible range of data, including modalities such as images, text, button presses, joint torques and other discrete and continuous observations and actions.

Amid the growing number of large language models and multi-modal approaches to training, DeepMind has released Gato, a multi-modal, multi-task, multi-embodiment generalist policy. This single generalist agent was trained on data from a wide variety of tasks and modalities, so that the same network with the same weights can play Atari games, caption images, chat, stack blocks with a real robot arm and navigate simulated 3D environments. DeepMind has also released a paper titled ‘A Generalist Agent’, which describes the training process and the model’s capabilities.

(Image source: DeepMind research paper)

As with large language models, data from all of Gato’s tasks and modalities is serialised into a flat sequence of tokens, batched, and processed by a transformer neural network.
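
To make the serialisation step more concrete, here is a minimal Python sketch of how continuous values such as joint torques might be turned into tokens and combined with text and discrete actions (such as button presses) into one flat sequence, then padded into a batch. All function names are hypothetical, and the constants (a 32,000-token text vocabulary, 1,024 bins, mu-law parameters of 100 and 256) are assumptions loosely modelled on the scheme the Gato paper describes; images, which the paper handles as patches, are left out.

```python
import numpy as np

TEXT_VOCAB = 32_000   # assumed text vocabulary size; continuous tokens are shifted past it
NUM_BINS = 1_024      # assumed number of uniform bins for continuous values
MU, M = 100.0, 256.0  # assumed mu-law companding parameters


def mu_law(x: np.ndarray) -> np.ndarray:
    """Mu-law compress values so small magnitudes get finer resolution."""
    return np.sign(x) * np.log(np.abs(x) * MU + 1.0) / np.log(M * MU + 1.0)


def tokenize_continuous(values) -> np.ndarray:
    """Turn continuous values (e.g. joint torques) into integer tokens."""
    x = np.asarray(values, dtype=np.float64).ravel()   # row-major flattening
    x = np.clip(mu_law(x), -1.0, 1.0)                  # squash into [-1, 1]
    bins = np.floor((x + 1.0) / 2.0 * NUM_BINS).astype(np.int64)
    bins = np.clip(bins, 0, NUM_BINS - 1)              # x == 1 would otherwise hit NUM_BINS
    return bins + TEXT_VOCAB                           # place after the text tokens


def tokenize_discrete(values) -> np.ndarray:
    """Flatten discrete values (e.g. button presses) into integers in [0, NUM_BINS)."""
    v = np.asarray(values, dtype=np.int64).ravel()
    if np.any((v < 0) | (v >= NUM_BINS)):
        raise ValueError("discrete value out of range")
    return v


def serialize_episode(text_ids, observations, actions) -> np.ndarray:
    """Build one flat token sequence: text, then (observation, action) per timestep."""
    parts = [np.asarray(text_ids, dtype=np.int64)]
    for obs, act in zip(observations, actions):
        parts.append(tokenize_continuous(obs))
        parts.append(tokenize_discrete(act))
    return np.concatenate(parts)


def make_batch(sequences, pad_id: int = -1) -> np.ndarray:
    """Right-pad flat sequences to a common length so they stack into one array."""
    length = max(len(s) for s in sequences)
    batch = np.full((len(sequences), length), pad_id, dtype=np.int64)
    for i, s in enumerate(sequences):
        batch[i, : len(s)] = s
    return batch


# Usage with made-up values: two short episodes batched together.
ep1 = serialize_episode([5, 6], [[0.0, 1.0]], [[3]])
ep2 = serialize_episode([7], [[-0.5, 2.0], [0.1, 0.0]], [[1], [0]])
batch = make_batch([ep1, ep2])
print(batch.shape)  # → (2, 7)
```

Padding with a reserved id is only one simple way to batch sequences of different lengths; a real training pipeline would likely also carry a mask so that padded positions are ignored.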

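During deployment, a prompt is tokenised to form the initial sequence. The environment then sends its first observation, which is also tokenised and appended to the sequence. Next, the model samples the action vector autoregressively, one token at a time. Once all the tokens of the action vector have been sampled, the action is decoded and sent to the environment, which steps forward and returns a new observation, and the process repeats.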
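
The deployment loop can be sketched in a few lines of Python. Everything here is hypothetical: the model is replaced by a seeded random sampler, the environment is a toy stand-in (`ToyEnv`), each action is assumed to take three tokens, and the 1,024-token context window is an assumption. A real system would also decode the sampled tokens back into an action vector before sending it to the environment.

```python
import random
from typing import Callable, List, Tuple

ACTION_LEN = 3          # assumed number of tokens per action vector
CONTEXT_WINDOW = 1_024  # assumed context window, in tokens


def run_episode(
    prompt_tokens: List[int],
    tokenize_obs: Callable[[object], List[int]],
    sample_next_token: Callable[[List[int]], int],
    env_reset: Callable[[], object],
    env_step: Callable[[List[int]], Tuple[object, bool]],
    max_steps: int = 10,
) -> Tuple[List[int], List[List[int]]]:
    """Observe, sample an action one token at a time, act, and repeat."""
    sequence = list(prompt_tokens)          # the tokenised prompt is the initial sequence
    obs = env_reset()                       # the environment sends its first observation
    actions: List[List[int]] = []
    for _ in range(max_steps):
        sequence.extend(tokenize_obs(obs))  # tokenise the observation and append it
        action: List[int] = []
        for _ in range(ACTION_LEN):         # autoregressive sampling of the action vector
            token = sample_next_token(sequence[-CONTEXT_WINDOW:])
            action.append(token)
            sequence.append(token)          # each sampled token conditions the next one
        actions.append(action)
        # A real system would decode the tokens into an action vector here.
        obs, done = env_step(action)
        if done:
            break
    return sequence, actions


class ToyEnv:
    """Hypothetical environment: observations are step counts; ends after two steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: List[int]) -> Tuple[int, bool]:
        self.t += 1
        return self.t, self.t >= 2


rng = random.Random(0)
env = ToyEnv()
seq, acts = run_episode(
    prompt_tokens=[1, 2],
    tokenize_obs=lambda o: [100 + o],                             # hypothetical observation tokeniser
    sample_next_token=lambda ctx: rng.randrange(32_000, 33_024),  # stand-in for the model
    env_reset=env.reset,
    env_step=env.step,
)
print(len(seq), len(acts))  # → 10 2
```

Passing the model and environment in as plain callables keeps the loop itself independent of any particular network or simulator, which mirrors the article’s point that one sequence model drives every task.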

(Image source: DeepMind research paper)

Gato demonstrates that transformer sequence models can work effectively as multi-task, multi-embodiment policies, including for real-world text, vision and robotics tasks. It also hints that new tasks could one day be learned through prompting rather than by training a model from scratch.

Poulomi Chatterjee
Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.