Now Facebook’s AI Model Can Anticipate Your Future Actions

AVT would be a strong candidate for tasks beyond anticipation, such as self-supervised learning, general action recognition in tasks that require modelling temporal ordering, and even for discovering action schemas and boundaries.

Anticipating what happens next, and doing so accurately, is as exciting as it is difficult. It may be easy to guess whether the next ball in a game of cricket will be hit for a six or a four, and a wrong guess costs nothing. Now consider an autonomous vehicle waiting at a stop sign: it must predict whether a pedestrian will cross the road, and the cost of getting that wrong is very different. Anticipating future activity is hard for AI because it requires both modelling the progression of past actions and predicting the multimodal distribution of future actions.

To address this challenge, Rohit Girdhar of Facebook AI Research and Kristen Grauman of the University of Texas at Austin proposed the Anticipative Video Transformer (AVT).

The science behind AVT

For AVT, the researchers leveraged recent advances in transformer architectures from natural language processing and image modelling. The result is an end-to-end, attention-based video modelling architecture that attends to the previously observed video in order to anticipate future actions.

Given a video clip as input, the model produces predictions for future actions. To do so, it uses a two-stage architecture consisting of:

  • A backbone network, referred to as AVT-b, that operates on individual frames or short clips. It adopts the recently proposed Vision Transformer (ViT) architecture, which has previously shown impressive results on static image classification.
  • A head architecture, referred to as AVT-h, that operates on the frame/clip-level features to predict future features and actions. It predicts the future feature for each input frame using a Causal Transformer Decoder (a rough sketch of how the two stages compose follows this list).
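
Taken together, the two stages compose into a single model: the backbone turns each frame into a feature vector, and the head attends causally over those features to predict the ones that follow. The snippet below is a minimal, illustrative PyTorch-style outline of that composition; the module choices, dimensions and the convolutional stand-in for the ViT backbone are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class AVTSketch(nn.Module):
    """Illustrative two-stage outline: a per-frame backbone (AVT-b)
    followed by a causally masked transformer head (AVT-h)."""

    def __init__(self, feat_dim=768, num_actions=100):
        super().__init__()
        # Stand-in for AVT-b: any frame-level feature extractor mapping an
        # RGB frame to a feat_dim vector (the paper uses a Vision Transformer).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=16, stride=16),  # patchify-style stem
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Stand-in for AVT-h: a causally masked transformer over frame features.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.head = nn.TransformerEncoder(layer, num_layers=4)
        self.classifier = nn.Linear(feat_dim, num_actions)

    def forward(self, frames):
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)   # observed frame features
        causal = nn.Transformer.generate_square_subsequent_mask(t).to(frames.device)
        future_feats = self.head(feats, mask=causal)                  # predicted next-frame features
        action_logits = self.classifier(future_feats)                 # per-frame action predictions
        return future_feats, action_logits


# Example: two clips of 8 frames at 224x224 resolution.
model = AVTSketch()
feats, logits = model(torch.randn(2, 8, 3, 224, 224))
```

In the real AVT, the backbone is a full ViT and the head is a GPT-style decoder, but the flow of observed frame features through a causally masked transformer is the same idea.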

In addition, AVT employs causal attention modelling—predicting the future actions based only on the frames observed so far—and is trained using objectives inspired by self-supervised learning. The AVT model architecture is shown below:

Image Source: Paper
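
To make the causal constraint concrete: the attention mask in such a decoder is lower-triangular, so a given frame can attend to itself and to earlier frames only. A small, generic illustration for four observed frames (this is standard autoregressive masking, not code from the paper):

```python
import torch

# Causal (autoregressive) attention mask for 4 observed frames: entries of
# -inf block attention, so frame i never "sees" frames that come after it.
num_frames = 4
mask = torch.triu(torch.full((num_frames, num_frames), float("-inf")), diagonal=1)
print(mask)
# tensor([[0., -inf, -inf, -inf],
#         [0.,   0., -inf, -inf],
#         [0.,   0.,   0., -inf],
#         [0.,   0.,   0.,   0.]])
```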

In addition, the researchers train the model to predict future actions and features using three losses (sketched in code after the list below):

  • First, the features of the last frame of the video clip are classified to predict the labelled future action.
  • Second, the model regresses each intermediate frame's feature onto the feature of the succeeding frame, which trains it to anticipate what comes next.
  • Third, the model is trained to classify the intermediate actions it has already observed.
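
Put together, a training step combining these three objectives might look roughly like the sketch below. The exact formulation, the loss weighting and the use of mean-squared error for the feature-regression term are assumptions made only to keep the example concrete; they are not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def avt_style_losses(future_feats, action_logits, frame_feats,
                     future_action_label, intermediate_labels):
    """Sketch of the three anticipative objectives for one clip.

    future_feats:        (T, D) features the head predicts for each frame
    action_logits:       (T, num_actions) per-frame action predictions
    frame_feats:         (T, D) features actually observed by the backbone
    future_action_label: scalar label of the (unobserved) next action
    intermediate_labels: (T,) per-frame action labels, -1 where unknown
    """
    # 1) Classify the last frame's predicted feature as the future action.
    loss_next = F.cross_entropy(action_logits[-1:], future_action_label.view(1))

    # 2) Regress each intermediate predicted feature onto the feature of the
    #    *following* observed frame (MSE is a stand-in for the paper's loss).
    loss_feat = F.mse_loss(future_feats[:-1], frame_feats[1:])

    # 3) Classify the intermediate actions wherever labels are available.
    valid = intermediate_labels >= 0
    loss_inter = F.cross_entropy(action_logits[valid], intermediate_labels[valid])

    return loss_next + loss_feat + loss_inter


# Toy shapes: an 8-frame clip, 768-d features, 100 possible actions.
T, D, A = 8, 768, 100
loss = avt_style_losses(torch.randn(T, D), torch.randn(T, A), torch.randn(T, D),
                        torch.tensor(3), torch.randint(0, A, (T,)))
```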

“Through extensive experimentation on four popular benchmarks, we show its applicability in anticipating future actions, obtaining state-of-the-art results and demonstrating the importance of its anticipative training objectives,” the paper notes.

Talking about some of its future applications, the researchers believe AVT would be a strong candidate for tasks beyond anticipation, such as self-supervised learning, general action recognition in tasks requiring modelling temporal ordering, and even discovering action schemas and boundaries.

Recent Facebook AI advances

  • Facebook AI recently introduced a new language model based solely on audio, the Generative Spoken Language Model (GSLM), which can be considered the first high-performance NLP model that is independent of text. GSLM works directly from raw audio signals, without labels or text, going from speech input to speech output, and expands the frontiers of textless NLP to diverse oral languages.
  • Last month, the Facebook team introduced the Instance-Conditioned GAN (IC-GAN), a new image generation model. It produces high-quality, diverse images whether or not the input images come from the training set and, in contrast to previous approaches, can generate realistic, unseen combinations of images.
  • Facebook also recently released Opacus, a free, open-source library for training deep learning models with differential privacy. The tool is intended to be simple, flexible, and fast, with a user-friendly API that lets ML practitioners make a training pipeline private with just a couple of lines of code, as sketched below.
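
As an illustration of that API, wrapping an existing model, optimiser and data loader for differentially private training with a recent Opacus release looks roughly as follows; the toy model and the hyperparameter values are placeholders rather than recommendations.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# A toy model, optimiser and data loader standing in for a real pipeline.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8
)

# The core of Opacus: create the engine and wrap the existing components.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.1,   # scale of Gaussian noise added to gradients (placeholder)
    max_grad_norm=1.0,      # per-sample gradient clipping bound (placeholder)
)
# Training then proceeds with the usual PyTorch loop over train_loader.
```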

Facebook AI's advancements in AI and ML have come a long way. Time and again, the organisation's researchers have pushed the field of artificial intelligence forward with strong, result-oriented work.
