
How conditional object-centric learning achieves better generalisation

SAVi, or Slot Attention for Video, is a sequential extension of the Slot Attention architecture.


Object-centric representations hold the key to steering machine learning models toward more systematic generalisation. Recent research, including work on 3D data, has shown that models with object-centric inductive biases can learn to segment and represent meaningful objects from the statistical structure of the data.

Object-centric learning is an unsupervised approach that decomposes a scene into its individual objects and the background, and then recombines them to reconstruct the image. Such representations make it easier for a system, such as a robot, to build a structured understanding of a complex environment.

Object-centric learning improves sample efficiency, makes machine learning models more interpretable, and helps them generalise to new tasks. Slot attention, a widely used architecture in object-centric learning, uses an iterative attention process to map visual features to latent representations and pick out the individual objects in the input.

The model produces a set of abstract, task-relevant representations of the image called slots. The slots are interchangeable: any slot can bind to any object in the input.
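The iterative binding process can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published model: it drops the learned query, key and value projections, the GRU and the MLP of the full Slot Attention architecture, and updates each slot as a plain attention-weighted mean of the input features. What it keeps is the softmax over slots, which makes the slots compete for every input feature. The function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def slot_attention(inputs, slots, num_iters=3):
    """Simplified Slot Attention (hypothetical sketch).

    inputs: list of N feature vectors, each of length D
    slots:  list of K initial slot vectors, each of length D
    Returns (updated slots, attention matrix attn[n][k]).
    Assumption: no learned projections, GRU or MLP; each slot is
    replaced by the attention-weighted mean of the inputs.
    """
    dim = len(inputs[0])
    scale = 1.0 / math.sqrt(dim)
    attn = []
    for _ in range(num_iters):
        # Softmax over slots for each input: slots compete for every feature.
        attn = [softmax([scale * dot(x, s) for s in slots]) for x in inputs]
        new_slots = []
        for k in range(len(slots)):
            weights = [row[k] for row in attn]
            total = sum(weights) + 1e-8
            # Weighted mean of the inputs this slot won.
            new_slots.append([
                sum(w * x[d] for w, x in zip(weights, inputs)) / total
                for d in range(dim)
            ])
        slots = new_slots
    return slots, attn

# Two clusters of features and two slots.
features = [[5.0, 0.0], [4.5, 0.5], [0.0, 5.0], [0.5, 4.5]]
slots, attn = slot_attention(features, [[1.0, 0.0], [0.0, 1.0]])
```

Here the first two features end up assigned almost entirely to slot 0 and the last two to slot 1. Because every slot is updated by the same rule, swapping the initial slots would simply swap which slot binds to which cluster, which is what makes the slots interchangeable.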

Limitations

  • Although these models worked well for 2D video games and very simple 3D scenes, they could not accurately decompose complex 3D scenes. 
  • The models sometimes segmented objects in ways that did not match the intent of the task: an object could be over-segmented into separate parts, or fail to be split into the desired parts. This is because objects are inherently ambiguous; the sensory information we get from an object depends on how we visually perceive it. 

Research

The Google Research team introduced a partly supervised approach to overcome these limitations. The method, called SAVi (Slot Attention for Video), is a sequential extension of the Slot Attention architecture. The study showed that the model can be trained to predict optical flow instead of reconstructing raw frames. In addition, conditioning the model's initial state on small hints, such as the centre-of-mass position of each object, makes object segmentation easier.
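To make "predicting optical flow" concrete, the sketch below shows one way a slot-based model can be scored against a flow target. It assumes a Slot Attention-style decoder in which every slot predicts a flow vector and a mask logit for each pixel, and a softmax across slots decides how much each slot contributes at that pixel. The decoder itself is replaced by hand-written numbers, the mean squared error is chosen for illustration, and all names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def combine_slot_predictions(slot_flows, slot_logits):
    """Mix per-slot flow predictions into a single flow field.

    slot_flows:  [K][P] (dx, dy) tuples, one prediction per slot and pixel
    slot_logits: [K][P] mask logits
    At every pixel, a softmax across slots gives the mixing weights.
    """
    num_slots, num_pixels = len(slot_flows), len(slot_flows[0])
    combined = []
    for p in range(num_pixels):
        weights = softmax([slot_logits[k][p] for k in range(num_slots)])
        dx = sum(w * slot_flows[k][p][0] for k, w in enumerate(weights))
        dy = sum(w * slot_flows[k][p][1] for k, w in enumerate(weights))
        combined.append((dx, dy))
    return combined

def flow_loss(predicted, target):
    """Mean squared error between predicted and target flow vectors."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(predicted, target)) / len(target)

# Two slots, two pixels: slot 0 predicts motion (1, 0), slot 1 predicts none.
slot_flows = [[(1.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
slot_logits = [[10.0, -10.0], [-10.0, 10.0]]  # pixel 0 -> slot 0, pixel 1 -> slot 1
predicted = combine_slot_predictions(slot_flows, slot_logits)
target = [(1.0, 0.0), (0.0, 0.0)]             # target flow per pixel
loss = flow_loss(predicted, target)           # near zero: masks match the motion
```

In this sketch the loss can only be driven down when each slot's mask covers a region that moves coherently, so fitting the flow target pushes slots towards separating independently moving objects.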

SAVi was first tested in a fully unsupervised setting, where each slot came to represent an object, an independently moving part of an object, or the background.

To decompose complex scenes, the team used conditional object-centric learning. The method uses optical flow, which describes the apparent motion of individual pixels between consecutive frames. Each slot was conditioned on an external cue for the first video frame, such as a bounding box or the coordinates of a single point on an object.
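Conditioning on first-frame cues amounts to choosing each slot's starting value from a hint rather than at random. The sketch below is a hypothetical illustration: it treats a single point as a zero-size bounding box so both cue types share one format, and it uses a fixed sinusoidal encoding where the real model would use a learned encoder. The `(y, x)` coordinate order, the `[0, 1]` range and all function names are assumptions.

```python
import math

def hint_to_box(hint):
    """Treat a point (y, x) as a zero-size box so both cue types share a format."""
    if len(hint) == 2:
        y, x = hint
        return (y, x, y, x)
    if len(hint) == 4:
        return tuple(hint)
    raise ValueError("hint must be a point (y, x) or a box (y_min, x_min, y_max, x_max)")

def encode_hint(hint, num_freqs=2):
    """Hypothetical stand-in for a learned encoder: sin/cos features per coordinate."""
    features = []
    for coord in hint_to_box(hint):
        for f in range(num_freqs):
            angle = math.pi * (2 ** f) * coord
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features  # length = 4 coordinates * num_freqs * 2

def init_slots(first_frame_hints, num_freqs=2):
    """One initial slot per hinted object, in the order the hints are given."""
    return [encode_hint(h, num_freqs) for h in first_frame_hints]

hints = [(0.25, 0.5), (0.1, 0.2, 0.4, 0.6)]  # a centre-of-mass point and a bounding box
slots = init_slots(hints)
```

Because each slot starts from a specific object's cue, the slot order also fixes which object each slot is expected to follow, which in this sketch is how the hints steer the segmentation towards the intended objects.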

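The method follows a predict-and-correct scheme: at each time step, a predictor estimates how the slots will evolve, and a corrector updates that prediction using the features of the new frame. Carrying the slots forward in this way allows the model to track objects consistently over time.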
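The predict-and-correct loop can be sketched as follows. This minimal version assumes a simplified corrector (a single round of slot attention without learned projections) and, by default, an identity predictor that assumes slots do not change between frames; in the actual model both are learned modules. `correct`, `identity_predictor` and `track` are hypothetical names.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def correct(slots, features):
    """One simplified slot-attention step: slots compete for the frame's features."""
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    attn = [softmax([scale * sum(a * b for a, b in zip(f, s)) for s in slots])
            for f in features]
    updated = []
    for k in range(len(slots)):
        weights = [row[k] for row in attn]
        total = sum(weights) + 1e-8
        updated.append([sum(w * f[d] for w, f in zip(weights, features)) / total
                        for d in range(dim)])
    return updated

def identity_predictor(slots):
    """Hypothetical predictor: assumes slots stay unchanged between frames."""
    return [list(s) for s in slots]

def track(video_features, initial_slots, predictor=identity_predictor):
    """Predict, then correct, for every frame; returns the slots per frame."""
    slots = initial_slots
    history = []
    for frame in video_features:
        predicted = predictor(slots)       # predict: carry slots forward in time
        slots = correct(predicted, frame)  # correct: update with the new frame
        history.append(slots)
    return history

# Two objects whose features drift slightly from frame to frame.
video = [
    [[5.0, 0.0], [4.5, 0.5], [0.0, 5.0], [0.5, 4.5]],
    [[5.0, 0.5], [4.5, 1.0], [0.5, 5.0], [1.0, 4.5]],
    [[5.0, 1.0], [4.5, 1.5], [1.0, 5.0], [1.5, 4.5]],
]
history = track(video, initial_slots=[[1.0, 0.0], [0.0, 1.0]])
```

Slot 0 keeps following the first object and slot 1 the second across all three frames, because each frame's correction starts from the previous frame's slots instead of from scratch.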

Results

SAVi is not explicitly trained for object segmentation and tracking; both emerge naturally from its training. Although no per-object segmentation labels are provided for any slot, SAVi can segment objects in far more complex scenes.

SAVi is an object-centric, slot-based architecture that proved particularly effective at identifying and tracking objects. The results help dispel earlier doubts that object-centric learning is restricted by model capacity. The approach could also pave the way for other semi-supervised studies.

SAVi used optical flow data from an unsupervised model, but this information may not be available outside training. Training solely on optical flow could also be a problem for static objects, which produce no motion signal. Finally, there is still a large gap between even complex training environments and real-world scenarios.


Poulomi Chatterjee

Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.