
Meta’s V-JEPA Video Model Learns by Watching

Meta released V-JEPA, a new AI model, advancing towards human-like machine intelligence by analyzing video interactions.


Alongside OpenAI’s Sora, Meta released a new AI model called Video Joint Embedding Predictive Architecture (V-JEPA) yesterday. V-JEPA improves machines’ understanding of the world by analysing interactions between objects in videos. The model advances the vision of Yann LeCun, Meta’s VP & Chief AI Scientist, for creating machine intelligence that learns similarly to humans.

V-JEPA builds on I-JEPA, released in the middle of last year, which learned by comparing abstract representations of images rather than the pixels themselves. Extending this predictive approach from images to videos introduces the complexity of temporal (time-based) dynamics on top of spatial information.

V-JEPA predicts missing parts of videos without needing to recreate every detail. It learns from unlabeled videos, which means it doesn’t require data that’s been categorised by humans to start learning. 

This method makes V-JEPA more efficient, requiring fewer resources to train. The model is particularly good at learning from a small amount of information, making it faster and less resource-intensive compared to older models.
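The core of this idea can be sketched in a few lines. This is an illustrative toy, not Meta's code: the encoder here is a fixed linear map, and the loss choice is an assumption, but it shows the key difference from pixel reconstruction, namely that the training target is an embedding of the hidden region, so fine, unpredictable detail never has to be modelled.

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(x, W):
    # Stand-in encoder: a fixed linear map plus a nonlinearity.
    # In the real model this would be a learned video transformer.
    return np.tanh(x @ W)

D_in, D_lat = 64, 8
W_enc = rng.standard_normal((D_in, D_lat)) * 0.1

visible = rng.standard_normal(D_in)  # context the predictor can see
masked = rng.standard_normal(D_in)   # region hidden from the predictor

target = embed(masked, W_enc)     # target is an embedding, not pixels
predicted = embed(visible, W_enc) # toy stand-in for the predictor's output

# The training signal is a distance in representation space (L1 here,
# as an illustrative choice), not a pixel-reconstruction error.
loss = np.abs(predicted - target).mean()
print(loss)
```

Because the loss lives in the low-dimensional latent space, the model is free to discard detail that is irrelevant to predicting what happens next, which is one reason the approach trains efficiently.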

The model’s development involved masking large sections of videos. This approach forces V-JEPA to make guesses based on limited context, helping it understand complex scenarios without needing detailed data. V-JEPA focuses on the general idea of what’s happening in a video rather than specific details, like the movement of individual leaves on a tree.
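The masking step above can be sketched as follows. The patch size and the 90% mask ratio are illustrative assumptions, not Meta's published settings: the video is cut into spatiotemporal patches ("tubelets"), most are hidden, and the model must reason from the small visible remainder.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_video_patches(video, patch=(2, 16, 16), mask_ratio=0.9):
    """Split a video (T, H, W, C) into tubelets and hide a random subset.

    Returns all flattened patches, a boolean mask (True = hidden),
    and the visible patches the predictor would condition on.
    """
    T, H, W, C = video.shape
    t, h, w = patch
    # Reshape into (num_patches, patch_voxels) spatiotemporal tubelets.
    patches = (video
               .reshape(T // t, t, H // h, h, W // w, w, C)
               .transpose(0, 2, 4, 1, 3, 5, 6)
               .reshape(-1, t * h * w * C))
    n = patches.shape[0]
    hidden = np.zeros(n, dtype=bool)
    hidden[rng.choice(n, size=int(n * mask_ratio), replace=False)] = True
    return patches, hidden, patches[~hidden]

# A dummy 16-frame, 224x224 RGB clip.
video = rng.standard_normal((16, 224, 224, 3)).astype(np.float32)
patches, hidden, visible = mask_video_patches(video)
print(patches.shape, int(hidden.sum()), visible.shape)
```

With these assumed settings, only about a tenth of the clip's 1,568 tubelets remain visible, which is what forces the model to infer the gist of the scene rather than copy local detail.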

V-JEPA has shown promising results in tests, where it outperformed other video analysis models using a fraction of the data typically required. This efficiency is seen as a step forward in AI, making it possible to use the model for various tasks without extensive retraining.

Looking ahead, Meta plans to expand V-JEPA’s capabilities, including adding sound analysis and improving its ability to understand longer videos. 

This work supports Meta’s broader goal of advancing machine intelligence to perform complex tasks more like humans. V-JEPA is available under a Creative Commons NonCommercial licence, allowing researchers worldwide to explore and build upon this technology.


K L Krithika

K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering impossible technologies, trying not to confuse them with reality.