MITB Banner

OpenAI Sora Ignites Physics Debate  

OpenAI’s newest text-to-video generation model has invited scrutiny by rival big tech companies

Share

Illustration by Nikhil Kumar

Last week, the internet went berserk with OpenAI’s first video-generation model Sora. However, at the same time, a flurry of AI experts and researchers from competitor companies were quick to dissect and criticise Sora’s transformer model, igniting a physics debate of sorts. 

AI scientist Gary Marcus was one among the many to criticise not just the accuracy of the videos generated by Sora, but also the generative AI model used for video synthesis.  

Competitors Unite

In a move that seemed to undermine Sora’s diffusion model structure, Meta and Google dissed the model’s understanding of the physical world. 

Meta chief Yann LeCun said, “The generation of mostly realistic-looking videos from prompts does not indicate that a system understands the physical world. Generation is very different from causal prediction from a world model. The space of plausible videos is very large, and a video generation system merely needs to produce one sample to succeed.” 

LeCun explains further in order to differentiate Sora from Meta’s latest AI model offering, V-JEPA (Video Joint Embedding Predictive Architecture), a model that analyses interactions between objects in videos. He said, “That is the whole point behind the JEPA (Joint Embedding Predictive Architecture), which is not generative and makes predictions in representation space” – a push to make V-JEPA’s self-supervised model seem superior to Sora’s diffusion transformer model. 

Researcher and entrepreneur Eric Xing chimed in to support LeCun’s views. “An agent model that can reason based on understanding must go beyond LLMs or DMs,” he said.

The timing of the Gemini Pro 1.5 announcement couldn’t have been better. The videos generated by Sora were made to run on Gemini 1.5 Pro, where the model critiqued the inconsistencies in the video suggesting that “it is not a real-life scene”. 

Elon Musk was not far behind. He called Tesla’s video-generation capabilities superior to OpenAI’s with respect to predicting accurate physics. 

Source: X

While the experts have been quick to dismiss the generative model’s capabilities, the understanding of the ‘physics’ behind the model has been overlooked.  

The Physics of Things

Sora uses a transformer architecture similar to GPT models, and OpenAI believes that the foundation will ‘understand and simulate the real world’, which will help towards achieving AGI. Though not called a physics engine, it is possible that Unreal Engine 5’s generated data may have been used to train Sora’s underlying model. 

Senior research scientist at NVIDIA, Jim Fan, clarified OpenAI’s Sora model by explaining a data-driven physics engine. “Sora learns a physics engine implicitly in the neural parameters by gradient descent through massive amounts of videos,” he said, referring to Sora as a learnable simulator or world model.

Fan also expressed his disapproval of Sora’s reductionist views. “I see some vocal objections: ‘Sora is not learning physics, it’s just manipulating pixels in 2D’. I respectfully disagree with this reductionist view. It’s similar to saying, ‘GPT-4 doesn’t learn coding, it’s just sampling strings’. Well, what transformers do is just manipulate a sequence of integers (token IDs). What neural networks do is just manipulating floating numbers. That’s not the right argument,” he said. 

Sora is at the GPT-3 Moment 

Perplexity founder Aravind Srinivas, who has been vocal on social media of late, also spoke in support of LeCun. “Reality is Sora, while being amazing, is still not ready yet to model physics accurately,” he said.  

Interestingly, OpenAI themselves have called out the limitations of the model before anyone could point them out. The company blog states that Sora may struggle with accurately simulating the physics of a complex scene, where it may not understand specific instances of cause and effect. It can also get confused with spatial details of a prompt, such as following a specific camera trajectory, and more. 

Fan has also likened Sora with the ‘GPT-3 moment’ in 2020, when the model required ‘heavy prompting and babysitting’. However, it was the ‘first compelling demonstration of in-context learning as an emergent property’.

The current limitations do not cloud the quality of output generated. When OpenAI acquired Global Illumination, a digital product company that created open-source game Biomes (that resembles Minecraft) in August last year, the scope of video generation and building simulation-model platforms via auto agents were some of the speculations. 

Now, with the release of Sora, the possibilities to disrupt the video game industry has only escalated. If Sora is at the GPT-3 moment, GPT-4 of the model will be incomprehensible. Until then, sceptics will continue to debate and probably teach one another a thing or two.  

Source: X

Share
Picture of Vandana Nair

Vandana Nair

As a rare blend of engineering, MBA, and journalism degree, Vandana Nair brings a unique combination of technical know-how, business acumen, and storytelling skills to the table. Her insatiable curiosity for all things startups, businesses, and AI technologies ensures that there's always a fresh and insightful perspective to her reporting.
Related Posts

CORPORATE TRAINING PROGRAMS ON GENERATIVE AI

Generative AI Skilling for Enterprises

Our customized corporate training program on Generative AI provides a unique opportunity to empower, retain, and advance your talent.

Upcoming Large format Conference

May 30 and 31, 2024 | 📍 Bangalore, India

Download the easiest way to
stay informed

Subscribe to The Belamy: Our Weekly Newsletter

Biggest AI stories, delivered to your inbox every week.

AI Forum for India

Our Discord Community for AI Ecosystem, In collaboration with NVIDIA. 

Flagship Events

Rising 2024 | DE&I in Tech Summit

April 4 and 5, 2024 | 📍 Hilton Convention Center, Manyata Tech Park, Bangalore

MachineCon GCC Summit 2024

June 28 2024 | 📍Bangalore, India

MachineCon USA 2024

26 July 2024 | 583 Park Avenue, New York

Cypher India 2024

September 25-27, 2024 | 📍Bangalore, India

Cypher USA 2024

Nov 21-22 2024 | 📍Santa Clara Convention Center, California, USA

Data Engineering Summit 2024

May 30 and 31, 2024 | 📍 Bangalore, India

Subscribe to Our Newsletter

The Belamy, our weekly Newsletter is a rage. Just enter your email below.