Meta Introduces New Generative AI Models, Emu Video & Emu Edit

Progressing in generative AI, Meta has introduced new text-to-video and image-editing models: Emu Video and Emu Edit.

Accelerating its progress in generative AI, Mark Zuckerberg’s Meta has introduced a new text-to-video model and a new image-editing model, called Emu Video and Emu Edit, respectively. 

Emu Video is a new text-to-video generation model involving two steps: generating an image based on the text and then creating a video using both the text and the generated image. The model achieves high-quality, high-resolution videos through optimised noise schedules for diffusion and multi-stage training. 
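The two-step, factorized pipeline described above can be sketched as follows. This is a purely illustrative stub, not Meta's actual networks; all function names, the tiny "image" representation, and the frame-update rule are hypothetical stand-ins for the real diffusion models.

```python
# Hypothetical sketch of a factorized text-to-video pipeline:
# step 1 generates an image from the text; step 2 generates video
# frames conditioned on both the text and that first image.

def generate_image(prompt: str) -> list:
    """Stub text-to-image model: derives a tiny 2x2 'image' from the prompt."""
    seed = sum(ord(c) for c in prompt)
    return [[seed % 256, (seed * 3) % 256],
            [(seed * 7) % 256, (seed * 11) % 256]]

def generate_video(prompt: str, num_frames: int = 4) -> list:
    """Factorized text-to-video: every later frame is conditioned on the
    first generated image (and, in the real model, on the text too)."""
    first_frame = generate_image(prompt)
    frames = [first_frame]
    for t in range(1, num_frames):
        # Stand-in temporal model: perturb the first frame per time step.
        frames.append([[(px + t) % 256 for px in row] for row in first_frame])
    return frames

video = generate_video("a cat surfing", num_frames=4)
```

The point of the factorization is that video generation is split into an easier image-generation problem followed by an image-conditioned temporal problem, rather than generating all frames from text in one shot.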

Human evaluations show Emu Video is preferred over existing systems: 81% of the time over Google’s Imagen Video, 90% over NVIDIA’s PYOCO, and 96% over Meta’s own Make-A-Video. The model also outperforms commercial solutions such as RunwayML’s Gen-2 and Pika Labs. Notably, the factorized approach also lends itself to animating user-supplied images from text prompts, where it is preferred over prior work 96% of the time.


On the other hand, Emu Edit is a multi-task image editing model demonstrating superior performance in instruction-based image editing. It outperforms existing models by training across various tasks like region-based editing, free-form editing, and computer vision tasks. 

Emu Edit’s success is attributed to its multi-task learning, using learned task embeddings to guide the generation process accurately. The model proves its versatility by successfully generalising to new tasks with minimal labeled examples, addressing scenarios with limited high-quality samples. Additionally, a comprehensive benchmark with seven diverse image editing tasks is introduced for a thorough evaluation of instructable image editing models.
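The idea of learned task embeddings steering a shared model can be sketched as below. This is an assumption-laden toy, not Meta's implementation: the task names, the three-dimensional embeddings, and the stand-in text encoder are all hypothetical, chosen only to show how a per-task vector can be combined with the instruction's encoding to condition generation.

```python
# Hypothetical sketch of multi-task conditioning with learned task
# embeddings, in the spirit of Emu Edit's description: each editing
# task gets its own embedding vector that steers one shared model.

TASK_EMBEDDINGS = {
    "region_edit": [1.0, 0.0, 0.0],      # region-based editing
    "free_form_edit": [0.0, 1.0, 0.0],   # free-form editing
    "object_detection": [0.0, 0.0, 1.0], # computer-vision task
}

def condition(instruction: str, task: str) -> list:
    """Combine a (stub) text encoding of the instruction with the
    learned embedding of the selected task."""
    text_vec = [len(instruction) / 100.0] * 3  # stand-in text encoder
    task_vec = TASK_EMBEDDINGS[task]
    return [t + e for t, e in zip(text_vec, task_vec)]

vec = condition("make the sky purple", "free_form_edit")
```

Generalising to a new task with few examples then amounts to learning one new embedding vector while the shared model stays fixed, which is why few labeled samples can suffice.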

The model addresses the limitations of existing generative AI models in image editing, focusing on precise control and broader capabilities by framing computer vision tasks as editing instructions. It handles free-form editing tasks such as background manipulation and colour transformation, and even performs vision tasks such as object detection when instructed. 

Unlike many existing models, it follows instructions precisely, ensuring that only pixels relevant to the edit request are altered. The model is trained on a large dataset of 10 million synthesized samples, presenting unprecedented results in terms of instruction faithfulness and image quality. Emu Edit surpasses current methods in both qualitative and quantitative evaluations for various image editing tasks, establishing new state-of-the-art performance.

Shritama Saha
Shritama Saha is a technology journalist keen to learn about AI and analytics. A graduate in mass communication, she is passionate about exploring the influence of data science on fashion, drug development, films, and art.
