
Hugging Face Showcases Demos Based On Open Source Text-To-Video Models, Pinpoints Flaws

Hugging Face's AI WebTV aims to make open-source text-to-video models like Zeroscope, along with the music-generation model MusicGen, more accessible.



Hugging Face, the AI developers’ go-to platform, has released AI WebTV, its latest experiment in automatic video and music synthesis. The project aims to showcase accessible, open-source text-to-video models such as Zeroscope, alongside music models like MusicGen.

The technique excels in replacing backgrounds during camera panning or rotation. It also gives users creative freedom, granting control over the number of frames in the generation process, which enables high-quality slow-motion effects. The primary video model behind the WebTV is Zeroscope V2, which the project runs through a NodeJS and TypeScript implementation.
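For illustration, the snippet below is a minimal sketch of how Zeroscope V2 can be called through Hugging Face’s diffusers library; the model id, scheduler, and parameters are assumptions taken from the public model card, not the WebTV’s own code. The num_frames argument is the knob that controls clip length and, by extension, how much footage is available for slow-motion playback.

```python
# Minimal sketch: generating one short take with Zeroscope V2 via diffusers.
# Model id and settings follow the public model card; they are illustrative,
# not Hugging Face's actual WebTV implementation.
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "a banana standing triumphantly on a pyramid of food characters, CGI, cinematic"
# More frames means a longer take, which can later be slowed down for smooth slow motion.
video_frames = pipe(
    prompt, num_inference_steps=40, height=320, width=576, num_frames=24
).frames

export_to_video(video_frames, "shot.mp4")
```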

The AI WebTV works by feeding video shot prompts to a text-to-video model, which renders them as a sequence of takes. To enhance the creative process further, a human-authored base theme and idea are first passed to a large language model, which generates diverse individual prompts for each video clip.
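The two-stage flow can be sketched roughly as follows; the choice of LLM (gpt2 here), the prompt template, and the loop are purely illustrative assumptions, not the published WebTV pipeline.

```python
# Illustrative sketch of the described flow: an LLM expands a human-written
# theme into several shot prompts, each of which would then be rendered by the
# text-to-video model (e.g. the Zeroscope sketch above).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder LLM

base_theme = "a whimsical city with cotton candy clouds and chocolate roads, Pixar style"
instruction = f"Write one short video shot description based on: {base_theme}\nShot:"

shot_prompts = [
    generator(instruction, max_new_tokens=40, do_sample=True)[0]["generated_text"]
    .split("Shot:")[-1]
    .strip()
    for _ in range(3)
]

# Each prompt would then be turned into a clip with the video model:
# for prompt in shot_prompts:
#     frames = video_pipe(prompt, num_frames=24).frames
#     export_to_video(frames, f"take_{abs(hash(prompt))}.mp4")
```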

Prompt: 3D rendered animation showing a group of food characters forming a pyramid, with a banana standing triumphantly on top. In a city with cotton candy clouds and chocolate road, Pixar’s style, CGI, ambient lighting, direct sunlight, rich color scheme, ultra realistic, cinematic, photorealistic.

Talking about the ability of text-to-video models, the HF blog, authored by Julian Bilcke, stated, “We’ve seen it with large language models and their ability to synthesize convincing content that mimics human responses, but this takes things to a whole new dimension when applied to video.”

The video sequences released along with the demo are kept short, positioning the WebTV as a tech demo rather than an actual show with art direction or programming.

Even though the advancement is being lauded, HF has pointed out a few cases where the model fails. Firstly, it can have issues with movement and direction; for instance, a clip is sometimes played in reverse. In other instances, the modifier keyword is not taken into account. Furthermore, words from the prompt are sometimes injected into the output and appear as text in the video.

Source: https://huggingface.co/blog/ai-webtv

Similar to HF’s project, Meta AI released Make-A-Video in September last year, but that model remains closed source, like the majority of services announced by the tech giant.

Read more: Meta AI Releases A Multimodal Model “CM3leon”  — But Won’t Release It


Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.