
Introducing Text-to-Video Generator, Tune-A-Video

With customised sparse-causal attention, Tune-A-Video extends the spatial self-attention of pretrained text-to-image diffusion models to the spatiotemporal domain.


Since OpenAI unveiled its text-to-image model DALL-E, the AI world has been building similar models, such as Midjourney and Imagen, to name a few. Text-to-video models like Transframer, NUWA Infinity and CogVideo soon followed, and Microsoft recently unveiled VALL-E, a text-to-speech model.

Last month, researchers from Show Lab at the National University of Singapore proposed a text-to-video (TTV) generator called Tune-A-Video to address one-shot video generation, where only a single text-video pair is provided for training an open-domain text-to-video generator. With customised sparse-causal attention, Tune-A-Video extends the spatial self-attention of pretrained text-to-image (TTI) diffusion models to the spatiotemporal domain.

Tune-A-Video

Check the unofficial implementation of Tune-A-Video here. 

From a single training sample, Tune-A-Video updates the projection matrices in the attention blocks to capture the relevant motion information. It can then create temporally coherent videos for various applications, including changing the subject or background, modifying attributes, and transferring styles.

The researchers observed that TTI models can produce images that match verb terms well, and that extending TTI models to generate multiple images at once demonstrates surprisingly strong content consistency.
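The sparse-causal idea can be illustrated with a small sketch. In the pattern described by the paper, each frame's queries attend only to keys and values from the first frame and the immediately preceding frame, rather than to all frames. The function below is a simplified, single-head illustration in NumPy, not the authors' implementation; the projection matrices `Wq`, `Wk`, `Wv` stand in for the attention block's learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_causal_attention(frames, Wq, Wk, Wv):
    """Simplified sparse-causal attention: for frame i, queries come from
    frame i, while keys/values come only from frame 0 and frame i-1
    (for frame 0, both references fall back to frame 0 itself)."""
    T, N, d = frames.shape  # (num_frames, tokens_per_frame, dim)
    out = np.empty_like(frames)
    for i in range(T):
        q = frames[i] @ Wq
        ref = np.concatenate([frames[0], frames[max(i - 1, 0)]], axis=0)
        k, v = ref @ Wk, ref @ Wv
        attn = softmax(q @ k.T / np.sqrt(d))
        out[i] = attn @ v
    return out

# Toy example: 4 frames, 6 tokens per frame, dimension 8.
rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 6, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = sparse_causal_attention(frames, Wq, Wk, Wv)
```

Because keys and values are restricted to two reference frames, later frames stay anchored to the first frame's content while still following the motion of the previous frame, which is what gives the generated clips their temporal consistency.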

Fine-Tuning: The pretrained TTI model weights are used to initialise a TTV model, which is then tuned on the single text-video pair in one shot to create a one-shot TTV model.

Inference: A modified text prompt is used to generate new videos.

In short, given a video and text pair as input, the method updates only the projection matrices in the attention blocks.
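The tuning step above amounts to freezing the pretrained weights and leaving only the attention projection matrices trainable. Here is a minimal PyTorch sketch of that idea; the `AttentionBlock` class and the `to_q`/`to_k`/`to_v` names are hypothetical stand-ins for a real diffusion UNet's attention layers, and exactly which projections are tuned is simplified relative to the paper.

```python
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Hypothetical stand-in for one attention block of a pretrained
    text-to-image UNet (names follow common diffusion-model conventions)."""
    def __init__(self, dim=32):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)  # output projection, kept frozen

model = nn.Sequential(AttentionBlock(), AttentionBlock())

# Freeze all pretrained weights...
for p in model.parameters():
    p.requires_grad = False
# ...then unfreeze only the attention projection matrices for one-shot tuning.
for name, p in model.named_parameters():
    if any(tag in name for tag in ("to_q", "to_k", "to_v")):
        p.requires_grad = True

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

Tuning only these few matrices on a single text-video pair keeps training cheap and preserves the open-domain knowledge of the pretrained TTI model.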

Read the full paper here.


Shritama Saha

Shritama (she/her) is a technology journalist at AIM who is passionate about exploring the influence of AI on different domains, including fashion, healthcare and banking.