Google Releases An ML Model That Can Choreograph Dance

The full-attention cross-modal Transformer (FACT) model can mimic and understand dance motions, and can even enhance a person’s ability to choreograph dance.

Aligning dance moves to musical beats is a fundamental human behaviour, yet as an art form it requires constant practice and professional training. Expressive choreography, in addition, requires the dancer to have a rich repertoire of dance moves.

The researchers explained that while this process is challenging for people, it is even more difficult for a machine learning model. This is because an ML model must generate continuous motion with high kinematic complexity while capturing the non-linear relationship between the movements and the accompanying music.

Entering the domain, Shan Yang, Software Engineer, and Angjoo Kanazawa, Research Scientist from Google Research, have proposed a full-attention cross-modal Transformer (FACT) model that can mimic and understand dance motions and can even enhance a person’s ability to choreograph dance.
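The article does not describe FACT's internals, so the following is only an illustrative sketch of what "full attention" across two modalities can look like: motion and audio tokens are joined into one sequence, and every token attends to every other token with no causal mask. The token counts, the embedding size, the random values and the omission of learned projection matrices are all assumptions made for brevity, not details of the released model.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(tokens):
    """Unmasked self-attention: each token attends to all tokens.

    Simplification (assumption): queries, keys and values are the tokens
    themselves, with no learned projection matrices.
    """
    dim = len(tokens[0])
    weights = []
    for query in tokens:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


random.seed(0)
DIM = 8  # hypothetical embedding size
# Hypothetical stand-ins for embedded motion frames and audio frames.
motion_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]
audio_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]

# Cross-modal step: one joint sequence, so motion tokens can attend to
# audio tokens and vice versa.
joint = motion_tokens + audio_tokens
outputs, weights = full_attention(joint)
```

Each row of `weights` sums to 1 and spans all ten positions, which is what lets a motion token draw on the audio context and the reverse.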

In addition to the model, the team released a large-scale, multi-modal 3D dance motion dataset called AIST++. It comprises 5.2 hours of 3D dance motion in 1,408 sequences, covering ten dance genres, each with multi-view videos with known camera poses. The proposed FACT model outperforms previous state-of-the-art approaches, both qualitatively and quantitatively.
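To put the dataset figures in perspective, a quick back-of-the-envelope calculation gives the average length of a sequence. It uses only the two numbers quoted above; the variable names are illustrative, and individual sequences may be much shorter or longer than this mean.

```python
TOTAL_HOURS = 5.2      # total 3D dance motion, as quoted in the article
NUM_SEQUENCES = 1408   # number of sequences, as quoted in the article

total_seconds = TOTAL_HOURS * 3600
avg_seconds_per_sequence = total_seconds / NUM_SEQUENCES

print(f"Average sequence length: {avg_seconds_per_sequence:.1f} seconds")
```

On these figures, a typical AIST++ sequence is a short clip of roughly 13 seconds rather than a full-length performance.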

Last year, researchers from ShanghaiTech University introduced Impersonator++, a GAN-based framework that performs human image synthesis using a 3D body mesh recovery module. According to the researchers, the framework tackles human motion imitation, appearance transfer, and novel view synthesis.

The ten dance genres in the AIST++ dataset are grouped into Old School (Break, Pop, Lock and Waack) and New School (Middle Hip-Hop, LA-style Hip-Hop, House, Krump, Street Jazz and Ballet Jazz). Although the underlying AIST Dance Database, from which AIST++ was built, contains multi-view videos of dancers, its cameras are not calibrated.

“We present a model that can not only learn the audio-motion correspondence but also can generate high-quality 3D motion sequences conditioned on music. Because generating 3D movement from music is a nascent area of study, we hope our work will pave the way for future cross-modal audio to 3D motion generation,” the researchers wrote in the blog post.

Get the code here.

Get the AIST++ dataset here.

Read the entire paper here.

Kumar Gandharv
Kumar Gandharv, PGD in English Journalism (IIMC, Delhi), is setting out on a journey as a tech journalist at AIM. He is a keen observer of national and IR-related news.