The alignment of dance moves to music beats is a fundamental human behaviour and a form of art that requires constant practice and professional training. In addition, expressive choreography calls for equipping the dancer with a rich repertoire of dance moves.
Researchers explained that while this process is challenging for people, it is even more difficult for a machine learning model, which must generate continuous motion with high kinematic complexity while capturing the non-linear relationship between the movements and the accompanying music.
Entering the domain, Shan Yang, Software Engineer, and Angjoo Kanazawa, Research Scientist at Google Research, have proposed a full-attention cross-modal Transformer (FACT) model that can understand and mimic dance motion and can even enhance a person’s ability to choreograph dance.
In addition to the model, the team released AIST++, a large-scale, multi-modal 3D dance motion dataset. It comprises 5.2 hours of 3D dance motion in 1,408 sequences covering ten dance genres, each with multi-view videos and known camera poses. The proposed FACT model outperforms previous state-of-the-art approaches both qualitatively and quantitatively.
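The blog post describes FACT at a high level: separate transformers embed the seed motion and the music features, and a cross-modal transformer with full attention fuses the two sequences to predict future motion frames. The PyTorch sketch below is only an illustration of that idea under stated assumptions; the layer sizes, feature dimensions, and the omission of positional encodings are simplifications and not the authors' implementation.

```python
import torch
import torch.nn as nn

class FACTSketch(nn.Module):
    """Minimal sketch of a full-attention cross-modal transformer for
    music-conditioned motion generation. Dimensions are illustrative."""

    def __init__(self, motion_dim=219, audio_dim=35, d_model=512,
                 n_heads=8, n_layers=4, future_frames=20):
        super().__init__()
        self.motion_proj = nn.Linear(motion_dim, d_model)
        self.audio_proj = nn.Linear(audio_dim, d_model)

        def encoder(num_layers):
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=num_layers)

        # One transformer per modality, then a cross-modal transformer that
        # attends over the concatenated motion and audio embeddings.
        self.motion_encoder = encoder(n_layers)
        self.audio_encoder = encoder(n_layers)
        self.cross_modal = encoder(n_layers)
        self.head = nn.Linear(d_model, motion_dim)
        self.future_frames = future_frames

    def forward(self, seed_motion, music):
        # seed_motion: (batch, T_motion, motion_dim); music: (batch, T_audio, audio_dim)
        m = self.motion_encoder(self.motion_proj(seed_motion))
        a = self.audio_encoder(self.audio_proj(music))
        fused = self.cross_modal(torch.cat([m, a], dim=1))
        # Predict a short window of future motion frames from the last tokens.
        return self.head(fused[:, -self.future_frames:, :])


model = FACTSketch()
seed = torch.randn(1, 120, 219)   # a couple of seconds of seed motion
audio = torch.randn(1, 240, 35)   # music features for the clip
future = model(seed, audio)       # (1, 20, 219) predicted motion frames
print(future.shape)
```

In the paper, a model of this kind is applied autoregressively at test time: the predicted frames are appended to the seed motion and fed back in, so long dance sequences can be generated from a short seed and a full piece of music.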
Last year, researchers from ShanghaiTech University introduced Impersonator++, a GAN-based framework that performs human image synthesis using a 3D body mesh recovery module. According to the researchers, Impersonator++ tackles human image synthesis tasks including human motion imitation, appearance transfer, and novel view synthesis.
The ten dance genres in the AIST++ dataset include: Old School (Break, Pop, Lock and Waack) and New School (Middle Hip-Hop, LA-style Hip-Hop, House, Krump, Street Jazz and Ballet Jazz). Although the source AIST dance video database contains multi-view videos of dancers, its cameras were not calibrated, so the team recovered the camera parameters in order to reconstruct the 3D motion for AIST++.
“We present a model that can not only learn the audio-motion correspondence but also can generate high-quality 3D motion sequences conditioned on music. Because generating 3D movement from music is a nascent area of study, we hope our work will pave the way for future cross-modal audio to 3D motion generation,” the blog post stated.
Get the code here.
Get the AIST++ dataset here.
Read the entire paper here.