Aligning dance moves to musical beats is a fundamental human behaviour, and as an art form it requires constant practice and professional training. Expressive choreography also demands that the dancer have a rich repertoire of dance moves.
The researchers explained that while this process is challenging for people, it is even more difficult for a machine learning (ML) model. The model must generate continuous motion with high kinematic complexity while capturing the non-linear relationship between the movements and the accompanying music.
Entering the domain, Shan Yang, Software Engineer, and Angjoo Kanazawa, Research Scientist from Google Research, have proposed a full-attention cross-modal Transformer (FACT) model that can mimic and understand dance motions and can even enhance a person’s ability to choreograph dance.
In addition to the model, the team released AIST++, a large-scale, multi-modal 3D dance motion dataset. It comprises 5.2 hours of 3D dance motion in 1,408 sequences, covering ten dance genres, each with multi-view videos and known camera poses. The proposed FACT model outperforms previous state-of-the-art approaches, both qualitatively and quantitatively.
Last year, researchers from ShanghaiTech University introduced Impersonator++, a GAN-based framework that performs human image synthesis using a 3D body mesh recovery module. According to the researchers, the framework tackles human image synthesis tasks including human motion imitation, appearance transfer, and novel view synthesis.
The ten dance genres in the AIST++ dataset are split into Old School (Break, Pop, Lock and Waack) and New School (Middle Hip-Hop, LA-style Hip-Hop, House, Krump, Street Jazz and Ballet Jazz). AIST++ is built on the AIST Dance Database, which contains multi-view videos of dancers but whose cameras are not calibrated; the team recovered the camera calibration parameters and the 3D motion for AIST++.
“We present a model that can not only learn the audio-motion correspondence but also can generate high-quality 3D motion sequences conditioned on music. Because generating 3D movement from music is a nascent area of study, we hope our work will pave the way for future cross-modal audio to 3D motion generation,” the researchers wrote in the blog post.
Get the code here.
Get the AIST++ dataset here.
Read the entire paper here.