Capturing and animating 3D rendered faces for movies and games is as important as it is difficult. While animating a 3D face is a relatively easy task, capturing the subtleties of human expression, which are limitless in variety, is a monumental one.
Researchers at Disney have streamlined this process with a machine learning tool that makes it easy to generate and manipulate 3D faces through neural face modelling.
Conventionally, face models built from 3D face databases, which are used in tasks such as facial reconstruction, replacement, and manipulation, are multi-linear morphable models. While this kind of modelling provides control over facial identity, it lacks expressivity due to its linear nature. A linear model may also blend several static expressions in ways that produce physically unrealistic, impossible face shapes. So the linear model is doubly constrained: it can represent only a limited range of plausible face shapes, yet it can also produce many ‘non-face’ shapes.
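To make the limitation concrete, a linear morphable model can be sketched as a mean shape plus weighted identity and expression components. Every weight vector yields *some* shape, including implausible ones; all data below is random and purely illustrative:

```python
import numpy as np

# Toy linear morphable model (all bases are random, purely illustrative).
rng = np.random.default_rng(0)
n_vertices = 1000                                   # vertices in the face mesh
mean_shape = rng.normal(size=(n_vertices, 3))
id_basis = rng.normal(size=(10, n_vertices, 3))     # 10 identity components
expr_basis = rng.normal(size=(5, n_vertices, 3))    # 5 expression components

def linear_face(id_weights, expr_weights):
    """Linear blend: any weight vector produces a shape, including
    physically implausible 'non-face' shapes when the weights stray
    outside the region spanned by real training faces."""
    return (mean_shape
            + np.tensordot(id_weights, id_basis, axes=1)
            + np.tensordot(expr_weights, expr_basis, axes=1))

face = linear_face(np.zeros(10), np.zeros(5))       # zero weights: mean face
assert face.shape == (n_vertices, 3)
```

Because the model is a pure linear combination, nothing constrains the output to the manifold of real faces, which is exactly the shortcoming the article describes.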
Credit: Disney Research
To overcome this drawback, researchers have been adopting deep neural networks, which offer nonlinear face modelling. However, research into nonlinear morphable models has largely been limited to generating realistic 2D facial images, with less focus on 3D modelling.
Semantic Deep Face Model
The researchers at Disney have proposed a neural technique for 3D face modelling that combines the strengths of both the linear and nonlinear methods discussed above. The technique ‘disentangles’ the identity and expression components of the facial model and provides semantic control over both. It enables applications such as 3D face synthesis, facial performance transfer, performance editing, and 2D landmark-based performance retargeting. According to the authors, the result is a ‘powerful, semantically controllable, nonlinear, parametric face model’.
The proposed network architecture takes the neutral 3D geometry of a face along with a target expression and learns to deform the geometry into that expression. The disentanglement of identity and expression (mentioned above) is not learned but is explicitly imposed on the architecture.
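The idea of explicitly separating identity (the neutral geometry) from expression (a conditioning code) can be sketched with a toy, untrained network. This is an illustrative stand-in, not Disney's actual architecture; all dimensions and weights are hypothetical:

```python
import numpy as np

# Minimal sketch: a network maps a neutral mesh plus an expression code
# to per-vertex displacements (hypothetical sizes, untrained weights).
rng = np.random.default_rng(1)
n_vertices, expr_dim, hidden = 500, 16, 64

W1 = rng.normal(scale=0.01, size=(n_vertices * 3 + expr_dim, hidden))
W2 = rng.normal(scale=0.01, size=(hidden, n_vertices * 3))

def deform(neutral_mesh, expr_code):
    """Predict displacements and add them to the neutral geometry, so that
    identity (the neutral mesh) stays structurally separated from
    expression (the code) by construction, rather than being learned."""
    x = np.concatenate([neutral_mesh.ravel(), expr_code])
    h = np.tanh(x @ W1)
    displacement = (h @ W2).reshape(n_vertices, 3)
    return neutral_mesh + displacement

neutral = rng.normal(size=(n_vertices, 3))
posed = deform(neutral, np.zeros(expr_dim))
assert posed.shape == neutral.shape
```

The key structural point is that the expression code only ever produces a *deformation* of the given identity, which is one way to impose the disentanglement architecturally.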
Once trained, the model can traverse the identity latent space to generate new 3D faces and the expression latent space to synthesise new 3D expressions. Further, improving on existing techniques, the method conditions the expression on linear blendshape weights (a blendshape interpolates one facial shape into another in animation), which allows a more semantic exploration of the facial expression space.
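A rough sketch of what traversing a blendshape-conditioned expression latent space looks like; the conditioning map and all dimensions here are hypothetical placeholders, not the paper's trained components:

```python
import numpy as np

# Hypothetical conditioning: blendshape weights (e.g. 'smile', 'frown', ...)
# map to an expression latent code, keeping latent directions semantic.
rng = np.random.default_rng(2)
latent_dim, n_blendshapes = 8, 4
cond_matrix = rng.normal(size=(n_blendshapes, latent_dim))

def expression_code(blend_weights):
    # A linear stand-in for the (nonlinear, learned) conditioning.
    return blend_weights @ cond_matrix

neutral_code = expression_code(np.zeros(n_blendshapes))
smile_code = expression_code(np.array([1.0, 0.0, 0.0, 0.0]))  # 'smile' on

# Interpolating in latent space sweeps smoothly from neutral to smile;
# each code would be decoded into a 3D expression by the model.
codes = [(1 - t) * neutral_code + t * smile_code
         for t in np.linspace(0.0, 1.0, 5)]
assert len(codes) == 5
```

The point of the conditioning is that moving along a known blendshape direction has a predictable semantic effect, instead of an arbitrary latent direction that happens to change the face.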
Credit: Disney Research
This modelling technique offers several applications:
- It can generate novel human faces and corresponding expressions, which can prove to be a valuable tool for creating 3D characters in virtual environments.
- 3D facial retargeting, which involves capturing the performance of a human face and transferring it to a 3D character, can be easily accomplished with this model.
- It can generate expressions from 2D landmarks, which allows landmark-based performance capture and retargeting to a new identity.
- The model also allows artists to edit a performance, for example by adding a smile or frown to certain keyframes.
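Landmark-based retargeting can be illustrated with a simple linear stand-in for the model: fit expression weights to observed 2D landmarks by least squares, then reuse those weights on a new identity. Everything below (basis, landmark counts, observations) is synthetic and hypothetical:

```python
import numpy as np

# Hypothetical linear landmark model: 68 2D landmarks driven by 5 weights.
rng = np.random.default_rng(3)
n_landmarks, n_expr = 68, 5
mean_landmarks = rng.normal(size=(n_landmarks * 2,))
expr_basis = rng.normal(size=(n_landmarks * 2, n_expr))

def fit_expression(observed_2d):
    """Least-squares fit of expression weights to 2D landmark positions;
    the recovered weights can then drive a different identity's mesh."""
    w, *_ = np.linalg.lstsq(expr_basis, observed_2d - mean_landmarks,
                            rcond=None)
    return w

# Fabricate an observation from known weights, then recover them.
true_w = np.array([0.5, -0.2, 0.0, 0.1, 0.3])
observed = mean_landmarks + expr_basis @ true_w
recovered = fit_expression(observed)
assert np.allclose(recovered, true_w)
```

In the actual paper the mapping from landmarks to expression is handled by the neural model rather than a linear solve, but the retargeting idea is the same: estimate expression parameters from 2D observations, then apply them to any identity.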
The results of this study are certainly impactful: it is now possible to generate thousands of faces with different shapes and skin tones and then animate them all with the same expression at once. The research is best appreciated from the larger perspective of where it can be deployed.
It is noteworthy that only in July this year, researchers proposed a deep learning-based method for face swapping that produces highly realistic images.