Pixar’s “Toy Story,” released in 1995, was the first full-length computer-animated movie. According to Insider, rendering “Toy Story” required 117 computers running 24 hours a day. Each frame took between 45 minutes and 30 hours to render, depending on its complexity, and there were 114,240 frames in total — more than 77 minutes of animation spread across 1,561 shots. Pixar had to invent new software, RenderMan, to handle all this footage.
Twenty-five years down the line, Pixar continues to adopt new tech and set new benchmarks. It is now using one of the most popular deep learning models, the Generative Adversarial Network (GAN), to generate super high-resolution imagery that can meet the demands of 8K viewing. On “Toy Story 4,” the latest machine learning techniques reduced final render times by 15-50% in challenging cases.
Deep Learning For Super Resolution
To bring the advantages of deep learning to feature film production, researchers at Pixar have turned to GANs. The objective is to create high-resolution images, which is traditionally an expensive process. In this work, the Pixar team explores GANs as an alternative to conventional upscaling techniques.
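To see what the GAN replaces, here is a minimal sketch of one of the simplest conventional upscalers — nearest-neighbour 2x upscaling, which only repeats existing pixels and invents no new detail. This is an illustration, not Pixar’s code.

```python
# Illustrative only: the kind of "conventional upscaling" a learned model
# would replace. Nearest-neighbour 2x upscaling of a grayscale image,
# represented as nested lists of floats.

def upscale_nearest_2x(img):
    """Double both dimensions by repeating pixels; no new detail appears."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out

lr = [[0.0, 1.0],
      [1.0, 0.0]]
hr = upscale_nearest_2x(lr)
# The 2x2 input becomes a blocky 4x4 output — exactly the kind of result
# learned super-resolution aims to improve on.
```

The blockiness of such outputs is why a model that hallucinates plausible high-frequency detail is attractive.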
Recently, state-of-the-art computer vision algorithms such as deep convolutional neural networks have demonstrated the ability to reconstruct high-quality images by learning the low-resolution (LR) to high-resolution (HR) mapping from large volumes of data. The introduction of GANs and perceptual loss functions in the seminal SRGAN work has made it possible to produce images with detail and sharpness indistinguishable from the ground truth.
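The SRGAN-style objective combines a content loss with an adversarial term, roughly L = L_content + λ·L_adv. The sketch below uses a plain pixel-wise MSE as the content loss for brevity (SRGAN’s actual content loss is a perceptual loss computed on VGG features); the function names are illustrative, not Pixar’s code.

```python
import math

# Hedged sketch of the SRGAN-style generator objective:
#   L = L_content + lambda * L_adv
# Images are flattened lists of floats; MSE stands in for the VGG-feature
# perceptual loss used in the original SRGAN paper.

def mse(generated, ground_truth):
    """Pixel-wise content loss between generated and ground-truth images."""
    return sum((x - y) ** 2 for x, y in zip(generated, ground_truth)) / len(generated)

def adversarial_loss(d_fake):
    """Generator's adversarial term, -log D(G(lr)): pushes the generator
    toward outputs the discriminator scores as real (d_fake close to 1)."""
    return -math.log(d_fake)

def srgan_generator_loss(generated, ground_truth, d_fake, lam=1e-3):
    """Combined objective; lam balances fidelity against realism."""
    return mse(generated, ground_truth) + lam * adversarial_loss(d_fake)
```

When the discriminator is fully fooled (d_fake = 1), the adversarial term vanishes and only the content loss remains; the small λ keeps the adversarial term from overwhelming fidelity to the ground truth.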
“The aim of our work at Pixar is to put GANs into production for upscaling.”
The training data for the GAN experiments was collected by rendering 1K-2K pairs of production images using RenderMan, with shots randomly sampled from “Coco,” “Incredibles 2,” “Toy Story 4” and other Pixar movies. The data augmentation techniques implemented in this work also account for colour correction in the generated images: any deviation of colour from the ground truth is detected and corrected against the LR image from the training set to maintain coherence across the film.
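One simple way such a colour-consistency check could work is to compare per-channel means of the upscaled output against the LR reference and apply a global gain per channel. This is a hedged sketch of the general idea, not Pixar’s pipeline; the function names are hypothetical.

```python
# Illustrative colour-consistency correction: match each channel's mean
# in the upscaled output to the LR reference image.
# Pixels are (r, g, b) tuples of floats.

def channel_mean(pixels, c):
    """Mean value of channel c over a list of pixels."""
    return sum(p[c] for p in pixels) / len(pixels)

def correct_colour(upscaled, lr_reference):
    """Scale each channel so its mean matches the LR reference's mean."""
    gains = []
    for c in range(3):
        ref = channel_mean(lr_reference, c)
        out = channel_mean(upscaled, c)
        gains.append(ref / out if out else 1.0)  # avoid division by zero
    return [tuple(p[c] * gains[c] for c in range(3)) for p in upscaled]

# A neutral-grey output drifts back toward the reference's colour balance.
out = correct_colour([(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)],
                     [(0.25, 0.5, 1.0), (0.25, 0.5, 1.0)])
```

A real pipeline would operate on full-resolution buffers and likely use a more local correction, but the principle — anchoring the generated colours to a trusted reference — is the same.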
The training setup consists of a PyTorch development environment on a Linux instance with two 24GB NVIDIA Quadro P6000 GPUs. Pixar also has a renderer to synthesise the pairs of images — high-quality scenes with diverse shaders, geometry, and lighting conditions — and a data center fine-tuned to render at tremendous scale.
The researchers claim that they have trained and deployed a production-quality super-resolution model that consistently produces high-quality, artefact-free upscaled images, even on scenes with depth of field or motion blur. Further, the paper states that the latest trained model shows promise for a pipeline in which one can render at 1K and upscale to 2K, which would save 50-75% of the studio’s render farm footprint if used for all intermediate renders.
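A back-of-envelope calculation shows where a saving of that size comes from: a 1K frame has roughly a quarter of the pixels of a 2K frame, so rendering at 1K and upscaling can cut most of the 2K cost. The frame count and per-frame hours below are taken from figures quoted elsewhere in this article; the cost fraction is an illustrative assumption, not a Pixar measurement.

```python
# Illustrative render-farm arithmetic, not Pixar's figures.

def farm_hours_saved(frames, hours_per_2k_frame, low_res_cost_fraction):
    """CPU hours saved by rendering at low resolution and upscaling,
    assuming the low-res render costs a given fraction of the full render."""
    full = frames * hours_per_2k_frame
    low = full * low_res_cost_fraction
    return full - low

# "Toy Story"-scale example: 114,240 frames at ~50 CPU hours per 2K frame,
# with a 1K render assumed to cost ~25% of a 2K render.
saved = farm_hours_saved(114240, 50, 0.25)
```

Under these assumptions, the saving is 75% of the full footprint — the upper end of the paper’s 50-75% estimate; in practice, per-frame costs do not scale perfectly with pixel count, which is why the quoted range is wide.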
Speaking at the recently concluded VB Transform conference, one of the researchers, Vaibhav Vavilala, Technical Director at Pixar, said that it typically takes at least 50 CPU hours to render one frame at 2K resolution. Now extrapolate that to 24 frames per second for an hour-long movie.
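Spelling out that extrapolation: at 24 frames per second, an hour-long movie contains 24 × 60 × 60 = 86,400 frames, and at roughly 50 CPU hours per 2K frame the total climbs past four million CPU hours.

```python
# The extrapolation from the quote above, using the article's figures.
frames = 24 * 60 * 60          # 24 fps for a one-hour movie
cpu_hours = frames * 50        # ~50 CPU hours per 2K frame
# 86,400 frames -> 4,320,000 CPU hours for a single hour of footage
```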
With 4K and 8K becoming more popular, rendering grows proportionally longer and more tedious. Though today’s computers are designed to handle computationally intensive processes, there is still a tradeoff: the computational gains are negated by growing demands for more creative and realistic shots. So, in the end, rendering remains a lengthy and expensive process, and there is ample room for innovation.