GauGAN Can Turn MS Paint-Like Sketches Into Photorealistic Masterpieces

Many of us spent a lot of time in MS Paint when first learning to use a computer, creating basic pieces of art with simple tools. Now, NVIDIA has released a demonstration of its latest research, which can turn a simplistic sketch into a photorealistic landscape.

Known as GauGAN, this neural network comes with an interface in which the user draws a very basic sketch of what they want to see. The sketch is then converted into a photorealistic landscape using generative adversarial networks.

How does this technology work, and what is the research behind it?



An Introduction to GANs

GAN stands for generative adversarial network, a type of neural network that first came to prominence in 2014. Today, the rise of GANs has brought unsupervised machine learning to the forefront of AI research.

The basic principle of GANs is that one network generates data while another discriminates between real and generated data. The network is therefore made up of two parts: the generator and the discriminator.

The generator and discriminator compete against each other, and this competition improves the network as a whole. The generator creates synthetic samples, while the discriminator judges whether a given input is real (drawn from the training data) or generated.

Both are trained at the same time, and each benefits from the process. The generator is rewarded when the discriminator cannot tell generated content from real content, while the discriminator is rewarded for classifying its inputs correctly.
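
The opposing objectives of the two components can be sketched numerically. The snippet below is a minimal illustration, assuming the standard binary cross-entropy formulation with the commonly used non-saturating generator loss. The function names and the sample scores are hypothetical and are not taken from GauGAN itself.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: D wants scores on real data near 1 and on fakes near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss: G wants D to score its fakes close to 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
confident_d = {"real": [0.9, 0.95], "fake": [0.1, 0.05]}
fooled_d = {"real": [0.5, 0.5], "fake": [0.5, 0.5]}

for name, scores in (("confident", confident_d), ("fooled", fooled_d)):
    d_loss = discriminator_loss(scores["real"], scores["fake"])
    g_loss = generator_loss(scores["fake"])
    print(f"{name}: D loss = {d_loss:.3f}, G loss = {g_loss:.3f}")
```

When the discriminator is confident and correct, its loss is low and the generator's loss is high. When it is reduced to guessing, the situation reverses, which is the tug-of-war that drives training.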

The end result is a network whose outputs closely resemble the training dataset but can be manipulated to suit the user’s needs. The technique has previously been employed in deepfakes, and NVIDIA has used it to create photorealistic faces and, now, landscapes.

NVIDIA GauGAN: Behind The Scenes

The abstract of the GauGAN paper, titled “Semantic Image Synthesis with Spatially-Adaptive Normalization”, states:

‘We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout.’

The input given by the user is the semantic layout, which specifies the position and extent of each element in the scene. Users can pick from a variety of labels to assign to regions of the image, including skies, clouds, snow, mountains, bushes, shrubs and more. Once the semantic layout is passed to the network, it begins its work.
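
To make the idea of a semantic layout concrete, the sketch below represents one as a small grid of class labels and computes how much of the canvas each class covers. The label names, the grid and the `coverage` helper are hypothetical illustrations, not GauGAN's actual label set or input format.

```python
from collections import Counter

# Hypothetical 4x6 layout: each cell names the class painted at that position.
layout = [
    ["sky", "sky", "sky", "cloud", "cloud", "sky"],
    ["sky", "sky", "mountain", "mountain", "sky", "sky"],
    ["tree", "mountain", "mountain", "mountain", "mountain", "tree"],
    ["grass", "grass", "grass", "grass", "grass", "grass"],
]

def coverage(label_map):
    """Return the fraction of cells covered by each class label."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

for label, share in sorted(coverage(layout).items()):
    print(f"{label}: {share:.0%}")
```

The grid carries only labels and positions, with no colour or texture. Everything else in the final image has to be supplied by the network.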

The authors of the paper found that the conventional approach, in which the semantic layout is fed directly into the network as input, works poorly. The normalization layers commonly used in such networks tend to wash away most of the semantic information before it can be put to use. This drove the researchers to create a new method of normalization for use with this network.

To overcome this problem, the researchers came up with a conditional normalization method known as SPatially-Adaptive (DE)normalization, or SPADE. This method modulates the activations of the neural network in such a way that the information in the input semantic layout is retained.

This is done through a spatially adaptive, learned transformation, which carries the semantic information clearly through the neural network. The SPADE model can be trained to work on a diverse range of labels.
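
The idea behind SPADE can be sketched on a single feature channel: normalize the activations, then scale and shift each position with parameters that depend on the semantic label at that position. This is a simplified illustration under stated assumptions. In the paper the scale and shift are produced by learned convolutions over the layout, whereas here they are replaced by hand-picked per-label lookup tables, and the names `spade_single_channel`, `gamma_by_label` and `beta_by_label` are hypothetical.

```python
import statistics

def spade_single_channel(activations, label_map, gamma_by_label, beta_by_label, eps=1e-5):
    """Normalize one feature channel, then scale and shift each position
    using parameters chosen by the semantic label at that position."""
    flat = [v for row in activations for v in row]
    mean = statistics.fmean(flat)
    std = (statistics.pvariance(flat, mu=mean) + eps) ** 0.5
    out = []
    for act_row, label_row in zip(activations, label_map):
        out.append([
            gamma_by_label[label] * (v - mean) / std + beta_by_label[label]
            for v, label in zip(act_row, label_row)
        ])
    return out

# A flat activation map: plain normalization would reduce it to all zeros.
activations = [[2.0, 2.0], [2.0, 2.0]]
label_map = [["sky", "sky"], ["grass", "grass"]]

# Hand-picked stand-ins for the learned, layout-dependent scale and shift.
gamma_by_label = {"sky": 1.5, "grass": 0.5}
beta_by_label = {"sky": 1.0, "grass": -1.0}

print(spade_single_channel(activations, label_map, gamma_by_label, beta_by_label))
```

Even though the normalized activations are all zero here, the output still differs between the sky and grass regions, because the shift term reintroduces the layout after normalization.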

This means that it can be trained on any existing dataset that has semantic segmentation data. The network learns the reverse mapping, from semantic maps to photos, which enables the photorealistic generation of landscapes seen in the GauGAN application.

How The GauGAN Webapp Works

The GauGAN web app was trained on landscape images scraped from Flickr, allowing the model to create photorealistic landscapes from simple user inputs. Users can pick from a list of ‘colours’, each standing for a label such as sky, trees, clouds or mountains, to draw a basic outline of the image.

The neural network then fills in this outline, creating a believable landscape that largely preserves the elements drawn by the user. After creating the landscape, users can also change the lighting and background scenery, allowing for a truly customizable experience.

Tools like these show us the capabilities of modern generative AI such as GANs. It remains to be seen where the technology can take us in the near future.



Anirudh VK
I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.
