Many of us spent a lot of time in MS Paint when first learning to use a computer, creating basic pieces of art with simple tools. Now, NVIDIA has released a demonstration of its latest research, which can turn a simple sketch into a photorealistic landscape.
Known as GauGAN, the tool gives users an interface for sketching a very basic outline of what they want to see. This sketch is then converted into a photorealistic landscape using generative adversarial networks.
How does this technology work, and what is the research behind it?
An Introduction to GANs
GAN stands for Generative Adversarial Network, a type of neural network that first came to prominence in 2014. Today, the rise of GANs has helped bring unsupervised ML to the forefront of AI research.
The basic principle of GANs is to pit generation against discrimination. The neural network is therefore made up of two parts: the generator and the discriminator.
The generator and discriminator compete against each other, and this competition improves the network as a whole. The generator produces synthetic data, while the discriminator judges whether a given input is real (drawn from the training dataset) or generated.
Both are trained at the same time, and both benefit from the process. The generator is rewarded when the discriminator cannot tell real content from generated content, while the discriminator is rewarded for classifying inputs correctly.
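The opposing goals described above can be sketched with the classic cross-entropy objectives from the original GAN formulation. This is only an illustration: the function names are made up for this example, and it is not necessarily the loss that GauGAN itself uses.

```python
import math

# d_real / d_fake: the discriminator's estimated probability (0..1) that a
# real sample / a generated sample is real. Names are illustrative only.

def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Low when the discriminator scores generated samples near 1."""
    return -math.log(d_fake)

# A discriminator that separates real from generated content well...
confident = discriminator_loss(d_real=0.9, d_fake=0.1)
# ...versus one that has been fooled into guessing.
guessing = discriminator_loss(d_real=0.5, d_fake=0.5)

print(confident < guessing)                       # discriminator prefers the first case
print(generator_loss(0.9) < generator_loss(0.1))  # generator prefers fooling it
```

Training alternates between lowering each loss in turn, which is why an improvement on one side puts pressure on the other.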
The end result is a network whose outputs closely resemble the training dataset, but which can be manipulated and changed to suit the user’s needs. This approach has previously been employed in deepfakes, with NVIDIA picking up the technology to create photorealistic faces and, now, landscapes.
NVIDIA GauGAN: Behind The Scenes
The abstract of the GauGAN paper, titled “Semantic Image Synthesis with Spatially-Adaptive Normalization”, states:
‘We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout.’
The input given by the user is the semantic layout, which specifies the position and extent of each element in the image. Users can pick from a variety of labels to assign to regions of the image, including skies, clouds, snow, mountains, bushes, shrubs and more. Once the semantic layout has been passed to the network, it begins its work.
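To make the idea of a semantic layout concrete, here is a minimal sketch that stores a layout as a grid of label indices and expands it into one binary mask per label. The three-label set, the tiny grid and the helper names are assumptions for illustration; the real app works with far larger images and many more labels.

```python
# Hypothetical label set, chosen only for this example.
LABELS = ["sky", "mountain", "tree"]

# A tiny 3x4 "sketch": each cell holds the index of the label painted there.
layout = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [2, 2, 1, 1],
]

def one_hot(layout, num_classes):
    """Return one binary mask per label, each the same size as the layout."""
    return [
        [[1 if cell == c else 0 for cell in row] for row in layout]
        for c in range(num_classes)
    ]

def coverage(layout, num_classes):
    """Return the fraction of the image covered by each label."""
    total = sum(len(row) for row in layout)
    return [sum(row.count(c) for row in layout) / total for c in range(num_classes)]

masks = one_hot(layout, len(LABELS))
shares = coverage(layout, len(LABELS))

for name, share in zip(LABELS, shares):
    print(f"{name}: {share:.0%} of the image")
```

The masks capture where each element sits, and the coverage figures capture how much of the image it occupies.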
The authors of the paper found that the traditional way of feeding the semantic layout into a neural network did not work well. Earlier methods passed the layout directly through stacks of convolution, normalization and nonlinearity layers, and the normalization layers tended to wash away most of the semantic information. This drove the researchers to create a new method of normalization for use with this network.
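A small sketch shows how normalization can erase semantic information. Assume a region painted entirely with one label, so that an early layer produces the same activation everywhere. The activation values below are invented for the example, and `instance_norm` is a simplified single-channel stand-in for a real normalization layer.

```python
import math

def instance_norm(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation of one channel."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

# Hypothetical activations for two different uniform layouts.
all_sky = [0.8] * 6
all_grass = [-0.3] * 6

norm_sky = instance_norm(all_sky)
norm_grass = instance_norm(all_grass)

# Both results are effectively zero, so the two labels become indistinguishable.
print(max(abs(v) for v in norm_sky))
print(max(abs(v) for v in norm_grass))
```

After normalization, nothing in the activations says whether the user painted sky or grass.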
To overcome this problem, the researchers came up with a conditional normalization method known as SPatially-Adaptive (DE)normalization, or SPADE. This method modulates the activations of the neural network in such a way that the information in the input semantic layout is retained.
This is done through a spatially adaptive, learned transformation of the layout, which carries the semantic information through the neural network. The SPADE model can be trained to work on a diverse range of labels.
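The idea can be sketched for a single channel as follows: normalize the activations, then scale and shift each pixel using values that depend on the label at that pixel. In this simplified sketch the scale (`gamma`) and shift (`beta`) are hand-picked per-label numbers, which is an assumption made for readability. In the paper they are produced by learned convolutions over the layout, and the statistics are computed differently.

```python
import math

def spade(features, layout, gamma, beta, eps=1e-5):
    """Normalize one channel, then modulate each pixel according to its label.

    features, layout: 2D lists of the same shape.
    gamma, beta: per-label scale and shift (illustrative stand-ins for the
    learned, convolution-based modulation described in the paper).
    """
    flat = [v for row in features for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    std = math.sqrt(var + eps)
    return [
        [gamma[label] * (v - mean) / std + beta[label]
         for v, label in zip(f_row, l_row)]
        for f_row, l_row in zip(features, layout)
    ]

# Constant activations: plain normalization would reduce these to zeros.
features = [[1.0, 1.0], [1.0, 1.0]]
layout = [[0, 0], [1, 1]]          # top row is label 0, bottom row is label 1

out = spade(features, layout, gamma=[1.0, 1.0], beta=[0.5, -0.5])
print(out)
```

Because the shift depends on the label at each position, the two regions stay distinguishable even though the normalized activations carry no information on their own.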
This means that it can be trained on any existing dataset that has semantic segmentation data. The network learns the reverse mapping, from semantic maps to photos, which allows for the photorealistic generation of landscapes seen in the GauGAN application.
How The GauGAN Web App Works
The GauGAN web app was trained on landscape images scraped from Flickr, allowing the model to create photorealistic landscapes from simple user inputs. Users can pick from a list of ‘colours’, each standing for a label such as sky, trees, clouds or mountains, to draw a basic outline of the image.
The neural network then fills in this outline, creating a believable landscape in which the user's elements are largely preserved. After creating the landscape, users can also change the lighting and background scenery, allowing for a truly customizable experience.
Tools like these show us the capabilities of modern generative AI such as GANs. It remains to be seen where the technology can take us in the near future.