GauGAN2 From NVIDIA Includes A Text-To-Image Feature

New research from NVIDIA shows that the GauGAN AI art demo now responds to words.

NVIDIA’s new text-to-image tool demonstrates how artificial intelligence (AI) could eventually replace stock photography. NVIDIA has unveiled the sequel to its GauGAN model, GauGAN2, which lets users create realistic landscape images. GauGAN2 converts words into photographic-quality images that users can then alter.

GauGAN’s deep learning model enables anyone to turn their ideas into photorealistic artworks. Enter a phrase such as “sunset at a beach,” and the AI will generate the scene in real time. By adding an adjective, as in “sunset on a rocky beach,” or by changing “sunset” to “afternoon” or “rainy day,” the model, which is based on generative adversarial networks, instantly transforms the image. The NVIDIA GauGAN2 neural network, trained on 10 million nature photographs, produces realistic images based on a user’s description. After that, users can add new elements to the picture by hand-sketching them.

Users can build a segmentation map, a high-level outline of the scene’s objects, with the click of a button. They can then switch to drawing, fine-tuning the picture with rough sketches labelled sky, tree, rock, and river, allowing the intelligent paintbrush to merge these doodles into breathtaking masterpieces.
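Conceptually, a segmentation map is just a grid in which each cell records which class of object occupies that part of the scene. The sketch below is purely illustrative (the label names, canvas size, and functions are hypothetical, not part of NVIDIA’s tooling), but it shows the data structure the “doodles” produce:

```python
# A segmentation map: a 2-D grid where each cell holds a class label.
# Labels, canvas size, and helper functions are illustrative only.

LABELS = ["sky", "tree", "rock", "river"]

def blank_map(width, height, fill="sky"):
    """Start with a canvas filled entirely with one class."""
    return [[fill for _ in range(width)] for _ in range(height)]

def paint_rect(seg_map, label, x0, y0, x1, y1):
    """'Doodle' a rectangular region of one class onto the map."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            seg_map[y][x] = label

def class_counts(seg_map):
    """Summarise how much of the scene each class covers."""
    counts = {label: 0 for label in LABELS}
    for row in seg_map:
        for label in row:
            counts[label] += 1
    return counts

seg = blank_map(8, 6)                 # 8x6 canvas, all sky
paint_rect(seg, "river", 0, 4, 8, 6)  # water along the bottom
paint_rect(seg, "rock", 5, 3, 8, 4)   # rocky outcrop on the right
print(class_counts(seg))
# → {'sky': 29, 'tree': 0, 'rock': 3, 'river': 16}
```

A generator conditioned on such a map then renders photorealistic texture for each labelled region.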

GauGAN2 Features

GauGAN2 is one of the first demonstrations to integrate various modalities (text, semantic segmentation, sketch, and style) under a single GAN framework. It accelerates and simplifies the process of converting an artist’s vision into a high-quality AI-generated image. For example, users can enter a brief phrase to generate the image’s primary features and subject, such as a snow-capped mountain range, rather than sketching every detail of an imagined landscape. This starting point can then be adjusted with sketches, for instance to increase a mountain’s height or add clouds to the sky. It is not limited to creating realistic images; artists can also utilise it to create strange settings.
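The key idea behind accepting several modalities in one model is that each input, when present, contributes to a single conditioning signal, and absent inputs are simply left empty. The toy sketch below illustrates only that merging pattern; the embedding functions and dimensions are invented for illustration and bear no relation to GauGAN2’s actual encoders:

```python
# Toy illustration of one model accepting several optional modalities.
# Embedding functions and sizes are hypothetical, not NVIDIA's encoders.

DIM = 4  # illustrative embedding size per modality

def embed_text(prompt):
    """Fake text embedding: simple character statistics."""
    if prompt is None:
        return [0.0] * DIM
    n = len(prompt)
    return [float(n % 7), float(prompt.count(" ")), n / 10.0, 1.0]

def embed_map(seg_labels):
    """Fake segmentation summary: fraction of cells per class."""
    if seg_labels is None:
        return [0.0] * DIM
    classes = ["sky", "tree", "rock", "river"]
    total = len(seg_labels)
    return [seg_labels.count(c) / total for c in classes]

def condition(prompt=None, seg_labels=None):
    """Concatenate modality embeddings; any subset of inputs works."""
    return embed_text(prompt) + embed_map(seg_labels)

vec = condition(prompt="sunset on a rocky beach")
print(len(vec))  # → 8 (text half populated, segmentation half zero)
```

Because missing modalities are zero-filled rather than required, the same network can start from text alone and later refine the result with a sketch or segmentation map.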

Training

The researchers trained GauGAN2’s AI model on 10 million high-quality landscape photographs using the NVIDIA Selene supercomputer. NVIDIA Selene is an NVIDIA DGX SuperPOD system that ranks among the world’s top ten supercomputers. Next, the researchers utilised a neural network to associate words such as “winter,” “foggy,” or “rainbow” with visual representations. Compared to state-of-the-art models developed expressly for text-to-image or segmentation-map-to-image applications, GauGAN2’s neural network generates a wider variety of higher-quality images.

Availability

The new GauGAN2 text-to-image capability is now available on NVIDIA AI Demos, a site where users can experience AI through the newest NVIDIA Research demos. GauGAN2 enables users to build and customise scenarios more quickly and precisely with text prompts and sketches. In addition, GauGAN2 is a strong tool for creating photorealistic art from a combination of words and drawings, since it integrates segmentation mapping, inpainting, and text-to-image generation in a single model. The GauGAN2 model is powered by generative adversarial networks (GANs).
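At the heart of any GAN, including GauGAN2, is an adversarial objective: a discriminator learns to tell real photographs from generated ones, while the generator learns to fool it. The snippet below is a minimal sketch of the standard non-saturating GAN losses given discriminator scores; it is a textbook formulation, not NVIDIA’s training code:

```python
import math

def bce(p, target):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # guard against log(0)
    return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

def discriminator_loss(d_real, d_fake):
    """D wants real images scored 1 and generated images scored 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """Non-saturating G loss: G wants its outputs scored as real (1)."""
    return bce(d_fake, 1.0)

# A confident discriminator yields a low D loss but a high G loss,
# pushing the generator to produce more convincing images:
print(discriminator_loss(d_real=0.9, d_fake=0.1))  # ≈ 0.21
print(generator_loss(d_fake=0.1))                  # ≈ 2.30
```

Training alternates between minimising these two losses until generated landscapes become hard to distinguish from real photographs.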

Conclusion

GauGAN2 demonstrates the future potential of powerful image-generation tools for artists. One such application is NVIDIA Canvas, which is built on GauGAN technology and freely available to anyone with an NVIDIA RTX GPU. The deep learning models in GauGAN2 transform a written phrase or sentence into a photorealistic artwork. Thanks to GauGAN2, the latest iteration of NVIDIA Research’s AI painting demo, creating a thousand-word image takes just three or four sentences.

To learn more about the project, see here.

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
