GauGAN2 From NVIDIA Includes A Text-To-Image Feature

NVIDIA’s new text-to-image tool demonstrates how artificial intelligence (AI) could eventually replace stock photography. NVIDIA has unveiled GauGAN2, the successor to its GauGAN model, which lets users create realistic landscape images. GauGAN2 converts words into photographic-quality images that users can then alter.

GauGAN2’s deep learning model enables anyone to turn their ideas into photorealistic artworks. Enter a phrase such as “sunset at a beach,” and the AI will generate the scene in real time. Add an adjective, as in “sunset on a rocky beach,” or change “sunset” to “afternoon” or “rainy day,” and the model, which is based on generative adversarial networks (GANs), instantly modifies the image. The GauGAN2 neural network, trained on 10 million nature photographs, produces realistic images based on a user’s description. After that, users can add new elements to the picture by hand-sketching them.

With the click of a button, users can build a segmentation map, a high-level outline showing where the objects in the scene are located. They can then switch to drawing and fine-tune the picture with rough sketches using labels such as sky, tree, rock, and river, letting the smart paintbrush turn these doodles into finished images.
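To make the idea of a segmentation map concrete, here is a minimal Python sketch. It treats the map as a grid in which every cell holds a label ID, and it "paints" rectangular regions the way a rough sketch would. The label names come from the article; the numeric IDs, the function names, and the grid size are illustrative assumptions, not NVIDIA's actual format or API.

```python
from collections import Counter

# Hypothetical label IDs; the names come from the article, the numbers do not.
LABELS = {"sky": 0, "tree": 1, "rock": 2, "river": 3}


def blank_map(height, width, label):
    """Return a height x width grid filled with one label ID."""
    return [[LABELS[label]] * width for _ in range(height)]


def paint_rect(seg_map, label, top, left, bottom, right):
    """Fill rows top..bottom-1 and columns left..right-1 with a label, in place."""
    for row in range(top, bottom):
        for col in range(left, right):
            seg_map[row][col] = LABELS[label]


def coverage(seg_map):
    """Return the fraction of cells assigned to each label name that appears."""
    names = {label_id: name for name, label_id in LABELS.items()}
    counts = Counter(cell for row in seg_map for cell in row)
    total = sum(counts.values())
    return {names[label_id]: n / total for label_id, n in counts.items()}


seg = blank_map(4, 6, "sky")          # start with an all-sky canvas
paint_rect(seg, "rock", 2, 0, 4, 6)   # bottom half becomes rock
paint_rect(seg, "river", 3, 2, 4, 4)  # a short stretch of river on the rock
print(coverage(seg))
```

A generator conditioned on such a map would receive the grid of labels rather than pixel colours, which is why a few rough strokes are enough to change the layout of the final image.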



GauGAN2 Features

GauGAN2 is one of the first demonstrations to integrate several modalities (text, semantic segmentation, sketch, and style) under a single GAN framework. It speeds up and simplifies the process of converting an artist’s vision into a high-quality AI-generated image. For example, rather than sketching every detail of an imagined landscape, users can enter a short phrase to generate the image’s primary features and subject, such as a snow-capped mountain range. This starting point can then be adjusted with sketches, for instance to make a mountain taller or to add clouds to the sky. The tool is not limited to creating realistic images; artists can also use it to create otherworldly settings.


The researchers trained GauGAN2’s AI model on 10 million high-quality landscape photographs on the NVIDIA Selene supercomputer. NVIDIA Selene is an NVIDIA DGX SuperPOD system that ranks among the world’s top ten supercomputers. The researchers used a neural network that learns the association between words such as “winter,” “foggy,” or “rainbow” and the visuals they correspond to. Compared with state-of-the-art models developed specifically for text-to-image or segmentation map-to-image applications, GauGAN2’s neural network generates a greater variety of higher-quality images.



The new GauGAN2 text-to-image capability is now available on NVIDIA AI Demos, a site where users can experience AI through the newest NVIDIA Research demos. GauGAN2 enables users to build and customise scenes more quickly and precisely with text prompts and sketches. Because it integrates segmentation mapping, inpainting, and text-to-image generation in a single model, GauGAN2 is a strong tool for creating photorealistic art using a combination of words and drawings. The GauGAN2 model is powered by generative adversarial networks (GANs).
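For readers new to GANs, the idea is that a generator G learns to produce images while a discriminator D learns to tell generated images from real ones. The objective below is the original GAN formulation from the general literature, shown only as background. The article does not state which loss GauGAN2 actually uses, and a conditional model of this kind would also feed its conditioning input (text, segmentation map, or sketch) to the networks.

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Here x is a real image, z is random noise, and G(z) is a generated image; D outputs the probability that its input is real.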


GauGAN2 demonstrates the future potential of powerful image-generation tools for artists. One such application is NVIDIA Canvas, which is built on GauGAN technology and is freely available to anyone with an NVIDIA RTX GPU. The deep learning models used in GauGAN2 transform a written phrase or sentence into a photorealistic artwork. Thanks to GauGAN2, the latest iteration of NVIDIA Research’s AI painting demo, creating a picture worth a thousand words takes just three or four words.

To learn more about the project, see here.


Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
