NVIDIA’s new text-to-image tool demonstrates how artificial intelligence (AI) could eventually replace stock photography. NVIDIA has unveiled GauGAN2, the successor to its GauGAN model, which lets users generate realistic landscape images. GauGAN2 converts words into photographic-quality images that users can then edit.
GauGAN’s deep learning model enables anyone to turn their ideas into photorealistic artwork. Enter a phrase such as “sunset at a beach,” and the AI generates the scene in real time. Adding an adjective, as in “sunset on a rocky beach,” or changing “sunset” to “afternoon” or “rainy day” makes the model, which is based on generative adversarial networks, instantly transform the image. The GauGAN2 neural network, trained on 10 million nature photographs, produces realistic images from a user’s description. Users can then add new elements to the picture by sketching them in by hand.
Users can build a segmentation map, a high-level outline of the scene’s items, with the click of a button. They can then switch to drawing, fine-tuning the picture with rough sketches labelled sky, tree, rock, and river, allowing the intelligent paintbrush to merge these doodles into breathtaking masterpieces.
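Conceptually, a segmentation map is just a grid in which every cell carries a semantic label, and the “smart paintbrush” relabels regions of that grid. The sketch below is a minimal stdlib-Python illustration of that idea (the data layout, function names, and toy scene are assumptions for demonstration, not GauGAN2’s actual internals); only the labels sky, tree, rock, and river come from the article.

```python
# Illustrative sketch only: a segmentation map modelled as a 2D grid of
# semantic labels, plus a "paintbrush" that relabels a doodled region.
# This is NOT NVIDIA's data format; it just shows the concept.

LABELS = ["sky", "tree", "rock", "river"]  # labels named in the article

def make_segmentation_map(width, height):
    """Build a toy scene layout: sky on top, trees and rocks in the
    middle band, river along the bottom."""
    seg = []
    for y in range(height):
        if y < height // 3:
            row = ["sky"] * width
        elif y < 2 * height // 3:
            # alternate trees and rocks in the middle band
            row = ["tree" if x % 2 == 0 else "rock" for x in range(width)]
        else:
            row = ["river"] * width
        seg.append(row)
    return seg

def paint_region(seg, x0, y0, x1, y1, label):
    """Emulate the paintbrush: relabel a rectangular doodled region."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            seg[y][x] = label
    return seg

seg = make_segmentation_map(8, 6)
seg = paint_region(seg, 2, 1, 5, 3, "rock")  # sketch a rocky outcrop
print(seg[0][0], seg[5][0])  # top row is sky, bottom row is river
```

In a GauGAN-style pipeline, a label grid like this would be one conditioning input to the generator, which fills each labelled region with matching photographic texture.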
GauGAN2 is one of the first demonstrations to integrate multiple modalities (text, semantic segmentation, sketch, and style) within a single GAN framework. It accelerates and simplifies turning an artist’s vision into a high-quality AI-generated image. For example, rather than sketching every detail of an imagined landscape, users can enter a short phrase to generate the image’s primary features and subject, such as a snow-capped mountain range. They can then adjust this starting point with sketches, raising a mountain’s peak or adding clouds to the sky. Nor is it limited to realistic images; artists can also utilise it to create otherworldly scenes.
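Combining modalities under one framework amounts to folding each input (a text prompt, a segmentation map) into a single conditioning signal that one generator consumes. The stdlib sketch below illustrates that shape; the hashing “embedding” and one-hot encoding are stand-ins chosen for runnability, not GauGAN2’s real text encoder or network input.

```python
# Hedged sketch of multimodal conditioning. The hash-based "embedding"
# and one-hot label channels are illustrative assumptions, not GauGAN2's
# actual internals: the point is that both modalities collapse into one
# vector a single conditional generator could take as input.
import hashlib

LABELS = ["sky", "tree", "rock", "river"]

def text_embedding(prompt, dim=8):
    """Toy deterministic 'embedding': hash each word into a bucket."""
    vec = [0.0] * dim
    for word in prompt.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

def one_hot_map(seg):
    """Flatten a 2D label map into per-cell one-hot channels."""
    flat = []
    for row in seg:
        for label in row:
            flat.extend(1.0 if label == lab else 0.0 for lab in LABELS)
    return flat

def conditioning(prompt, seg):
    """Concatenate both modalities into one conditioning vector."""
    return text_embedding(prompt) + one_hot_map(seg)

seg = [["sky", "sky"], ["rock", "river"]]
z = conditioning("sunset on a rocky beach", seg)
print(len(z))  # 8 text dims + 2*2 cells * 4 labels = 24
```

Editing the prompt or repainting the map changes the conditioning vector, which is why GauGAN2 can regenerate the image instantly when either modality is tweaked.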
The researchers trained GauGAN2’s AI model on tens of millions of high-quality landscape photographs on the NVIDIA Selene supercomputer, an NVIDIA DGX SuperPOD system that ranks among the world’s top ten supercomputers. They then utilised a neural network to associate words such as “winter,” “foggy,” or “rainbow” with their visual representations. Compared to state-of-the-art models built expressly for text-to-image or segmentation-map-to-image applications, GauGAN2’s neural network generates a greater variety of higher-quality images.
The new GauGAN2 text-to-image capability is now available on NVIDIA AI Demos, a site where users can experience AI through the newest NVIDIA Research demos. GauGAN2 enables users to build and customise scenes more quickly and precisely with text prompts and sketches. In addition, GauGAN2 is a strong tool for creating photorealistic art from a combination of words and drawings, since it integrates segmentation mapping, inpainting, and text-to-image generation in a single GAN-based model.
GauGAN2 demonstrates the future potential of powerful image-generation tools for artists. One such application is NVIDIA Canvas, which is built on GauGAN technology and freely available to anyone with an NVIDIA RTX GPU. GauGAN2’s deep learning models transform a written phrase or sentence into a photorealistic artwork. Thanks to GauGAN2, the latest iteration of NVIDIA Research’s AI painting demo, creating an image worth a thousand words takes just three or four sentences.
To learn more about the project, see here.