
Behind NVIDIA’s latest image editing tool called EditGAN

EditGAN allows users to edit desired images with simple commands like drawing without compromising the original image quality


Researchers from NVIDIA, the University of Toronto, and MIT have collaborated to develop EditGAN, which allows users to edit images with simple inputs such as drawing, without compromising the original image quality. AI-based image editing tools have been around for quite some time now, and improving them has been an active area of focus for researchers.

GANs act as building blocks

Generative adversarial networks (GANs), used either by embedding images into the GAN’s latent space or by working directly with GAN-generated images, have been a promising foundation for image editing algorithms. EditGAN builds on DatasetGAN, an automatic procedure for generating massive datasets of high-quality semantically segmented images with minimal human effort.

Last year saw a string of back-to-back releases of large text-to-image generation models. Just a few months ago, NVIDIA came up with GauGAN2, the sequel to its GauGAN model, which allows users to create realistic landscape images by converting words into photographic-quality pictures that can then be altered.

As evident from the name, the GAN framework forms the foundation of GauGAN2 as well. NVIDIA said that GauGAN2 combines multiple modalities, such as text, semantic segmentation, sketch and style, within a single GAN framework, providing a pathway to turn an artist’s vision into a high-quality AI-generated image.

What is EditGAN?

As per the paper titled “EditGAN: High-Precision Semantic Image Editing”, “EditGAN builds on a recently proposed GAN that jointly models both images and their semantic segmentations based on the same underlying latent code and requires as few as 16 labelled examples”.

The team said that they modified the segmentation mask as per the desired edit and optimised the latent code so that it is consistent with the new segmentation, effectively changing the RGB image.
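Conceptually, this amounts to a small optimisation problem in latent space. The following minimal PyTorch sketch illustrates the idea under stated assumptions: `generator`, `target_mask` and `region` are hypothetical names (the paper’s actual implementation differs), and `generator(w)` is assumed to return an RGB image and segmentation logits from one latent code, mirroring the joint modelling the paper describes.

```python
import torch
import torch.nn.functional as F

def optimise_edit(generator, w, target_mask, region, steps=100, lr=0.01):
    """Hypothetical sketch of EditGAN-style latent optimisation.

    generator(w) is assumed to return (rgb, seg_logits) from one latent
    code, so editing the segmentation indirectly edits the image.
    """
    with torch.no_grad():
        rgb_orig, _ = generator(w)                   # original image
    delta = torch.zeros_like(w, requires_grad=True)  # latent offset to learn
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        rgb, seg_logits = generator(w + delta)
        # make the generated segmentation consistent with the edited mask
        seg_loss = F.cross_entropy(seg_logits, target_mask)
        # keep pixels outside the edited region close to the original image
        rgb_loss = ((rgb - rgb_orig) * (1 - region)).pow(2).mean()
        loss = seg_loss + 10.0 * rgb_loss            # weighting is illustrative
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach()                            # the resulting editing vector
```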

As per the paper, the team applied EditGAN on a diverse range of images such as cars, cats, birds, and human faces. They performed quantitative comparisons to multiple baselines and outperformed them in various metrics such as identity preservation, quality preservation, and target attribute accuracy.

Image: “EditGAN: High-Precision Semantic Image Editing”

What makes EditGAN unique is that it offers very high-precision editing while requiring little annotated training data and no dependence on external classifiers. It can be run interactively in real time and allows multiple edits to be composed in a straightforward way.

Three different modes

The team also added that editing with EditGAN can be performed in three different modes:

Real-time editing with editing vectors

This is suitable for localised, well-disentangled edits. It is performed by applying previously learnt editing vectors with varying scales and manipulating images at interactive rates. 
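Continuing the hypothetical sketch above, applying a learnt editing vector needs only a single forward pass per scale, which is what makes interactive rates possible; `smile_vector` and `gaze_vector` are assumed, illustrative names for vectors learnt as in the earlier sketch.

```python
# Apply a previously learnt editing vector at varying strengths: one forward
# pass per scale, no per-image optimisation (names follow the sketch above).
for scale in (0.5, 1.0, 1.5):
    rgb_edited, _ = generator(w + scale * smile_vector)

# Compositionality: multiple edits combine by adding their scaled vectors.
rgb_combined, _ = generator(w + 1.0 * smile_vector + 0.7 * gaze_vector)
```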

Vector-based editing with self-supervised refinement

This works for localised edits that are not perfectly disentangled from other parts of the image. Here, editing artefacts can be removed by initialising the edit with the learnt editing vectors and performing additional optimisation at test time.
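In the same hypothetical setup, this mode would amount to initialising the edit with the learnt vector and then running a short, image-specific optimisation:

```python
# Self-supervised refinement (sketch): start from the learnt editing vector,
# then optimise briefly on this particular image to remove artefacts.
residual = optimise_edit(generator, w + smile_vector, target_mask, region,
                         steps=30)
rgb_refined, _ = generator(w + smile_vector + residual)
```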

Optimisation-based editing

This method works for image-specific edits, as well as large edits that do not transfer to other images through editing vectors.

Limitations 

Though EditGAN is a big step towards using GANs in image editing, the paper also mentions the challenges that come with this innovation. 

EditGAN only works on images that the GAN can model, so it can be challenging to apply to, say, photos of vivid city scenes. The team also encountered challenging edits that required iterative optimisation on each example. Going forward, more research is needed to speed up the optimisation for such edits and to build improved generative models with more disentangled latent spaces.
