Behind NVIDIA’s latest image editing tool called EditGAN

EditGAN allows users to edit images with simple commands like drawing, without compromising the original image quality

Researchers from NVIDIA, the University of Toronto, and MIT have collaborated to bring out EditGAN, a tool that allows users to edit images with simple commands like drawing, without compromising the original image quality. AI-powered image editing tools have been around for quite some time now, and improving them has been a continued area of focus for researchers.

GANs act as building blocks

Generative adversarial networks (GANs) that embed images into the GAN’s latent space, or that work directly with GAN-generated images, have been a promising foundation for building image editing algorithms. EditGAN builds on DatasetGAN, an automatic procedure for generating massive datasets of high-quality, semantically segmented images with minimal human effort.

Last year saw a string of back-to-back releases in the large language model space, with particular focus on large text-to-image generation models. Just a few months ago, NVIDIA came out with GauGAN2, the sequel to its GauGAN model. It allows users to create realistic landscape scenes by converting words into photographic-quality images that one can then alter.


As evident from the name, the GAN framework forms the foundation of GauGAN2 as well. NVIDIA said that GauGAN2 combines multiple modalities such as text, semantic segmentation, sketch and style within a single GAN framework, paving the way to turn an artist’s vision into a high-quality AI-generated image.

What is EditGAN?

As per the paper titled “EditGAN: High-Precision Semantic Image Editing”, “EditGAN builds on a recently proposed GAN that jointly models both images and their semantic segmentations based on the same underlying latent code and requires as few as 16 labelled examples”.


The team said that they modify the segmentation mask as per the desired edit and optimise the latent code so that it is consistent with the new segmentation, which effectively changes the RGB image.
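To make the idea concrete, here is a minimal, hypothetical sketch of such an edit in PyTorch. The toy generator, loss weights and function names below are illustrative stand-ins, not NVIDIA’s code; they only show the pattern of optimising a latent offset so that the generated segmentation matches the edited mask while the image is preserved outside the edited region.

```python
# Hypothetical sketch of EditGAN-style editing via latent optimisation (PyTorch).
# A tiny stand-in generator is used so the loop is runnable end to end; the real
# model jointly generates an image and its segmentation from one latent code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyJointGenerator(nn.Module):
    """Stand-in for a GAN that maps a latent code to an (image, segmentation) pair."""
    def __init__(self, latent_dim=64, img_size=32, n_classes=4):
        super().__init__()
        self.img_head = nn.Linear(latent_dim, 3 * img_size * img_size)
        self.seg_head = nn.Linear(latent_dim, n_classes * img_size * img_size)
        self.img_size, self.n_classes = img_size, n_classes

    def forward(self, w):
        img = torch.tanh(self.img_head(w)).view(-1, 3, self.img_size, self.img_size)
        seg = self.seg_head(w).view(-1, self.n_classes, self.img_size, self.img_size)
        return img, seg  # seg holds per-class logits

def edit_latent(G, w_init, target_mask, edit_region, steps=200, lr=0.05):
    """Optimise a latent offset so the generated segmentation matches the edited
    mask inside `edit_region`, while keeping the RGB image unchanged outside it."""
    with torch.no_grad():
        img_orig, _ = G(w_init)
    delta = torch.zeros_like(w_init, requires_grad=True)   # this offset is what we learn
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        img, seg = G(w_init + delta)
        per_pixel = F.cross_entropy(seg, target_mask, reduction="none")
        seg_loss = (per_pixel * edit_region.float()).mean()     # match the redrawn mask
        keep = (~edit_region).float().unsqueeze(1)              # 1 everywhere outside the edit
        rgb_loss = F.l1_loss(img * keep, img_orig * keep)       # preserve the rest of the image
        loss = seg_loss + 10.0 * rgb_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w_init + delta.detach(), delta.detach()

# Usage with random data, purely to exercise the loop:
G = ToyJointGenerator()
w = torch.randn(1, 64)
mask = torch.randint(0, 4, (1, 32, 32))                 # edited segmentation labels
region = torch.zeros(1, 32, 32, dtype=torch.bool)
region[:, 8:16, 8:16] = True                            # area the user redrew
w_edited, edit_vector = edit_latent(G, w, mask, region)
```

In EditGAN itself, the offset recovered by this kind of optimisation can be reused as an “editing vector” on other images, which is what enables the real-time modes described below.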

As per the paper, the team applied EditGAN to a diverse range of images such as cars, cats, birds, and human faces. They performed quantitative comparisons against multiple baselines and outperformed them on various metrics such as identity preservation, quality preservation, and target attribute accuracy.

Image source: “EditGAN: High-Precision Semantic Image Editing” (paper)

What makes EditGAN unique is that it offers very high-precision editing while requiring little annotated training data and not depending on classifiers. It can be run interactively in real time and allows multiple edits to be composed in a straightforward way.

Three different modes

The team also added that editing with EditGAN can be performed in three different modes:

Real-time editing with editing vectors

This is suitable for localised, well-disentangled edits. It is performed by applying previously learnt editing vectors at varying scales, so images can be manipulated at interactive rates.
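As a rough, hypothetical illustration (reusing the toy generator `G`, latent `w` and `edit_vector` from the sketch above, none of which are NVIDIA’s actual API), applying an editing vector is just a scaled addition in latent space, which is why it can run at interactive rates:

```python
import torch

def apply_editing_vector(G, w, edit_vector, scale=1.0):
    """Re-apply a previously learnt edit direction to a latent code at a chosen strength."""
    with torch.no_grad():                       # no per-image optimisation, hence real-time
        img, seg = G(w + scale * edit_vector)
    return img, seg

# Sweeping the scale varies the edit strength; summing vectors composes multiple edits.
img_weak, _ = apply_editing_vector(G, w, edit_vector, scale=0.5)
img_strong, _ = apply_editing_vector(G, w, edit_vector, scale=1.5)
```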

Vector-based editing with self-supervised refinement

This works for localised edits that are not perfectly disentangled from other parts of the image. Here, the edit is initialised using the learnt editing vectors, and editing artefacts are then removed by additional optimisation at test time.
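A possible sketch of this mode, again reusing the hypothetical helpers above: the learnt vector provides the initial edit, and a short optimisation at test time cleans up artefacts for the specific image.

```python
# Initialise from the learnt editing vector, then refine for a few steps at test time.
# (Names reuse the toy sketches above and are illustrative, not EditGAN's actual API.)
w_start = w + 1.0 * edit_vector
w_refined, _ = edit_latent(G, w_start, mask, region, steps=30)
```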

Optimisation-based editing

This method works for image-specific edits, and for large edits that do not transfer to other images through editing vectors.

Limitations 

Though EditGAN is a big step towards using GANs in image editing, the paper also mentions the challenges that come with this innovation. 

EditGAN only works on images that can be modelled by the underlying GAN, so it can be challenging to apply it to photos of complex scenes such as vivid city scenes. The team also encountered challenging edits that required iterative optimisation on each example. Going forward, more research is needed on speeding up the optimisation for such edits and on building improved generative models with more disentangled latent spaces.

