Researchers from NVIDIA, the University of Toronto, and MIT have collaborated to bring out EditGAN, which allows users to edit images with simple inputs such as drawing, without compromising the original image quality. AI-based image editing tools have been around for quite some time, and improving them has remained an active area of focus for researchers.
GANs act as building blocks
Generative adversarial networks (GANs), used either to embed images into the GAN’s latent space or to work directly with GAN-generated images, have been a promising foundation for building image editing algorithms. EditGAN builds on DatasetGAN, an automatic procedure for generating massive datasets of high-quality, semantically segmented images with minimal human effort.
Last year saw a string of back-to-back releases of large models, with large text-to-image generation models emerging as a niche of their own. Just a few months back, NVIDIA came up with GauGAN2, the sequel to its GauGAN model. It allows users to create realistic landscape photos by converting words into photographic-quality images that can then be altered.
As evident from the name, the GAN framework forms the foundation of GauGAN2 as well. NVIDIA said that GauGAN2 combined multiple modalities such as text, semantic segmentation, sketch and style within a single GAN framework. This was the pathway to turn an artist’s vision into a high-quality AI-generated image.
What is EditGAN
As per the paper titled “EditGAN: High-Precision Semantic Image Editing”, “EditGAN builds on a recently proposed GAN that jointly models both images and their semantic segmentations based on the same underlying latent code and requires as few as 16 labelled examples”.
The team said that they modified the segmentation mask as per the desired edit and optimised the latent code such that it is consistent with the new segmentation. This effectively changes the RGB image.
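The sketch below illustrates this latent-optimisation idea in PyTorch. It is a minimal toy example, not the authors’ code: the Generator class stands in for a pretrained GAN that decodes the same latent code into both an RGB image and a segmentation map, and the helper name edit_latent, loss weights and step counts are illustrative assumptions.

```python
# Minimal sketch of EditGAN-style latent optimisation (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

class Generator(torch.nn.Module):
    """Toy stand-in for a GAN that decodes one latent code into both an
    RGB image and a per-pixel segmentation (shared latent space)."""
    def __init__(self, latent_dim=512, classes=8, size=32):
        super().__init__()
        self.rgb = torch.nn.Linear(latent_dim, 3 * size * size)
        self.seg = torch.nn.Linear(latent_dim, classes * size * size)
        self.classes, self.size = classes, size

    def forward(self, w):
        img = self.rgb(w).view(-1, 3, self.size, self.size)
        seg_logits = self.seg(w).view(-1, self.classes, self.size, self.size)
        return img, seg_logits

def edit_latent(gen, w_init, target_mask, steps=100, lr=0.05, rgb_weight=1.0):
    """Optimise the latent code so the generated segmentation matches the
    user-edited mask while the RGB image stays close to the original."""
    with torch.no_grad():
        img_orig, _ = gen(w_init)
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img, seg_logits = gen(w)
        seg_loss = F.cross_entropy(seg_logits, target_mask)  # match the edited mask
        rgb_loss = F.l1_loss(img, img_orig)                   # keep the image close to the original
        loss = seg_loss + rgb_weight * rgb_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

gen = Generator()
w0 = torch.randn(1, 512)                        # latent code of the image to edit
edited_mask = torch.randint(0, 8, (1, 32, 32))  # user-modified segmentation mask
w_edited = edit_latent(gen, w0, edited_mask)
```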
As per the paper, the team applied EditGAN to a diverse range of images, including cars, cats, birds, and human faces. They performed quantitative comparisons against multiple baselines and outperformed them on metrics such as identity preservation, quality preservation, and target attribute accuracy.
Image: EditGAN: High-Precision Semantic Image Editing
What makes EditGAN unique is that it offers very high-precision editing while requiring little annotated training data and no reliance on classifiers. It can be run interactively in real time and allows multiple edits to be composed in a straightforward way.
Three different modes
The team also added that editing with EditGAN can be performed in three different modes:
Real-time editing with editing vectors
This is suitable for localised, well-disentangled edits. It is performed by applying previously learnt editing vectors at varying scales, manipulating images at interactive rates.
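A minimal sketch of why this mode is fast: applying a learnt editing vector is just a scaled addition in latent space. The names delta_w and apply_edit are illustrative assumptions, not taken from the paper’s code.

```python
import torch

def apply_edit(w, delta_w, scale=1.0):
    """Shift a latent code along a previously learnt editing direction."""
    return w + scale * delta_w

w = torch.randn(1, 512)        # latent code of the image being edited
delta_w = torch.randn(1, 512)  # stand-in for a learnt editing vector (e.g. "raise wheel")

subtle = apply_edit(w, delta_w, scale=0.5)   # weaker version of the edit
strong = apply_edit(w, delta_w, scale=2.0)   # exaggerated version of the edit
```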
Vector-based editing with self-supervised refinement
This works for localised edits that are not perfectly disentangled from other parts of the image. Here, editing artefacts can be removed by initialising the edit with the learnt editing vectors and then performing additional optimisation at test time.
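Continuing the illustrative helpers from the two sketches above (apply_edit and edit_latent, both hypothetical), this refinement mode can be pictured as a cheap vector-based initialisation followed by a short optimisation pass:

```python
# Start from the fast vector-based edit, then run a few optimisation steps
# so artefacts from imperfect disentanglement are cleaned up.
w_init = apply_edit(w, delta_w, scale=1.0)
w_refined = edit_latent(gen, w_init, edited_mask, steps=20)
```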
Optimisation-based editing
This method works for image-specific edits, as well as large edits that do not transfer to other images via editing vectors.
Limitations
Though EditGAN is a big step towards using GANs in image editing, the paper also mentions the challenges that come with this innovation.
EditGAN only works on images that can be modelled by the GAN, which makes it challenging to apply to photos of, for instance, vivid city scenes. The team also encountered challenging edits that required iterative optimisation for each example. Moving forward, more research is needed to speed up optimisation for such edits and to build improved generative models with more disentangled latent spaces.