Diffusion Models vs GANs: Which One to Choose for Image Synthesis?

Both of them have found wide usage in the field of image, video and voice generation, leading to a debate on what produces better results—diffusion models or GANs.

Image synthesis tasks are generally performed by deep generative models such as GANs, VAEs, and autoregressive models. Generative adversarial networks (GANs) have been a major focus of research in the last few years due to the quality of the outputs they produce. Diffusion models are another area of research that has gained prominence. Both families are now widely used for image, video and voice generation, which has naturally led to an ongoing debate over which produces better results.

A GAN is an algorithmic architecture that pits two neural networks against each other to generate newly synthesised instances of data that can pass for real data. Diffusion models, meanwhile, have become increasingly popular because they offer training stability as well as high-quality results on image and audio generation.

How does a diffusion model work?

Google explains that diffusion models work by corrupting the training data, progressively adding Gaussian noise and wiping out details until the data becomes pure noise, and then training a neural network to reverse this corruption process. “Running this reversed corruption process synthesises data from pure noise by gradually denoising it until a clean sample is produced,” Google adds.
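
The forward (noising) process has a convenient closed form: you can jump straight to any corruption level t without simulating every intermediate step. Below is a minimal numpy sketch of a DDPM-style forward process; the schedule values and array sizes are illustrative assumptions, not taken from Google's post.

```python
import numpy as np

def make_noise_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear schedule of per-step noise variances beta_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative signal retention per step
    return betas, alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
_, alpha_bars = make_noise_schedule()
x0 = rng.standard_normal((8, 8))            # stand-in for an image
x_early, _ = forward_diffuse(x0, 10, alpha_bars, rng)
x_late, _ = forward_diffuse(x0, 999, alpha_bars, rng)
# Early steps stay close to the data; by the last step almost no signal
# remains and x_late is essentially pure noise.
print(np.corrcoef(x0.ravel(), x_early.ravel())[0, 1])
print(alpha_bars[999])  # tiny: almost no signal left
```

A denoising network is then trained to predict the added noise from `x_t` and `t`; sampling runs the chain in reverse, from pure noise back to a clean sample.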


GAN architecture

GANs have two parts:

  • Generator: It learns to generate plausible data. 
  • Discriminator: The discriminator decides whether or not each instance of data that it reviews belongs to the actual training dataset. It also penalises the generator for producing implausible results.

The generator and the discriminator are both neural networks. The generator's output is fed directly into the discriminator as input. Through backpropagation, the discriminator's classification provides a signal that the generator uses to update its weights.
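
To make the loop concrete, here is a toy 1-D GAN written from scratch in numpy: a logistic-regression discriminator and an affine generator trained with alternating gradient steps. Everything here, including the data distribution, learning rate, and loss variant, is an illustrative assumption, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# "Real" data: samples from N(3, 1). The generator must learn to mimic it.
def real_batch(n):
    return 3.0 + rng.standard_normal(n)

a, b = 1.0, 0.0   # generator: g(z) = a*z + b, with z ~ N(0, 1)
w, c = 0.1, 0.0   # discriminator: D(x) = sigmoid(w*x + c)

lr, n = 0.05, 64
for step in range(2000):
    # --- discriminator step: push D(real) -> 1, D(fake) -> 0 ---
    xr, z = real_batch(n), rng.standard_normal(n)
    xf = a * z + b
    sr, sf = sigmoid(w * xr + c), sigmoid(w * xf + c)
    w -= lr * (np.mean((sr - 1) * xr) + np.mean(sf * xf))
    c -= lr * (np.mean(sr - 1) + np.mean(sf))
    # --- generator step: non-saturating loss, push D(fake) -> 1 ---
    z = rng.standard_normal(n)
    xf = a * z + b
    sf = sigmoid(w * xf + c)
    a -= lr * np.mean((sf - 1) * w * z)
    b -= lr * np.mean((sf - 1) * w)

# The generator's offset b should drift toward the real mean (3.0).
print(round(b, 2))
```

The hand-derived gradients here are exactly what backpropagation would compute for the binary cross-entropy losses; in practice both players are deep networks and a framework handles the differentiation.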


Image: Google 

As generator training goes well, the discriminator gets worse at distinguishing real data from fake, and its classification accuracy drops; against a perfect generator, it can do no better than chance (50% accuracy).

Some common issues with GANs

Though GANs form the backbone of image synthesis in a large share of models, they do come with some disadvantages that researchers are actively working on. Some of these, as pointed out by Google, are:

  • Vanishing gradients: If the discriminator is too good, the generator training can fail due to the issue of vanishing gradients. 
  • Mode collapse: If a generator produces an especially plausible output, it can learn to produce only that output. If this happens, the discriminator’s best strategy is to learn to always reject that output. Google adds, “But if the next generation of discriminator gets stuck in a local minimum and doesn’t find the best strategy, then it’s too easy for the next generator iteration to find the most plausible output for the current discriminator.”
  • Failure to converge: GANs also frequently fail to converge during training.
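
The vanishing-gradient issue can be seen directly in the generator loss. The check below, a standard observation from the GAN literature rather than from the article, compares the gradient of the original minimax generator loss with the widely used non-saturating variant when the discriminator confidently rejects a fake:

```python
import numpy as np

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# u is the discriminator's logit on a fake sample; u << 0 means the
# discriminator is confident the sample is fake: D(fake) ~ 0.
u = -8.0                      # D(fake) = sigmoid(-8) ~ 0.0003
d_minimax = -sigmoid(u)       # grad of log(1 - D(fake)) w.r.t. u
d_nonsat = sigmoid(u) - 1.0   # grad of -log D(fake) w.r.t. u

print(abs(d_minimax))  # ~0.0003: almost no learning signal
print(abs(d_nonsat))   # ~1.0: healthy signal
```

This is why a too-strong discriminator stalls the generator under the original minimax loss, and why the non-saturating loss is the common remedy.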

Diffusion models to the rescue

OpenAI

A paper titled ‘Diffusion Models Beat GANs on Image Synthesis’ by OpenAI researchers showed that diffusion models can achieve image sample quality superior to that of state-of-the-art generative models such as GANs, which come with limitations of their own.

The paper said that the team could achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, the team improved sample quality with classifier guidance.
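
Classifier guidance nudges each denoising step in the direction that makes a separately trained classifier more confident of the target class. Below is a minimal sketch of that adjustment with stand-in scalar values; the real method applies it per pixel, using the gradient of a classifier trained on noisy images.

```python
def guided_mean(mu, sigma2, classifier_grad, scale=1.0):
    """Shift the denoising mean toward higher classifier confidence.

    mu, sigma2      : mean and variance of p(x_{t-1} | x_t) from the
                      diffusion model at this step
    classifier_grad : gradient of log p(y | x_t) w.r.t. x_t
    scale           : guidance strength (larger = stronger conditioning)
    """
    return mu + scale * sigma2 * classifier_grad

# Stand-in values for a single pixel:
mu, sigma2 = 0.2, 0.5
grad = 0.8   # classifier says: increase this pixel for class y
print(guided_mean(mu, sigma2, grad, scale=2.0))  # 0.2 + 2*0.5*0.8 = 1.0
```

Raising the guidance scale trades sample diversity for fidelity, which is how the paper tunes the quality/diversity balance.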

The team also said they think the gap between diffusion models and GANs comes from two factors:

“The model architectures used by recent GAN literature have been heavily explored. GANs are able to trade off diversity for fidelity, producing high-quality samples but not covering the whole distribution,” the paper added.

Google AI

Last year, Google AI introduced two connected approaches, Super-Resolution via Repeated Refinements (SR3) and Cascaded Diffusion Models (CDM), to improve image synthesis quality for diffusion models. The team said that by scaling up diffusion models and using carefully selected data augmentation techniques, they could outperform existing approaches. SR3 attained strong image super-resolution results, surpassing GANs in human evaluations, while CDM generated high-fidelity ImageNet samples that beat BigGAN-deep and VQ-VAE-2 by a large margin on both FID score and Classification Accuracy Score.
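
The cascading idea can be sketched as a simple pipeline: a base diffusion model samples a low-resolution image, and SR3-style super-resolution stages repeatedly upsample it. The functions below are dummy stand-ins for trained models, included only to show the pipeline's shape.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_model(label):
    """Stand-in for a class-conditional base diffusion model: 'samples'
    a 32x32 image for the given label (here, just noise)."""
    return rng.standard_normal((32, 32))

def sr_stage(lowres):
    """Stand-in for one SR3-style super-resolution stage: 2x
    nearest-neighbour upsample plus a little noise."""
    up = np.kron(lowres, np.ones((2, 2)))
    return up + 0.01 * rng.standard_normal(up.shape)

def cascaded_sample(label, n_stages=3):
    x = base_model(label)
    for _ in range(n_stages):   # 32 -> 64 -> 128 -> 256
        x = sr_stage(x)
    return x

print(cascaded_sample("goldfish").shape)  # (256, 256)
```

In the real CDM, each stage is itself a conditional diffusion model, and training each stage on noisy (augmented) versions of the previous stage's outputs is what keeps errors from compounding through the cascade.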

DiffWave

DiffWave is a probabilistic model for conditional and unconditional waveform generation. According to the paper, DiffWave produces high-fidelity audio across different waveform generation tasks, including neural vocoding conditioned on mel spectrograms, class-conditional generation, and unconditional generation. Results showed that it significantly outperforms autoregressive and GAN-based waveform models on the unconditional generation task in terms of audio quality and sample diversity, as measured by various automatic and human evaluations.

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
