
GANs in The Age of Diffusion Models

Though one can say that GANs are lagging in generation tasks, they have been excelling in various other fields since their inception


The new cool kids in town, diffusion models, have somehow made GANs look obsolete. When it comes to generating images, tools like DALLE-2, Stable Diffusion, and Midjourney have excelled at the task and taken over the field completely.

While there are obvious reasons why diffusion models are gaining popularity for image synthesis, generative adversarial networks (GANs) saw a similar surge of interest when they were revived in 2017, three years after they were proposed by Ian Goodfellow.

A GAN uses two neural networks—a generator and a discriminator—pitted against each other to synthesise new instances of data, whereas diffusion models are likelihood-based models that offer more stability along with greater quality on image generation tasks.
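The adversarial setup can be sketched in a few lines. Below is a minimal, illustrative toy example, not any paper's implementation: a linear generator g(z) = a·z + b tries to mimic samples from N(4, 1), while a logistic discriminator D(x) = sigmoid(w·x + c) tries to tell real samples from generated ones. All names, hyperparameters, and the 1-D task are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator parameters (a, b) and discriminator parameters (w, c)
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr = 0.02

for step in range(5000):
    z = rng.standard_normal(64)           # generator input noise
    real = 4.0 + rng.standard_normal(64)  # samples from the target N(4, 1)
    fake = a * z + b                      # generator output

    # --- Discriminator step: ascend log D(real) + log(1 - D(fake)) ---
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # --- Generator step: ascend log D(fake) (non-saturating loss) ---
    d_fake = sigmoid(w * fake + c)
    g = (1 - d_fake) * w                  # d/dfake of log D(fake)
    a += lr * np.mean(g * z)              # chain rule: dfake/da = z
    b += lr * np.mean(g)                  # chain rule: dfake/db = 1

gen_mean = float(np.mean(a * rng.standard_normal(10000) + b))
print(f"generated mean after training: {gen_mean:.2f} (target 4.0)")
```

Each step alternates between improving the discriminator and improving the generator against it; at equilibrium the discriminator can no longer separate the two distributions, which is the "learnable loss function" that makes GANs applicable well beyond images.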

https://twitter.com/tomgoldsteincs/status/1560334207578161152

GAN is more than pretty pictures

Diffusion models were designed to solve the training-convergence issues of GANs. Though one can say that GANs are lagging behind in generation tasks, they have been excelling in various other fields since their inception because of their ability to play adversarial games with learnable loss functions, which is about much more than generating pretty pictures.

Just this month, Matthew Baas and Herman Kamper from Stellenbosch University implemented a GAN for unconditional speech synthesis, calling it AudioStyleGAN (ASGAN). The model is designed to learn a disentangled latent space and to generalise to unseen tasks in a zero-shot fashion without any additional training. According to the authors' tests, ASGAN outperformed existing diffusion and autoregressive models.

GANs were not developed with just text-to-image generation in mind. Since they can be built on convolutional neural networks (CNNs), they have been used for around a decade in computer vision applications such as autonomous vehicles, robotics, and simulation. Their unsupervised nature makes them well suited to tasks that rely less on labelled training data and more on direct real-world applications.

Check out these GAN architectures that have been driving innovation in the field beyond image generation:

  1. CycleGAN
  2. pixelRNN
  3. DiscoGAN
  4. IsGAN
  5. StyleGAN

In 2020, researchers from Prague, Cairo, and Ireland collaborated to apply GAN techniques to autonomous driving. Their paper, titled ‘Yes, we GAN’, details the application of GANs to different aspects of autonomous driving through a range of experiments. The researchers wanted to tackle the problem of cameras getting soiled by water droplets, mud, and other weather-related debris. Trained on paired images of clean and soiled camera views, the model learned to restore the view when it is obstructed.

GAN is a treasure

Though GANs have been researched and implemented extensively since 2014, researchers have only picked the low-hanging fruit. People on a Reddit thread have been arguing that it is the same situation with diffusion models at the moment.

When people got their hands on diffusion models, the instability of GAN training was the first issue they pointed to. Though the process of generating images can be smoother and more stable with diffusion models, GANs are far quicker and, with recent developments in hardware, are becoming more stable as well.

Another paper, released last week by researchers at Microsoft Azure AI and the University of Texas, proposed Diffusion-GAN. The framework uses a forward diffusion chain to generate Gaussian-mixture-distributed noise for GAN training. The approach enables domain-agnostic, differentiable augmentation, leveraging the advantages of diffusion without the reverse-diffusion chain. The result was photo-realistic, high-fidelity image generation that outperformed existing GAN-based models.
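The core trick can be sketched simply: corrupt both real and generated samples with the same forward-diffusion noise before they reach the discriminator, so the noise acts as a differentiable, domain-agnostic augmentation. The sketch below is a simplified illustration, not the paper's implementation; the linear beta schedule, the array shapes, and the uniform sampling of the timestep are assumptions (the paper adjusts the maximum noise level adaptively during training).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule over T diffusion steps
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # cumulative signal-retention factors

# Stand-ins for a batch of real images and generator outputs (8 samples, 3 features)
real = rng.standard_normal((8, 3))
fake = rng.standard_normal((8, 3))

# Sample one noise level per example; shape (8, 1) broadcasts over features
t = rng.integers(0, T, size=(8, 1))
signal = np.sqrt(alpha_bars[t])
noise = np.sqrt(1.0 - alpha_bars[t])

# Forward diffusion q(x_t | x_0): the same corruption is applied to both
# real and fake samples, so the discriminator sees noisy versions of each
noisy_real = signal * real + noise * rng.standard_normal(real.shape)
noisy_fake = signal * fake + noise * rng.standard_normal(fake.shape)
```

Because the corruption is a differentiable function of the generator output, gradients still flow back to the generator through the noisy samples, which is what makes this usable as an augmentation inside GAN training.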

While diffusion models can be more faithful to the training data, they remain slower at sampling than GANs, a trade-off acknowledged even by the researchers behind the paper ‘Diffusion Models Beat GANs on Image Synthesis’.

The concept of diffusion has been around since the 1980s, but the recent wave of image-generation research, sparked by GANs, has brought it to the forefront. Diffusion models are yet to be tested widely in areas beyond image generation and text-to-image synthesis, though in principle they can be applied to many generative tasks.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.