Diffusion models, the new cool kids in town, have somehow made GANs look obsolete. When it comes to generating images, tools like DALL-E 2, Stable Diffusion, and Midjourney have been excelling at the task and have taken over the field completely.
While there are obvious reasons why diffusion models are gaining popularity for image synthesis, generative adversarial networks (GANs) enjoyed the same popularity when interest in them was revived in 2017, three years after they were proposed by Ian Goodfellow.
A GAN pits two neural networks, a generator and a discriminator, against each other to produce new, synthesised instances of data, whereas diffusion models are likelihood-based models that offer more stability along with greater quality on image-generation tasks.
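The adversarial setup can be sketched on a toy problem. The snippet below is a minimal illustrative 1D GAN in NumPy, not any production architecture: a logistic-regression discriminator and an affine generator are trained against each other so the generator learns to mimic samples drawn from N(3, 1). All parameter names and hyperparameters here are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Discriminator D(x) = sigmoid(w*x + b); generator G(z) = a*z + c.
w, b = 0.1, 0.0        # discriminator parameters
a, c = 1.0, 0.0        # generator parameters
lr, batch = 0.05, 64

for step in range(3000):
    real = rng.normal(3.0, 1.0, batch)    # target data ~ N(3, 1)
    z = rng.normal(0.0, 1.0, batch)       # latent noise
    fake = a * z + c                      # generator samples

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    gw = np.mean(-(1 - d_real) * real + d_fake * fake)
    gb = np.mean(-(1 - d_real) + d_fake)
    w, b = w - lr * gw, b - lr * gb

    # Generator step: non-saturating loss, descend -log D(fake).
    d_fake = sigmoid(w * fake + b)
    ga = np.mean(-(1 - d_fake) * w * z)
    gc = np.mean(-(1 - d_fake) * w)
    a, c = a - lr * ga, c - lr * gc

samples = a * rng.normal(0.0, 1.0, 10_000) + c
print(samples.mean())  # drifts toward the data mean of 3
```

Notice that the "loss" the generator minimises is not hand-designed; it is whatever the discriminator has learned, which is exactly the learnable-loss property that makes GANs useful well beyond image synthesis.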
GAN is more than pretty pictures
Diffusion models were designed to solve the training-convergence issues of GANs. Though one can say that GANs are lagging behind in image-generation tasks, they have excelled in various other fields since their inception because of their ability to play adversarial games with learnable loss functions, which is more than just generating pretty pictures.
Just this month, Matthew Baas and Herman Kamper from Stellenbosch University implemented a GAN for unconditional speech synthesis, calling it AudioStyleGAN (ASGAN). The model is designed to learn a disentangled latent space and generalise to unseen tasks in a zero-shot fashion, without any additional training. According to the authors' tests, ASGAN outperformed existing diffusion and autoregressive models.
GANs were not developed with only text-to-image generation in mind. Since they typically build on convolutional neural networks (CNNs), they have been used for around a decade across computer vision applications such as autonomous vehicles, robotics, and simulation. Their unsupervised nature makes them ideal for tasks that rely less on labelled training data and more on direct real-world application.
Check out these GAN architectures that have been ramping up innovation in fields beyond image generation:
In 2020, researchers based in Prague, Cairo, and Ireland collaborated to apply GAN techniques to autonomous driving. Their paper, titled 'Yes, we GAN', surveys the application of GANs to different aspects of autonomous driving through a range of experiments. The researchers wanted to tackle the problem of cameras getting soiled by weather conditions, water droplets, mud, and the like. By training on paired images of clean and soiled camera views, the model learned to restore the view whenever it was obstructed.
GAN is a treasure
Though GANs have been researched and implemented extensively since 2014, researchers have only picked the low-hanging fruit. People on a Reddit thread have argued that it is the same situation with diffusion models at the moment.
When people got their hands on diffusion models, the instability of GANs was the first issue they pointed to. Though the process of generating images can be smoother and more stable with diffusion models, GANs are much quicker and, with recent developments in hardware, are becoming more stable as well.
Another paper, released last week by researchers at Microsoft Azure AI and the University of Texas, proposed Diffusion-GAN. The framework utilises a forward diffusion chain to generate Gaussian-mixture-distributed noise for GAN training. The approach enables domain-agnostic differentiable augmentation, leveraging the advantages of diffusion without the costly reverse-diffusion chain. The result was photo-realistic, high-fidelity image generation that outperformed existing GAN-based models.
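The forward (noising) chain at the heart of this idea is simple to write down. The sketch below is a hypothetical NumPy illustration, not the paper's implementation: it noises data with the standard closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, and sampling the timestep t at random per example yields a mixture of Gaussian perturbations for the discriminator's inputs. The schedule values are common defaults, chosen here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: betas rise from 1e-4 to 2e-2 over T steps.
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)  # abar_t = prod of (1 - beta_s) for s <= t

def diffuse(x0, t):
    """Closed-form forward diffusion: q(x_t | x_0) = N(sqrt(abar_t)*x0, (1 - abar_t)*I)."""
    ab = alpha_bar[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

# Noising a batch at per-example random timesteps produces the Gaussian-mixture
# perturbation that a Diffusion-GAN-style discriminator would be trained on.
x0 = rng.standard_normal((5, 8))             # stand-in for real or generated images
t = rng.integers(0, T, size=x0.shape[0])     # one random timestep per example
x_t = np.stack([diffuse(x0[i], t[i]) for i in range(len(x0))])
print(x_t.shape)  # (5, 8)
```

For unit-variance data, abar_t + (1 - abar_t) = 1, so the perturbation changes the signal-to-noise ratio without changing the overall scale, which helps keep the augmentation well-behaved and fully differentiable for the discriminator.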
It was zillions of GAN papers and now it's zillions of diffusion models based papers.
— Timnit Gebru (@timnitGebru) September 29, 2022
While diffusion models can be more faithful to the input data, they are considerably slower than GANs, a trade-off acknowledged even by the researchers who declared that diffusion models beat GANs on image synthesis.
The concept of diffusion has been around since the 1980s, but the recent wave of interest in image generation, started by GANs, has brought it to the forefront. Even after these recent innovations, diffusion models are yet to be tested in areas beyond image generation and text-to-image synthesis, though in principle they can be applied to almost any generative task.