Into the world of DALL-E 2 and Midjourney enters open-source Disco Diffusion

Of late, AI artists and AI-generated artwork have been mushrooming. Platforms such as Ultraleap-backed ‘Midjourney’, OpenAI’s ‘DALL-E 2’, Meta’s ‘Make-A-Scene’, Hugging Face’s ‘DALL-E Mini’ (now ‘Craiyon🖍’) and others are redefining how we imagine design and visualisation. However, most of these platforms admit users on an invite-only basis.

One free and open-source software (FOSS) project that has recently gained popularity is ‘Disco Diffusion’, a CLIP-guided diffusion model that converts text to images. The user describes the desired image in a string of words called a ‘prompt’, and OpenAI’s CLIP model, trained on a large collection of image-text pairs, steers the diffusion process towards an image that matches that description. The latest version (v5.6) adds a portrait generator.
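
To illustrate what ‘CLIP-guided’ means, here is a toy Python sketch, not Disco Diffusion’s code. It assumes that the prompt and the current image have already been turned into embedding vectors; the short vectors below are hypothetical stand-ins for CLIP’s embeddings. Guidance nudges the image in the direction that raises the cosine similarity between the two embeddings. In real CLIP-guided diffusion, this gradient is taken with respect to the pixels of the image being denoised, by backpropagating through CLIP’s image encoder, and it is combined with the diffusion model’s own denoising step.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def guidance_step(image_emb, text_emb, step_size=0.1):
    """One gradient-ascent step on cosine(image_emb, text_emb) with respect to image_emb."""
    norm_a = math.hypot(*image_emb)
    norm_b = math.hypot(*text_emb)
    cos = cosine(image_emb, text_emb)
    # Analytic gradient: d cos / d a = b / (|a||b|) - cos * a / |a|^2
    grad = [b / (norm_a * norm_b) - cos * a / norm_a ** 2
            for a, b in zip(image_emb, text_emb)]
    return [a + step_size * g for a, g in zip(image_emb, grad)]


# Hypothetical stand-ins for the CLIP embeddings of the prompt and the current image.
text_emb = [1.0, 0.0, 0.0]
image_emb = [0.2, 1.0, 0.5]
before = cosine(image_emb, text_emb)
for _ in range(50):
    image_emb = guidance_step(image_emb, text_emb)
print(f"similarity before: {before:.3f}, after: {cosine(image_emb, text_emb):.3f}")
```

Each step rotates the image embedding towards the text embedding, so the similarity rises over the loop.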

Created by Somnai and augmented by Gandamu, the Disco Diffusion code is hosted as a Google Colab notebook. It is a diffusion model rather than a generative adversarial network (GAN), yet it is as flexible at creating vibrant pieces as the VQGAN models trained on ImageNet and WikiArt.

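Despite the shared name, this kind of diffusion model is unrelated to the drift-diffusion model used in cognitive psychology to describe simple two-choice decisions. In machine learning, a diffusion model is a generative model that learns to remove noise from an image step by step, so that a clean, detailed image can be recovered from pure noise.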

First proposed in 2015, diffusion models have recently attracted renewed interest owing to their training stability and promising sample quality in audio and image generation. Their results can compare favourably with those of other deep generative models.

Diffusion models work by corrupting the training data through the successive addition of Gaussian noise, gradually destroying its details until it becomes pure noise, and then training a neural network to reverse this corruption. Running the learned reverse process synthesises data: starting from pure noise, it removes noise step by step until a clean sample emerges.
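
The corruption (forward) process has a convenient closed form. With a noise schedule β_1 to β_T and ᾱ_t = ∏(1 - β_s) over s ≤ t, a noisy version of a clean sample x_0 at step t can be drawn directly as x_t = √ᾱ_t · x_0 + √(1 - ᾱ_t) · ε, with ε ~ N(0, I). The minimal Python sketch below follows that standard DDPM-style formulation; the linear schedule from 1e-4 to 0.02 over 1,000 steps is a common choice used here purely for illustration, and the four-value "image" is hypothetical. The neural network that learns to undo the noise is not shown.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule: beta rises evenly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products: alpha_bar[t] = (1 - beta[0]) * ... * (1 - beta[t])."""
    result, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        result.append(running)
    return result


def noise_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(alpha_bar[t]) * x0, (1 - alpha_bar[t]) * I) in one shot."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
alpha_bar = alpha_bars(linear_betas(1000))
x0 = [0.8, -0.3, 0.5, 1.0]  # hypothetical four-pixel "image"
for t in (0, 250, 999):
    x_t = noise_sample(x0, t, alpha_bar, rng)
    print(t, round(alpha_bar[t], 4), [round(v, 2) for v in x_t])
```

Early steps keep almost all of the signal, while ᾱ at the final step is close to zero, so the last sample is essentially pure Gaussian noise.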

The reverse process can be interpreted as an ‘optimisation algorithm’ that follows the gradient of the (log) data density towards likely samples.
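
In this ‘score-based’ view, a trained network approximates the gradient of the log data density (the score), and sampling repeatedly steps along that gradient while injecting a little fresh noise. The sketch below uses unadjusted Langevin dynamics, one sampler of this kind, and swaps the learned network for the exactly known score of a one-dimensional Gaussian with mean 2 and standard deviation 0.5; both choices are assumptions made purely for illustration.

```python
import math
import random
import statistics


def gaussian_score(x, mean=2.0, std=0.5):
    """Gradient of log N(x; mean, std^2), standing in for a trained score network."""
    return -(x - mean) / std ** 2


def langevin_sample(score, x, step_size, num_steps, rng):
    """Unadjusted Langevin dynamics: gradient ascent on log-density plus Gaussian noise."""
    for _ in range(num_steps):
        x = x + 0.5 * step_size * score(x) + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(42)
# Each chain starts from pure noise, x ~ N(0, 1), and drifts towards high-density regions.
samples = [langevin_sample(gaussian_score, rng.gauss(0.0, 1.0), 0.01, 300, rng)
           for _ in range(1000)]
print(f"mean {statistics.fmean(samples):.2f}, std {statistics.stdev(samples):.2f}")
```

The injected noise is what keeps the samples spread out like the target distribution instead of all collapsing onto its peak.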

Latest update

Google’s latest research takes a step towards resolving diffusion models’ image-resolution limitations by linking two models, SR3 and CDM. Adding a unique data set and widening the model now help the approach produce better results than existing models.

SR3 (Super-Resolution via Repeated Refinement) is a super-resolution diffusion model that takes a low-resolution image as input and constructs a corresponding high-resolution image, starting from pure noise. It is trained using the image-corruption (noising) process described above.
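
In the SR3 paper, the low-resolution input is upsampled to the target size and concatenated, channel-wise, with the noisy high-resolution image at every denoising step, so the network always sees both. The minimal sketch below builds that two-channel input for grayscale images stored as nested lists; it swaps the paper’s bicubic interpolation for simpler nearest-neighbour upsampling, and the function names are hypothetical.

```python
import random


def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling of a 2-D grayscale image stored as a list of rows."""
    return [
        [row[col // factor] for col in range(len(row) * factor)]
        for row in image
        for _ in range(factor)
    ]


def sr3_input(low_res, noisy_high_res, factor):
    """Pair each pixel of the upsampled low-res image with the matching noisy pixel (2 channels)."""
    upsampled = upsample_nearest(low_res, factor)
    if (len(upsampled), len(upsampled[0])) != (len(noisy_high_res), len(noisy_high_res[0])):
        raise ValueError("upsampled low-res image must match the high-res shape")
    return [
        [(u, n) for u, n in zip(up_row, noisy_row)]
        for up_row, noisy_row in zip(upsampled, noisy_high_res)
    ]


rng = random.Random(0)
low_res = [[0.1, 0.9], [0.4, 0.6]]                                   # 2x2 input
noisy = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]  # 4x4 pure noise
net_input = sr3_input(low_res, noisy, factor=2)
print(len(net_input), len(net_input[0]), net_input[0][0])
```

A trained denoiser would receive this paired input at each step and predict the noise to remove from the high-resolution channel.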

CDM (Cascaded Diffusion Models) is a class-conditional diffusion model trained on ImageNet data to create high-resolution images. Because ImageNet is a highly complex data set, the researchers build CDM by chaining multiple diffusion models together.

The researchers explain that this cascading approach links together generative models spanning several spatial resolutions: one diffusion model first generates low-resolution data, and a series of SR3 super-resolution diffusion models then successively raises its resolution.
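
The cascade itself is simple control flow: a base model samples a small image, and each super-resolution stage takes the previous output (and, for a class-conditional model like CDM, the class label) and returns a larger image. In the sketch below, the model functions are hypothetical placeholders: the base model returns a flat 32×32 image and each super-resolution stage merely upsamples by nearest neighbour, where CDM would run a full diffusion model. The 32 → 64 → 256 resolutions are chosen for illustration.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def run_cascade(class_label: int,
                base_model: Callable[[int], Image],
                sr_stages: Sequence[Callable[[Image, int], Image]]) -> Image:
    """Sample a low-resolution image, then refine it through each super-resolution stage."""
    image = base_model(class_label)
    for stage in sr_stages:
        image = stage(image, class_label)
    return image


def make_upsampling_stage(factor: int) -> Callable[[Image, int], Image]:
    """Placeholder stage: nearest-neighbour upsampling instead of an SR3-style diffusion model."""
    def stage(image: Image, class_label: int) -> Image:
        return [[row[c // factor] for c in range(len(row) * factor)]
                for row in image for _ in range(factor)]
    return stage


def toy_base_model(class_label: int) -> Image:
    """Placeholder base model: a flat 32x32 image whose value depends on the class label."""
    return [[class_label / 1000.0] * 32 for _ in range(32)]


result = run_cascade(207, toy_base_model, [make_upsampling_stage(2), make_upsampling_stage(4)])
print(len(result), len(result[0]))
```

Keeping each stage behind the same function signature means stages can be trained and swapped independently, which is the practical appeal of a cascade.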

The image quality of the realistic samples generated by CDM is evaluated using two metrics: the Fréchet Inception Distance (FID) and the Classification Accuracy Score (CAS).
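
FID compares the statistics of Inception-network features extracted from real and generated images; lower is better. Modelling each feature set as a Gaussian with mean μ and covariance Σ, FID = ‖μ_r - μ_g‖² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^½). In one dimension this reduces to (μ_r - μ_g)² + (σ_r - σ_g)². The sketch below computes only that simplified one-dimensional case on hypothetical scalar features; real FID uses high-dimensional Inception-v3 features and a matrix square root.

```python
import statistics


def fid_1d(real_features, generated_features):
    """Frechet distance between 1-D Gaussians fitted to two feature samples."""
    mu_r = statistics.fmean(real_features)
    mu_g = statistics.fmean(generated_features)
    sigma_r = statistics.stdev(real_features)
    sigma_g = statistics.stdev(generated_features)
    # (mu_r - mu_g)^2 + sigma_r^2 + sigma_g^2 - 2*sigma_r*sigma_g, factored
    return (mu_r - mu_g) ** 2 + (sigma_r - sigma_g) ** 2


real = [0.9, 1.1, 1.0, 1.2, 0.8]       # hypothetical features of real images
generated = [1.4, 1.6, 1.5, 1.7, 1.3]  # hypothetical features of generated images
print(round(fid_1d(real, generated), 4))
```

Here the two sets have the same spread but shifted means, so the whole distance comes from the mean term.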

Overall, in human evaluations, the ultra-high-resolution images generated by SR3 surpass those produced by GANs. CDM, in turn, greatly exceeds the previous top methods for class-conditional ImageNet generation, BigGAN-deep and VQ-VAE-2.

With SR3 and CDM, diffusion models reach state-of-the-art performance on the super-resolution and class-conditional ImageNet generation benchmarks.

Creating a painting with ‘Disco Diffusion’ can be broadly divided into the following steps:

  • Open the Colab notebook
  • Set parameters such as the image size, the number of intermediate (progress) images and the number of images to generate
  • Write clear, specific prompts in English, start the run and wait for the AI to compute and produce the painting
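
The parameters in these steps can be thought of as a small settings object that is checked before a run. The Python sketch below is hypothetical: its field names and defaults are ours, not the notebook’s (the notebook exposes its own variables in form cells), the intermediate-image count is our reading of the second step, and the rounding of width and height down to multiples of 64 reflects our understanding of how Disco Diffusion sizes images, so treat it as an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RunSettings:
    """Hypothetical settings for one run; names and defaults are illustrative only."""
    prompts: Dict[int, List[str]]  # frame number -> prompts (frame 0 for still images)
    width: int = 1280
    height: int = 768
    intermediate_images: int = 10  # progress snapshots saved during the run
    n_images: int = 4              # finished images to generate

    def snapped_size(self) -> Tuple[int, int]:
        """Round width and height down to multiples of 64 (assumed requirement)."""
        return (self.width // 64) * 64, (self.height // 64) * 64

    def validate(self) -> None:
        """Raise ValueError if the settings cannot produce an image."""
        if not any(self.prompts.values()):
            raise ValueError("at least one non-empty prompt list is required")
        if min(self.snapped_size()) < 64:
            raise ValueError("width and height must each be at least 64 pixels")
        if self.n_images < 1 or self.intermediate_images < 0:
            raise ValueError("n_images must be positive and intermediate_images non-negative")


settings = RunSettings(prompts={0: ["A lighthouse on a cliff at sunset, oil painting"]},
                       width=1000, height=700)
settings.validate()
print(settings.snapped_size())
```

Validating up front matters on a hosted notebook, where a bad setting may only surface after minutes of GPU time.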

The generated pieces are saved to the user’s ‘Google Drive’.

Not just pictures

YouTube creator ‘DoodleChaos’ created a full-length music video using Disco Diffusion V5.2 Turbo. 

In the video description, he explains that he added keyframes for camera motion throughout the video and manually synchronised them to the beat.
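
DoodleChaos synchronised the camera keyframes by hand. If a song has a constant tempo, the beat positions can at least be computed: at b beats per minute and f frames per second, beat k falls at frame round(60 · k · f / b). The sketch below does that and emits a schedule in a ‘frame: (value)’ style similar to the notebook’s keyframe strings; the exact syntax, the function names and the zoom values are assumptions, and songs with tempo changes would still need manual adjustment.

```python
def beat_frames(bpm, fps, duration_s):
    """Frame index of every beat in a constant-tempo song of the given length."""
    seconds_per_beat = 60.0 / bpm
    frames = []
    beat = 0
    while beat * seconds_per_beat < duration_s:
        frames.append(round(beat * seconds_per_beat * fps))
        beat += 1
    return frames


def pulse_schedule(frames, on_beat, between_beats):
    """Keyframe string that sets a value on each beat and relaxes it on the next frame.

    Assumes beats are at least two frames apart, otherwise keyframes would collide.
    """
    parts = []
    for frame in frames:
        parts.append(f"{frame}: ({on_beat})")
        parts.append(f"{frame + 1}: ({between_beats})")
    return ", ".join(parts)


frames = beat_frames(bpm=120, fps=12, duration_s=2.0)
print(frames)
# For example, a zoom factor that pulses slightly on each beat.
print(pulse_schedule(frames, on_beat=1.05, between_beats=1.0))
```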

He also specified changes to the art style at different moments in the song. Since many of the lyrics are abstract, even a human illustrator would find them hard to depict. To make them easier for the AI to interpret, he rewrote them to be more concrete, for example by specifying a setting.
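
The same idea applies to the style changes and rewritten lyrics: each moment in the song maps to a frame number and a prompt. The sketch below turns hypothetical (start time, concrete scene, art style) cues into a dictionary keyed by frame, similar in spirit to how Disco Diffusion’s animation mode accepts prompts that change at given frames; the cue text is invented for illustration and is not taken from the video.

```python
def prompt_schedule(cues, fps):
    """Map (start_seconds, scene, art_style) cues to {frame: [prompt]} entries."""
    schedule = {}
    for start_s, scene, style in cues:
        frame = round(start_s * fps)
        # The concrete scene (with an explicit setting) carries the meaning; the style sets the look.
        schedule[frame] = [f"{scene}, {style}"]
    return schedule


cues = [
    (0.0, "a lone traveller on an empty desert road at dawn", "watercolour painting"),
    (12.5, "the same traveller reaching a neon-lit city at night", "synthwave digital art"),
]
print(prompt_schedule(cues, fps=12))
```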

Useful resources for Disco Diffusion

Zippy’s Disco Diffusion Cheatsheet v0.3 presents every setting for Disco Diffusion in layman’s terms.

Disco Diffusion Modifiers by weirdwonderfulai.art collects modifiers, such as artist names: keywords that steer image generation in a particular direction.
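
In practice, modifiers are appended to a base description to form the final prompt. The tiny helper below is a hypothetical illustration; the optional ‘:weight’ suffix reflects our understanding that Disco Diffusion lets a prompt carry a relative weight this way, which should be checked against the notebook’s own documentation before relying on it.

```python
def build_prompt(subject, modifiers, weight=None):
    """Join a subject with comma-separated style modifiers, optionally adding ':weight'."""
    text = ", ".join([subject, *modifiers])
    return text if weight is None else f"{text}:{weight}"


print(build_prompt("a castle on a hill", ["matte painting", "volumetric lighting"], weight=2))
```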

Disco Diffusion 70+ Artist Studies, also by weirdwonderfulai.art, centralises samples of generated art for 600+ artists. These samples were contributed by many people experimenting with art generation and submitting their findings.

Development in the domain

Meta’s recent AI concept, ‘Make-A-Scene’, generates imagery from text plus simple sketches.

“Make-A-Scene empowers people to create images using text prompts and freeform sketches. Prior image-generating AI systems typically used text descriptions as input, but the results could be difficult to predict”, according to Meta.

Tasmia Ansari
Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.
