In 2021, OpenAI chief Sam Altman wrote a blog post arguing that Moore’s Law, the observation that semiconductor chips become roughly twice as powerful for the same price every two years, should apply to everything. Altman also tweeted about the leaps AI was making, saying, “A new version of Moore’s law that could start soon: the amount of intelligence in the universe doubles every 18 months”.
Introducing consistency models
To some, Altman’s optimism may seem unwarranted, but OpenAI’s pace of research seems to back up his claims. Last week, the startup published a paper titled ‘Consistency Models’, introducing a new class of generative models that aims to match the quality of diffusion models while sampling far faster. Authored by Yang Song, Prafulla Dhariwal, Mark Chen and OpenAI co-founder Ilya Sutskever, the study was released on March 3, 2023.
Diffusion models have become the foundation of the revolution in generative AI since they overtook GANs as the most effective models for image synthesis. Some of the most prominent text-to-image AI generators, such as OpenAI’s DALL·E 2, Stability AI’s Stable Diffusion and Google’s Imagen, are all diffusion models.
Faster and less energy-intensive than diffusion models
However, consistency models are reported to produce output of comparable quality to diffusion models in much less time. This is because a consistency model, like a GAN, can generate a sample in a single step.
Diffusion models, in contrast, rely on a repetitive sampling process that progressively removes noise from an image. According to the paper, this iterative generation process consumes 10–2000 times more compute for sample generation than single-step models, which slows down inference.
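The gap between the two sampling styles can be illustrated with a toy sketch. It assumes one-dimensional data drawn from a Gaussian, for which both the denoising direction and the single-step map have closed forms; the names `score`, `iterative_sample` and `one_step_sample` and all constants are illustrative assumptions, not code from the paper.

```python
import numpy as np

# Assumed toy setup: 1-D data distributed as N(0, SIGMA_DATA^2).
SIGMA_DATA = 0.5
EPS, T_MAX = 0.002, 80.0  # illustrative smallest and largest noise levels


def score(x, t):
    """Exact score of the noised toy distribution N(0, SIGMA_DATA^2 + t^2)."""
    return -x / (SIGMA_DATA**2 + t**2)


def iterative_sample(x_T, n_steps):
    """Diffusion-style sampling: Euler steps along dx/dt = -t * score(x, t)."""
    ts = np.geomspace(T_MAX, EPS, n_steps + 1)
    x, calls = x_T, 0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * (-t_cur * score(x, t_cur))
        calls += 1  # one model evaluation per denoising step
    return x, calls


def one_step_sample(x_T):
    """Consistency-style sampling: jump from noise to data in one evaluation.

    For this toy problem the map is known in closed form; a real
    consistency model would be a trained neural network instead.
    """
    scale = np.sqrt((SIGMA_DATA**2 + EPS**2) / (SIGMA_DATA**2 + T_MAX**2))
    return x_T * scale, 1


x_T = np.array([40.0, -80.0])  # starting points drawn at the highest noise level
x_iter, calls_iter = iterative_sample(x_T, 1000)
x_one, calls_one = one_step_sample(x_T)
print(calls_iter, calls_one)
print(x_iter, x_one)
```

Both routes land on nearly the same samples, but the iterative sampler pays one model call per step while the single-step map pays one in total.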
Consistency models can also trade off compute for sample quality when necessary. In addition, they are capable of performing zero-shot data editing tasks such as image inpainting, colorisation and stroke-guided image editing.
These models build on a mathematical equation, called a probability flow ordinary differential equation (PF ODE), that smoothly transforms data into noise along a trajectory. A consistency model learns to map any point on such a trajectory back to the trajectory’s starting point, so that all points on the same trajectory produce the same output. The study calls this property ‘self-consistency’, which is where the class of models gets its name.
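A minimal sketch of the self-consistency property, under the assumption of a toy one-dimensional Gaussian data distribution where the trajectory of the probability flow ODE can be written down exactly. The helper names `trajectory` and `consistency_fn` are hypothetical and chosen for illustration.

```python
import numpy as np

# Assumed toy setup: 1-D data distributed as N(0, SIGMA_DATA^2).
SIGMA_DATA, EPS = 0.5, 0.002


def trajectory(x_eps, t):
    """Point at noise level t on the ODE trajectory that starts at x_eps."""
    return x_eps * np.sqrt((SIGMA_DATA**2 + t**2) / (SIGMA_DATA**2 + EPS**2))


def consistency_fn(x_t, t):
    """Map a point at any noise level t back to its trajectory's origin."""
    return x_t * np.sqrt((SIGMA_DATA**2 + EPS**2) / (SIGMA_DATA**2 + t**2))


x_eps = 0.3  # an arbitrary starting data point
times = np.array([EPS, 0.1, 1.0, 10.0, 80.0])
points = trajectory(x_eps, times)          # increasingly noisy versions
origins = consistency_fn(points, times)    # every one maps to the same origin
print(points)
print(origins)
```

Every noisy point along the trajectory is sent back to the same starting value, and at the smallest noise level the function leaves its input unchanged, which is the boundary condition such a model has to satisfy.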
These models can be trained either in distillation mode or in isolation mode. In distillation mode, a consistency model distills the knowledge of a pre-trained diffusion model into a sampler that can generate in a single step. In isolation mode, consistency models don’t depend on diffusion models at all, making them an entirely independent type of generative model.
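The distillation idea can be sketched roughly as follows, again on an assumed toy one-dimensional Gaussian problem. A stand-in teacher moves a noisy point one small step along the ODE, and the student is penalised when its outputs at the two adjacent noise levels disagree. The one-parameter `student`, the `teacher_score` function and the chosen noise levels are all hypothetical simplifications; the paper trains neural networks and uses a slowly updated copy of the student as the target.

```python
import numpy as np

# Assumed toy setup: 1-D data distributed as N(0, SIGMA_DATA^2).
SIGMA_DATA, EPS = 0.5, 0.002


def teacher_score(x, t):
    """Stand-in for a pre-trained diffusion model (exact score for toy data)."""
    return -x / (SIGMA_DATA**2 + t**2)


def euler_step(x, t_from, t_to):
    """One teacher-guided Euler step along dx/dt = -t * score(x, t)."""
    return x + (t_to - t_from) * (-t_from * teacher_score(x, t_from))


def student(x, t, theta):
    """Hypothetical one-parameter student; theta == SIGMA_DATA is the exact map."""
    return x * np.sqrt((theta**2 + EPS**2) / (theta**2 + t**2))


def distillation_loss(theta, theta_target, x0, t_hi, t_lo, rng):
    """Mean squared disagreement between student outputs at adjacent noise levels."""
    x_hi = x0 + t_hi * rng.standard_normal(x0.shape)  # noised data at level t_hi
    x_lo = euler_step(x_hi, t_hi, t_lo)               # teacher moves it to t_lo
    diff = student(x_hi, t_hi, theta) - student(x_lo, t_lo, theta_target)
    return float(np.mean(diff**2))


rng = np.random.default_rng(0)
x0 = SIGMA_DATA * rng.standard_normal(10_000)  # samples of toy "data"
loss_exact = distillation_loss(SIGMA_DATA, SIGMA_DATA, x0, 1.0, 0.9, rng)
loss_wrong = distillation_loss(2.0, 2.0, x0, 1.0, 0.9, rng)
print(loss_exact, loss_wrong)
```

The loss is close to zero only when the student agrees with itself along the teacher's trajectory, which is what training would push it towards.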
No adversarial training, no problem
Neither training method, however, relies on adversarial training. Adversarial training does result in a stronger neural network but goes about it in a roundabout way: it introduces a set of wrongly classified adversarial examples and then retrains the target neural network with the correct labels.
Adversarial training has also been found to slightly reduce the prediction accuracy of deep learning models. It can also cause unexpected side effects in robotics applications.
The experiments showed that consistency distillation outperformed existing distillation techniques for diffusion models. For one-step generation, consistency models achieved a new state-of-the-art Fréchet Inception Distance (FID) score, a measure of the quality of AI-generated images where lower is better, of 3.55 on the CIFAR-10 image dataset and 6.20 on the ImageNet 64×64 dataset.
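For readers curious about what the score measures, here is a minimal sketch of the Fréchet distance between two sets of feature vectors. It assumes the features have already been extracted; the real FID uses activations from an Inception network, which this example replaces with random arrays. The function name `frechet_distance` is an illustrative choice.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (num_samples, num_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Trace of the matrix square root of cov_a @ cov_b, computed as the
    # sum of square roots of its eigenvalues (clipped to guard against
    # tiny negative values from floating-point error).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
real = rng.standard_normal((500, 8))  # stand-in for features of real images
print(frechet_distance(real, real))        # identical sets: distance near zero
print(frechet_distance(real, real + 2.0))  # shifted set: distance grows
```

A lower value means the generated images' feature statistics sit closer to those of real images, which is why a smaller FID is read as better quality.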
It’s fair to say that OpenAI isn’t the only stakeholder here, but it is definitely one of the major ones. If the company wants its AI tools to sell more, the onus falls on it to ensure that they take less time and use less compute. In that sense, the potential impact of consistency models is huge, since diffusion models are popular not only in image generation but also in video and audio generation.
Just last month, Sutskever posted a tweet with a hint, saying, “Many believe that great AI advances must contain a new ‘idea’. But it is not so: many of AI’s greatest advances had the form huh, turns out this familiar unimportant idea, when done right, is downright incredible”. This paper shows exactly that: building on older concepts with a tweak can change everything.