OpenAI Junks Diffusion for Consistency Models

Consistency models generate samples in a single step, making them faster than diffusion models, and can trade compute for sample quality when necessary.

In 2021, OpenAI chief Sam Altman wrote a blog post arguing that Moore’s Law, the observation that semiconductor chips become twice as powerful for roughly the same price every two years, should apply to everything. Altman tweeted about the leaps AI was making, saying, “A new version of Moore’s law that could start soon: the amount of intelligence in the universe doubles every 18 months”. 

Introducing consistency models

To others, Altman’s optimism may seem unwarranted, but OpenAI’s pace of research seems to back up his claims. Last week, the startup published a paper on a new class of generative models, titled ‘Consistency Models’, that can outperform diffusion models. Authored by Yang Song, Prafulla Dhariwal, Mark Chen and OpenAI co-founder Ilya Sutskever, the study was released on March 3, 2023. 

Diffusion models have become the foundation of the revolution in generative AI since they overtook GANs as the most effective models for image synthesis. Some of the most prominent text-to-image AI generators, such as OpenAI’s DALL·E 2, Stability AI’s Stable Diffusion and Google’s Imagen, are all diffusion models. 

Faster and less energy-intensive than diffusion models

However, consistency models can produce output of comparable quality to diffusion models in far less time. This is because consistency models use a single-step generation process, much like GANs.

Diffusion models, in contrast, rely on an iterative sampling process that progressively removes noise from an image. This iterative generation requires 10–2,000 times more compute than consistency models and makes inference slow. 
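To make the contrast concrete, here is a minimal, hypothetical sketch (the `network` function is a toy stand-in for a trained model, not the paper’s actual architecture) that counts how many network evaluations each sampling style needs:

```python
import numpy as np

calls = {"n": 0}  # count how often the (stand-in) network is evaluated

def network(x, t):
    # Stand-in for a trained model; a real denoiser is a deep net.
    calls["n"] += 1
    return x * (1.0 - t)  # toy rule: shrink toward the data mean

def diffusion_sample(x_T, steps=50):
    # Diffusion-style sampling: many sequential denoising steps,
    # one network evaluation per step.
    x = x_T
    for i in reversed(range(steps)):
        t = i / steps                       # noise level for this step
        x = 0.5 * x + 0.5 * network(x, t)   # small denoising update
    return x

def consistency_sample(x_T):
    # Consistency-style sampling: a single network evaluation maps
    # noise directly to a sample.
    return network(x_T, 1.0)
```

Running both on the same input shows the cost gap directly: the iterative sampler makes `steps` network calls while the consistency sampler makes exactly one.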

Consistency models are able to trade off compute for sample quality when necessary. Besides this, such models are also capable of performing zero-shot data editing tasks like image inpainting, colorisation or stroke-guided image editing. 
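The compute-for-quality trade-off works by chaining extra denoise/re-noise rounds, along the lines of the paper’s multistep sampling procedure. A rough sketch, where `consistency_fn` is a hypothetical stand-in for a trained consistency model:

```python
import numpy as np

def multistep_consistency_sample(consistency_fn, x_T, noise_levels, rng):
    # One mandatory step from pure noise, then optional refinement
    # rounds: re-inject noise at a lower level and denoise again.
    # More entries in noise_levels means more compute and better samples.
    x = consistency_fn(x_T, noise_levels[0])
    for t in noise_levels[1:]:
        x_noisy = x + t * rng.standard_normal(x.shape)  # re-noise to level t
        x = consistency_fn(x_noisy, t)                  # denoise in one call
    return x
```

With a single entry in `noise_levels` this is ordinary one-step sampling; each additional entry costs one more network call in exchange for a cleaner sample.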


These models also use a mathematical equation, called a probability flow ordinary differential equation, that smoothly transforms data into noise. Every point along the same trajectory of this equation is mapped back to the same original data point, allowing for smooth transitions between noise levels. The study names this class of models ‘consistency models’ because they maintain this property of self-consistency between the input data and the output.
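The self-consistency property can be illustrated with a toy ODE (an illustrative example, not the paper’s actual equation): if a trajectory starting at x0 follows x(t) = x0·exp(t), then f(x_t, t) = x_t·exp(−t) maps every point on that trajectory back to the same origin x0.

```python
import numpy as np

# Toy trajectory of a probability-flow-style ODE dx/dt = x:
# a path starting at x0 is x(t) = x0 * exp(t).
def trajectory(x0, t):
    return x0 * np.exp(t)

# The matching consistency function maps any point on that path
# back to its origin: f(x_t, t) = x_t * exp(-t) = x0.
def consistency_fn(x_t, t):
    return x_t * np.exp(-t)

x0 = np.array([0.5, -1.2])
# Evaluate the consistency function at several points on one trajectory.
outputs = [consistency_fn(trajectory(x0, t), t) for t in (0.1, 0.7, 2.0)]
```

Every entry of `outputs` equals `x0`: that identical output across all noise levels is the self-consistency the models are named for.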

These models can be trained in either distillation mode or isolation mode. In distillation mode, consistency models distill the knowledge of a pre-trained diffusion model into a sampler that can work in a single step. In isolation mode, consistency models do not depend on diffusion models at all, making them an entirely independent family of generative models.
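In distillation mode, the pre-trained diffusion model supplies adjacent points on an ODE trajectory, and the student is trained to give the same answer at both, using a slowly updated (EMA) copy of itself as the target. A toy, one-parameter sketch of that loop, in which the model, the trajectory and the learning rates are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy trajectory x(t) = x0 * (1 + t), so the exact map back to data
# is x0 = x_t / (1 + t).
def teacher_ode_step(x_next, t_next, t):
    # Stand-in for one numerical-solver step of a pre-trained diffusion
    # model: moves a point on the trajectory from level t_next down to t.
    return x_next * (1.0 + t) / (1.0 + t_next)

def f(x, t, w):
    # Hypothetical one-parameter consistency model with the boundary
    # condition built in: f(x, 0, w) = x for any w.
    return x * (1.0 + t * (w - 1.0)) / (1.0 + t)

w, w_ema = 3.0, 3.0          # student weight and its EMA "target" copy
lr, decay = 0.2, 0.95
for _ in range(500):
    t_next, t = 1.0, 0.5                      # adjacent noise levels
    x_next = rng.standard_normal()            # point at the higher level
    x_prev = teacher_ode_step(x_next, t_next, t)  # teacher's ODE step
    # Consistency loss: student at t_next should match EMA target at t.
    err = f(x_next, t_next, w) - f(x_prev, t, w_ema)
    w -= lr * err * x_next * t_next / (1.0 + t_next)  # grad of err**2 / 2
    w_ema = decay * w_ema + (1 - decay) * w   # EMA update of the target
```

In this toy setup the student weight `w` converges to 1, the value for which `f` exactly recovers the trajectory’s origin at every noise level.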

No adversarial training, no problem

Both methods of training, however, do away with adversarial training. Adversarial training does produce a more robust neural network, but it goes about it in a roundabout way: it introduces a set of wrongly classified adversarial examples and then retrains the target neural network on them with the correct labels. 

Consequently, adversarial training has also been found to slightly reduce the prediction accuracy of deep learning models. It can also cause unexpected side effects in robotics applications.

The experiments showed that the distillation techniques used to train consistency models were better than the distillation techniques used for diffusion models. Consistency models achieved new state-of-the-art Fréchet Inception Distance (FID) scores, a measure of the quality of AI-generated images, of 3.55 on the CIFAR-10 image dataset and 6.20 on the ImageNet 64×64 dataset. 
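FID compares the statistics of generated and real images in a feature space by treating each set as a Gaussian; lower is better. For the special case of diagonal covariances the formula reduces to a few lines (real evaluations use Inception-v3 features and full covariance matrices; this sketch shows only the arithmetic of the metric):

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    # Fréchet distance between two Gaussians with diagonal covariances:
    # ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1 * var2)).
    # Identical distributions score exactly 0.
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)
```

Matching statistics give a score of 0, and any mismatch in means or variances pushes the score up.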

It’s fair to say that OpenAI isn’t the only stakeholder here, but it is definitely one of the major ones. If it wants its AI tools to sell more, the onus falls on it to ensure they take less time and use less compute. In that sense, the potential impact of consistency models is huge, since diffusion models are popular not only in image generation but also in video and audio generation.  

Just last month, Sutskever posted a tweet with a hint, saying, “Many believe that great AI advances must contain a new ‘idea’. But it is not so: many of AI’s greatest advances had the form, ‘huh, turns out this familiar unimportant idea, when done right, is downright incredible’”. This paper shows exactly that: an older concept, tweaked and done right, can change everything. 

Poulomi Chatterjee
Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.
